Finite Mixture Regression (FMR) with R
Moderator Analysis without a Moderator

Arndt Regorz, Dipl. Kfm. & M.Sc. Psychology, 12/07/2023




If you run a simple or multiple regression, you obtain a single regression weight for each predictor in your model, and this weight applies to all observations (units of analysis). The regression thus implicitly assumes that the relationships in the population are homogeneous, i.e., that the same relationships hold for every unit of analysis.

However, this is often an unrealistic assumption, especially because individuals differ from each other and therefore respond differently to experimental conditions or environmental variables. It is reasonable to assume that population heterogeneity often exists (i.e., the relationships are not the same for all individuals). In such cases, the results of an ordinary regression are merely averages across effects that differ between subgroups.

A method for handling population heterogeneity that you might already be familiar with is moderation analysis. However, this approach only works when the heterogeneity is observable: you need a specific suspicion about which variable explains the heterogeneity in your data, and you must have recorded that variable in your dataset.
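
As a reminder, a moderation analysis in R is simply a regression with an interaction term. The snippet below is a generic sketch with placeholder names (y, x, m, and mydata are hypothetical and not part of the tutorial data):

# Generic moderation sketch; y, x, m, and mydata are placeholder names
mod_fit <- lm(y ~ x * m, data = mydata)   # x * m expands to x + m + x:m
summary(mod_fit)                          # the x:m term tests whether m moderates the effect of x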

In contrast, Finite Mixture Regression (FMR) can uncover unobservable population heterogeneity. If you have obtained significant results in your regression, you can use FMR to explore whether there are subgroups in your dataset whose regression weights differ in size, direction, and significance.

This tutorial demonstrates how to implement this easily in R. Before that, I briefly explain some basics of FMR so that the R code and the analysis are easier to follow.

Principles of FMR

There are some fundamental principles to understand when using Finite Mixture Regression:

Latent Groups: FMR assumes the presence of latent groups (subpopulations) in your data, which are hidden – meaning there is no observable group variable indicating group membership.

Group Membership: Each case (observation) belongs to one of these groups, but we do not know which one. FMR estimates this.

Separate Models: Each group has its own relationship between variables. This means that regression weights differ between groups, and their significance may also differ. For example, a predictor in Group A may have a much stronger influence than in Group B, and a second predictor may have no significant effect in Group A but does in Group B.

Overall Model: The final model for all data combines these separate models, with the contribution of each group weighted by the probability that a case belongs to that group. Individual observations are therefore not uniquely assigned to a latent group, but only with a certain probability. For example, a case may belong to Group A with 92%, Group B with 0%, and Group C with 8%.
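
To make these principles concrete, here is a minimal simulation sketch (my own illustration, not part of the original tutorial): data are generated from two latent groups with different regression weights, and flexmix is asked to recover them without being told the group membership.

# Minimal simulation sketch (illustration only): two hidden groups with different slopes
library(flexmix)
set.seed(123)
n <- 200
group <- rbinom(n, 1, 0.5)                 # true group membership (hidden from flexmix)
x <- rnorm(n)
y <- ifelse(group == 1, 2 + 1.5 * x, 1 - 0.5 * x) + rnorm(n, sd = 0.5)
sim_data <- data.frame(x = x, y = y)       # note: no group variable is passed on
sim_fit <- flexmix(y ~ x, k = 2, data = sim_data)
parameters(sim_fit)                        # the slopes should differ clearly between components
table(clusters(sim_fit), group)            # recovery of the hidden groups (labels may be switched)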

As a result of Finite Mixture Regression, you can:

Identify Groups: Determine how many and what types of groups exist in your data. What are the probabilities that a case belongs to these groups?

Compare Groups: How do the relationships between variables differ between groups? Which predictors play a role in which groups?

Make Predictions: Once you have your model, you can use it to make predictions for new cases. These predictions may vary depending on which group a new case is assigned to.
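
As a sketch of this last point: the flexmix package offers a predict() method that returns one set of predicted values per latent group. In the snippet below, best_model refers to the fitted model from the code section further down, and new_data is a hypothetical data frame with the predictors x1 and x2; treat this as a sketch, not as part of the tutorial code.

# Sketch: component-wise predictions for new cases
# (best_model is fitted in the code section below; new_data is a hypothetical
# data frame containing the predictors x1 and x2)
pred_by_group <- predict(best_model, newdata = new_data)
str(pred_by_group)   # a list with one vector of predictions per latent group
# Assigning new cases to a group via clusters()/posterior() additionally requires
# the response variable in new_data, because the posteriors are likelihood-based:
# clusters(best_model, newdata = new_data)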

Practical Implementation

The practical implementation involves several steps:

Perform the analysis for various numbers of latent groups because you do not know how many different subpopulations are present in the data.

Determine the optimal number of latent groups, primarily based on information criteria.

Compare regression results between groups.

Extract group membership, i.e., for each participant/observation, the latent group with the highest probability.

Based on this, an exploratory analysis follows (outside of FMR) to explain the different group memberships, which can serve as a basis for a later moderation analysis. Various statistical methods, both descriptive and inferential (e.g., multinomial logistic regression), can be used to understand why a particular case has been assigned to a specific group (see the sketch after the R code below).

R Code for Finite Mixture Regression

Here is the R code from the YouTube tutorial on FMR:

#install.packages("flexmix")
library(flexmix)

data("NregFix", package = "flexmix")
head(NregFix)
?NregFix

# Specific number of components (example: k = 2)
fittedModel_2c <- flexmix(y ~ x2 + x1, k = 2, data = NregFix)

summary(fittedModel_2c)          # group sizes, posterior summaries, log-likelihood
summary(refit(fittedModel_2c))   # significance tests for the regression weights in each group
parameters(fittedModel_2c)       # coefficients and residual standard deviation per group

?stepFlexmix
# Getting the best number of latent groups
fittedModel_1_5_c <- stepFlexmix(y ~ x2 + x1,
                                 k = c(1, 2, 3, 4, 5),
                                 nrep = 10,
                                 data = NregFix)

fittedModel_1_5_c        # log-likelihood, AIC, BIC, and ICL for each number of groups
plot(fittedModel_1_5_c)  # information criteria plotted against the number of groups
# => Best model: 3 groups
# based on BIC and on ICL (Integrated Completed Likelihood Criterion)

# Looking at the 3-group solution
best_model <- getModel(fittedModel_1_5_c, which = 3)   # extract the solution with 3 latent groups
summary(best_model)

summary(refit(best_model))   # significance tests for the regression weights in each group
parameters(best_model)       # coefficients and residual standard deviation per group

plot(best_model)             # rootogram of the posterior probabilities per group

# Which observation goes into which group?
posterior(best_model)   # posterior probability of each group for every observation
clusters(best_model)    # hard assignment: the group with the highest posterior probability

head(posterior(best_model), 5)
head(clusters(best_model), 5)

# Add the assigned clusters to the dataframe
data_cluster <- NregFix
data_cluster$cluster <- factor(clusters(best_model))

head(data_cluster, 5)

# Looking at the clusters: descriptive statistics for each cluster separately
#install.packages("psych")
library(psych)
describeBy(data_cluster, data_cluster$cluster)

# Scatter plot of y against x1, colored by assigned group
plot(NregFix$x1, NregFix$y, col = clusters(best_model))
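
As a sketch of the last step described above (explaining group membership outside of FMR), a multinomial logistic regression can be used to predict the extracted cluster variable from candidate moderator variables. The example below uses nnet::multinom; for lack of additional variables in the demo data it simply uses the predictors x1 and x2, whereas in a real application you would use separately measured person- or context-level variables.

# Exploratory follow-up (sketch): which variables predict latent group membership?
# In a real study you would use additional covariates; here x1 and x2 serve as stand-ins.
#install.packages("nnet")
library(nnet)
membership_model <- multinom(cluster ~ x1 + x2, data = data_cluster)
summary(membership_model)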