by Arndt Regorz, MSc.
September 1, 2024

Misspecifications in a confirmatory factor analysis (CFA) can lead to seriously biased estimates of the factor correlation. One possible cause of misspecification is one or more missing cross loadings (i.e., items that actually load on two or more factors at the same time but are modeled as loading on only one). The same problem applies to an SEM, where missing cross loadings can lead to biased estimates in the structural part of the model.

In general, we would prefer not to specify a model with cross loadings, because we would like each item to load on only one latent variable (factor).

However, in reality that is not always the case. We may encounter a set of factors where one manifest variable/item is related to more than one latent factor. In that situation, not estimating the resulting cross loading can lead to seriously biased results.

If you don't model a cross loading that is actually present in your data, then the relationship between the item and the factor it is not allowed to load on will show up somewhere else in the model. In many cases this relationship will manifest itself as an inflated factor correlation (in a CFA) or as an inflated path coefficient (in a structural equation model). Thus, two latent constructs will appear more strongly connected than they really are, leading to false conclusions about your hypotheses.
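This bias can be demonstrated with simulated data. The following sketch uses lavaan's simulateData() to generate data from a hypothetical population model (all population values here are arbitrary assumptions chosen for illustration) with a true factor correlation of .30 and one cross loading, and then fits a model that omits that cross loading:

```r
library(lavaan)

# Hypothetical population model (values chosen for illustration only):
# true factor correlation of .30, plus a cross loading of y6 on f1
pop_model <- '
f1 =~ 0.7*y1 + 0.7*y2 + 0.7*y3 + 0.4*y6
f2 =~ 0.7*y4 + 0.7*y5 + 0.7*y6
f1 ~~ 0.3*f2
'
set.seed(123)
sim_data <- simulateData(pop_model, sample.nobs = 500)

# Misspecified analysis model: the cross loading of y6 on f1 is omitted
fit_wrong <- cfa('f1 =~ y1 + y2 + y3
                  f2 =~ y4 + y5 + y6', data = sim_data)

# The estimated latent correlation will tend to exceed the true value of .30
lavInspect(fit_wrong, "cor.lv")
```

Running such a simulation makes the mechanism visible: the omitted loading has nowhere to go but into the factor correlation.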

### 2. Example

The following example (using R/lavaan) shows a possible result of a missed cross loading in a CFA. It is based on parts of the well-known Holzinger and Swineford data set that ships with the R package lavaan. (Nevertheless, the problem addressed in this blog post is not specific to R or lavaan, but a general problem in CFA and SEM.)

We would like to look at the factorial structure for the following six items that all represent test scores on cognitive subtests:

• x1 Visual perception
• x2 Cubes
• x3 Lozenges

• x7 Speeded addition
• x8 Speeded counting of dots
• x9 Speeded discrimination straight and curved capitals

Items x1-x3 all represent visual tasks and should belong to a factor "visual". Items x7-x9 all represent speed tasks and should belong to a factor "speed". On that basis, we specify and run a CFA.

```r
library(lavaan)

model1 <- '
f1 =~ x1 + x2 + x3
f2 =~ x7 + x8 + x9
'

model_fit1 <- cfa(model1, data = HolzingerSwineford1939)
summary(model_fit1, fit.measures = TRUE, standardized = TRUE)
```

Unfortunately, the model fit is not good. The chi-square model test is significant, and the main fit indices are CFI = .879, RMSEA = .128, and SRMR = .079. This is not a well-fitting model.
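If you only need these indices rather than the full summary output, you can request them directly; a short sketch, assuming the model_fit1 object from the code above:

```r
# Extract selected global fit measures from the fitted lavaan model
# (assumes model_fit1 from the code above)
fitMeasures(model_fit1, c("chisq", "df", "pvalue", "cfi", "rmsea", "srmr"))
```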

If, nevertheless, we tried to interpret the result, then we would get a correlation between the two latent factors of r = .46.
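The standardized factor correlation can also be read directly from the model-implied latent correlation matrix; a short sketch, again assuming model_fit1 from above:

```r
# Correlation matrix of the latent variables (here: f1 and f2)
# (assumes model_fit1 from the code above)
lavInspect(model_fit1, "cor.lv")
```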

Next, we look at the modification indices in order to identify reasons for misfit.

```r
modindices(model_fit1, sort = TRUE, minimum.value = 10)
```

One of the two largest modification indices is:

```
lhs op rhs     mi   epc sepc.lv sepc.all sepc.nox
 f1 =~  x9 35.521 0.659   0.512    0.508    0.508
```

This modification, an additional cross loading of item x9 on the first factor (visual), makes theoretical sense, because the content of that item, "Speeded discrimination straight and curved capitals", is about speed, but it is also a visual task.

```r
model2 <- '
f1 =~ x1 + x2 + x3 + x9
f2 =~ x7 + x8 + x9
'

model_fit2 <- cfa(model2, data = HolzingerSwineford1939)
summary(model_fit2, fit.measures = TRUE, standardized = TRUE)
```

Now the model fit is much better: CFI = .976, RMSEA = .061, SRMR = .044.

But the more interesting result is the correlation between the two factors. Now we get a correlation of r = .30. That is about one third less than the correlation we got from the initial model (r = .46).
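Since model1 is nested in model2 (it only omits the cross loading), the improvement in fit can also be tested formally with a chi-square difference test; a short sketch, assuming both fitted models from above:

```r
# Likelihood ratio (chi-square difference) test of the two nested models
# (assumes model_fit1 and model_fit2 from the code above)
anova(model_fit1, model_fit2)
```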

This example shows that leaving out a cross loading in our model can massively bias our results!

### 3. Conclusion

It is important that you have a well-fitting measurement model before you start interpreting results of the structural part of your CFA (factor correlations) or SEM (path coefficients).

You should always check local fit (via modification indices or residual matrices) in addition to global fit (model test, fit indices) to make sure that you are able to spot local misspecifications, e.g., missing cross loadings.

A possible alternative is ESEM (exploratory structural equation modeling), where cross loadings are estimated automatically for all items of related constructs.
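Recent lavaan versions support ESEM through efa() blocks in the model syntax. The following is only a sketch of what this could look like for the example above; the block label "block1" is an arbitrary name, and you should check the model syntax documentation of your lavaan version before relying on it:

```r
# ESEM sketch in lavaan via an efa() block: both factors are defined by all
# six items, and the loading pattern is rotated (here: geomin) rather than
# fixed a priori. Requires a recent lavaan version; see ?model.syntax.
esem_model <- '
efa("block1")*f1 +
efa("block1")*f2 =~ x1 + x2 + x3 + x7 + x8 + x9
'
esem_fit <- sem(esem_model, data = HolzingerSwineford1939, rotation = "geomin")
summary(esem_fit, standardized = TRUE)
```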