Confirmatory Factor Analysis

Confirmatory Factor Analysis Session 9, Lecture 7 // Adding Latent Variables to the Model So far we ve only included observed variables in our path models Etension to latent variables: need to add in a measurement piece how do we define the latent variable (think factor analysis) more complicated to look at, but same general principles apply

Why we need the measurement piece We can t directly measure most of the interesting things in behavioral science. Beyond that, human behavior plays a significant role in most (if not all!) of the pressing public health issues in the US imagine if we could get every American to Eercise and eat a healthy diet stop smoking, and using drugs or alcohol Use condoms This is why we need to figure out how to fit latent variables into a statistical framework Notation for Latent Variable Model η = latent endogenous variable (eta) y = observed indicator of η (these both makes ee sounds) = latent eogenous variable (pronounced ksi, or zi) = observed indicator of (these both make sounds) ζ = latent error (zeta) (zeta rhymes with eta) β = coefficient on latent endogenous variable (beta) γ = coefficient on latent eogenous variable (gamma) = measurement error on (delta) ε = measurement error on y (epsilon) Λ y,, y = coefficient relating y to η (lambda) Λ, = coefficient relating to

η = γ + ζ η = β η + γ + ζ ε ζ γ η γ β y y ε y ε η 7 9 8 y y ε ε ζ y ε

To write equations for models:. Start an equation for everything that has an arrow pointing at it. Each is the sum of (everything pointing at it multiplied by the weight). η = γ + ζ η = β η + γ + ζ = + = + = + y y y = η + ε = η + ε = η + ε y y y = η + ε 7 = η + ε 8 = η + ε 9 Confirmatory vs. Eploratory Factor Analysis CFA: We can test or confirm or implement via model constraints. Model Construction CFA: drawn in advance, including number of latent variables (factors) and which latent variables influence which indicators. EFA: not in advance, typically all latent variables influence all indicators Measurement Errors on Indicators () CFA: may be correlated EFA: can t be correlated

Eploratory Factor Analysis Two factor model: + = ~ ~ ~ + = Λ No curved arrows between deltas + = Eplanation of Matri Notation + + = + =

Confirmatory Factor Analysis Two factor model: ~ ~ ~ = Λ + = + Don t Like Matrices? CFA EFA = + = + = + = + = + = + cov(, ) = ϕ = + + = + + = + + = + + = + + = + + = cov(, )

Necessary Constraints Latent variables (LVs) need some constraints Recall EFA where we constrained factors: F ~ N(,) Otherwise, model is not identifiable. Here we have two options: Fi variance of latent variables (LV) to be (or another constant) Fi one path between LV and indicator Fi variances: Fi path: 7

Fi variances: = + = + = + = + = + = + cov(, ) = ϕ var( ) = var( ) = Fi path: = = + = + = + = + = + = + cov(, ) = ϕ var( ) = ϕ var( ) = ϕ More Important Notation Φ (capital of ): covariance matri of eogenous latent variables ϕ ϕ var( ) = ϕ Φ = = var( ) ϕ ϕ cov(, ) = ϕ Θ (capital of ): covariance matri of errors Θ var( ) = cov(, ) = NOTE: usually, ij = if i j 8

Identifiability Rules for CFA () T-rule (revisited) necessary, but not sufficient t things to estimate n observed variables t n( n + ) indicator rule Sufficient, but not necessary At least two factors At least two indicators per factor Eactly one non-zero element per row of Λ (translation: each is pointed at by one LV) Non-correlated errors (Θ is diagonal) (translation: no double-header arrows between the s) Factors are correlated (Φ has no zero elements)* (translation: there are double-headed arrows between all of the eogenous latent variables ()) * Alternative less strict criteria: each factor is correlated with at least one other factor. (see Bollen, p7) 9

= + Θ = var( ) = Φ = var( ) = No zeros in phi =All latent vars are correlated with each other. What if we take out that arrow?

= + Θ = var( ) = Φ = var( ) = 7 7 8 8

+ = 8 7 8 7 8 7 = Θ Φ = 7 8 7 8

+ = 8 7 8 7 8 7 = Θ Φ = ϕ indicator rule sufficient, not necessary at least one factor at least three indicators per factor one non-zero element per row of Λ (translation: each is pointed at by only one LV) non-correlated errors (Θ is diagonal) (translation: no double-headed arrows between the s) NO restrictions on Φ (translation: factors don t have to be correlated)

~ ~ ~ = Λ + = + Θ = var( ) = Φ = = ϕ var( ) ϕ What about the T-rule? Sample Moments: (8*9)/= Parameters being estimated: 8 eogenous variances= (fied) error variances = 8 direct effects = 8 double-headed arrows =

Food for Thought If something fulfills the -indicator rule, will it automatically fulfill the -indicator rule? If not, draw a counter-eample If something fulfills the -indicator rule, will it automatically fulfill the -indicator rule? If not, draw a counter eample If something fulfills either the - or - indicator rule, will it automatically fulfull the t-rule? If not, draw a counter-eample Estimation Procedures (Bollen, Ch. and Ch.7) Maimum Likelihood Generalized Least Squares Where these are in AMOS: View/Set Analysis Properties Estimation

CMIN (a global test of fit) Global Tests ~ Goodness of Fit AMOS: CMIN Get value comparing the covariance predicted by the model to the observed covariance (v. ugly formula found in AMOS help appendi B) The values can be used to compare nested models (e.g., your models compared to the saturated model ) More on CMIN CMIN : Depends on estimation procedure Asymptotically χ Maimum likelihood: -Log-likelihood Generalized LS: like Pearson χ CMIN for ML and from LS will converge as sample size approaches infinity. Assumes perfect fit : small deviations from model can lead to rejections (especially as N gets large) 7

Other fit statistics: Information Criteria Akaike: AIC = -LL + *s Schwarz: BIC = -LL + s*log(n*p) consistent AIC: CAIC = -LL + s*(log(n) + ) s is # of free parameters (parameters being estimated) p is number of parameters in independence model (no arrows, no latent vars, just observed variables) the smaller the better Other fit statistics: RMR (root mean square residual) average squared amount that sample variances and covariances differ from estimates is perfect fit the smaller the better, but no preset cutoff GFI (goodness of fit inde) is perfect fit ranges from to According to Garson, you want to see a GFI>.9 For more, check out http://www.chass.ncsu.edu/garson/pa7/structur.htm 8

Comparing Models AMOS: Model Fit Manage Models Can constrain parameters Careful with errors--can get negative variance estimate warnings. Nested likelihood ratios Comparison of info criteria (AIC, BIC) Parameter Evaluation Estimate/SE = Z-statistic Standard interpretation: if Z >, then significant Consider both statistical and scientific value of including a variable in the model Notes for parameter testing in CFA: Not usually interesting in determining if loadings are equal to zero Might be interested in testing whether or not covariance between factors is zero. 9

CFA Eample: Epidemiologic Catchment Area (ECA) Data Hypotheses: Lethargy and depression are two aspects of mental distress. Gender and age influence depression Current job influences both depression and lethargy Factors: depression lethargy X s: CFA Eample: ECA data have you been managing to keep yourself busy and occupied? have you been getting out of the house as much as usual? have you felt on the whole that you were doing things well? have you felt constantly under strain? have you found everything getting too much for you? have you been feeling unhappy and depressed?

Is this model identified? - indicator rule -How many sample moments to I have? -How many parameters am I estimating?

Everything seems fine, right? Or not.. Is this an identifiability issue?

CMIN= -LL (for ML) Proof: AIC = -LL + *s -LL=.98-(*) =.98

Second-Order Factor Analysis Factors: depression lethargy X s: have you been managing to keep yourself busy and occupied? have you been getting out of the house as much as usual? have you felt on the whole that you were doing things well? have you felt constantly under strain? have you found everything getting too much for you? have you been feeling unhappy and depressed? Maybe you look back at this, and think, maybe those two factors are correlated, because there is a grand factor to which they are both related. You would want STRONG prior theory Like mine

In order to get this to run, I had to keep constraining more and more parameters to be equal to. And still, even if it ran, bad things kept happening Totally crazy 9 trillion! (variance would be 77 octrillions) But, something even MORE important: The names of the two mediating latent vars can be (sort of) inferred from its indicators. BUT, what about the name for the eogenous latent variable It s hard to figure out what things really are when you have latent variables (with no indicators of their own) pointing to other latent variables.