Introduction to Structural Equations with Latent Variables

Transcript:

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this site. Copyright 2007, The Johns Hopkins University and Qian-Li Xue. All rights reserved. Use of these materials permitted only in accordance with license rights granted. Materials provided AS IS ; no representations or warranties provided. User assumes all responsibility for use, and all liability related thereto, and must independently review all materials for accuracy and efficacy. May contain materials owned by others. User is responsible for obtaining permissions for use from third parties as needed.

Introduction to Structural Equations with Latent Variables
Statistics for Psychosocial Research II: Structural Models
Qian-Li Xue

[Flowchart: choosing a model. Decision nodes, in order: Test of causal hypotheses? (No: ordinary regression; Yes: SEM, which originated in path models); Continuous endogenous variables and continuous LV?; Categorical indicators and categorical LV?; Longitudinal data?; Multilevel data? Models at the end points: classic SEM, latent class regression, latent trait, latent profile, Adv. SEM I (latent growth curves), Adv. SEM II (multilevel models).]

Adding Latent Variables to the Model
- So far we've only included observed variables in our path models.
- Extension to latent variables:
  - need to add in a measurement piece
  - how do we define the latent variable? (think factor analysis)
  - more complicated to look at, but the same general principles apply

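Path Diagram
[Figure: path diagram with paths $\xi_1 \to \eta_1$ ($\gamma_{11}$), $\xi_1 \to \eta_2$ ($\gamma_{21}$), and $\eta_1 \to \eta_2$ ($\beta_{21}$); $\zeta_1$ and $\zeta_2$ are the errors on $\eta_1$ and $\eta_2$.]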

Notation for Latent Variable Model
- $\eta$ = latent endogenous variable (eta)
- $\xi$ = latent exogenous variable (ksi, pronounced "kah-see")
- $\zeta$ = latent error (zeta)
- $\beta$ = coefficient on latent endogenous variable (beta)
- $\gamma$ = coefficient on latent exogenous variable (gamma)

$\eta_1 = \gamma_{11} \xi_1 + \zeta_1$
$\eta_2 = \beta_{21} \eta_1 + \gamma_{21} \xi_1 + \zeta_2$

E.g. $\xi_1$ is childhood home environment, $\eta_1$ is social support, and $\eta_2$ is cancer coping ability.

In matrix form: $\eta = B\eta + \Gamma\xi + \zeta$, $\operatorname{Cov}(\xi) = \Phi$, $\operatorname{Cov}(\zeta) = \Psi$

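Path Diagram
[Figure: the path diagram above with indicators added. $x_1$ to $x_3$ (errors $\delta_1$ to $\delta_3$, loadings $\lambda_1$ to $\lambda_3$) measure $\xi_1$; $y_1$ to $y_3$ (errors $\varepsilon_1$ to $\varepsilon_3$, loadings $\lambda_4$ to $\lambda_6$) measure $\eta_1$; $y_4$ to $y_6$ (errors $\varepsilon_4$ to $\varepsilon_6$, loadings $\lambda_7$ to $\lambda_9$) measure $\eta_2$.]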

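Notation for Measurement Model
- $y$ = observed indicator of $\eta$
- $x$ = observed indicator of $\xi$
- $\varepsilon$ = measurement error on $y$
- $\delta$ = measurement error on $x$
- $\lambda_x$ = coefficient relating $x$ to $\xi$
- $\lambda_y$ = coefficient relating $y$ to $\eta$

$x_1 = \lambda_1 \xi_1 + \delta_1$
$x_2 = \lambda_2 \xi_1 + \delta_2$
$x_3 = \lambda_3 \xi_1 + \delta_3$
$y_1 = \lambda_4 \eta_1 + \varepsilon_1$
$y_2 = \lambda_5 \eta_1 + \varepsilon_2$
$y_3 = \lambda_6 \eta_1 + \varepsilon_3$
$y_4 = \lambda_7 \eta_2 + \varepsilon_4$
$y_5 = \lambda_8 \eta_2 + \varepsilon_5$
$y_6 = \lambda_9 \eta_2 + \varepsilon_6$

In matrix form: $y = \Lambda_y \eta + \varepsilon$, $\operatorname{Cov}(\varepsilon) = \Theta_\varepsilon$; $x = \Lambda_x \xi + \delta$, $\operatorname{Cov}(\delta) = \Theta_\delta$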

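Example: Model Specification
[Figure: path diagram with paths $\xi_1 \to \eta_1$ ($\gamma_{11}$), $\xi_1 \to \eta_2$ ($\gamma_{21}$), $\eta_1 \to \eta_2$ ($\beta_{21}$), and errors $\zeta_1$, $\zeta_2$.]
- $\eta_2$ is democracy in 1965
- $\eta_1$ is democracy in 1960
- $\xi_1$ is industrialization in 1960

$$\begin{pmatrix} \eta_1 \\ \eta_2 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ \beta_{21} & 0 \end{pmatrix} \begin{pmatrix} \eta_1 \\ \eta_2 \end{pmatrix} + \begin{pmatrix} \gamma_{11} \\ \gamma_{21} \end{pmatrix} \begin{pmatrix} \xi_1 \end{pmatrix} + \begin{pmatrix} \zeta_1 \\ \zeta_2 \end{pmatrix}$$

$\eta = B\eta + \Gamma\xi + \zeta$, $\operatorname{Cov}(\xi) = \Phi$, $\operatorname{Cov}(\zeta) = \Psi$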
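As a rough numerical check of the matrix form, the sketch below evaluates $\eta = B\eta + \Gamma\xi + \zeta$ for a single case and compares it with the two scalar equations. The coefficient values and the values of $\xi_1$ and $\zeta$ are made up for illustration; they are not estimates from the democracy data.

```python
# Hypothetical coefficients and draws (illustration only).
beta21, gamma11, gamma21 = 0.8, 0.5, 0.3
xi1 = 1.2             # one value of the exogenous latent variable
zeta = [0.1, -0.2]    # latent errors (zeta_1, zeta_2)

# Scalar form, equation by equation.
eta1 = gamma11 * xi1 + zeta[0]
eta2 = beta21 * eta1 + gamma21 * xi1 + zeta[1]

# Matrix form: eta = B eta + Gamma xi + zeta, i.e. (I - B) eta = Gamma xi + zeta.
B = [[0.0, 0.0], [beta21, 0.0]]
Gamma = [gamma11, gamma21]
rhs = [Gamma[i] * xi1 + zeta[i] for i in range(2)]

# I - B is lower triangular with ones on the diagonal,
# so forward substitution solves the system.
eta = [0.0, 0.0]
for i in range(2):
    eta[i] = rhs[i] + sum(B[i][j] * eta[j] for j in range(i))

print(eta, [eta1, eta2])
```

Forward substitution works here only because the model is recursive ($B$ is strictly lower triangular); a model with feedback loops would need a general linear solve.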

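Example: Model Specification
[Figure: path diagram of the full model. $y_1$ to $y_4$ (errors $\varepsilon_1$ to $\varepsilon_4$) are indicators of $\eta_1$, $y_5$ to $y_8$ (errors $\varepsilon_5$ to $\varepsilon_8$) are indicators of $\eta_2$, and $x_1$ to $x_3$ (errors $\delta_1$ to $\delta_3$) are indicators of $\xi_1$; structural paths $\gamma_{11}$, $\gamma_{21}$, $\beta_{21}$ with errors $\zeta_1$, $\zeta_2$.]
- $y_1$, $y_5$: freedom of press
- $y_2$, $y_6$: freedom of group opposition
- $y_3$, $y_7$: fairness of election
- $y_4$, $y_8$: effectiveness of legislative body
- $x_1$ is GNP per capita
- $x_2$ is energy consumption per capita
- $x_3$ is % labor force
Source: Bollen, pp. 322-323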

Model Estimation
- In multiple regression, estimation is based on individual cases via least squares (LS), i.e. minimization of $\sum_{i=1}^{n} (\hat{Y}_i - Y_i)^2$.
- In SEM, estimation is based on covariances.
- If the model is correct and $\theta$ is known, then $\Sigma = \Sigma(\theta)$, where $\Sigma$ is the population covariance matrix of the observed variables and $\Sigma(\theta)$ is the covariance matrix written as a function of $\theta$.

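Model Estimation
- Regression analysis and confirmatory factor analysis are special cases.
- E.g. $y = \gamma x + \zeta$, with $\zeta$ uncorrelated with $x$ and $E(\zeta) = 0$. Setting $\Sigma = \Sigma(\theta)$ (lower triangle shown):

$$\begin{pmatrix} \operatorname{VAR}(y) & \\ \operatorname{COV}(x, y) & \operatorname{VAR}(x) \end{pmatrix} = \begin{pmatrix} \gamma^2 \operatorname{VAR}(x) + \operatorname{VAR}(\zeta) & \\ \gamma \operatorname{VAR}(x) & \operatorname{VAR}(x) \end{pmatrix}$$

$$\operatorname{COV}(x, y) = \gamma \operatorname{VAR}(x) \quad \Rightarrow \quad \gamma = \frac{\operatorname{COV}(x, y)}{\operatorname{VAR}(x)}$$

- Does $\gamma$ look familiar? Remember that in SLR, $\beta = (X'X)^{-1} X'y$.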
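The identity $\gamma = \operatorname{COV}(x, y) / \operatorname{VAR}(x)$ can be checked against the case-based least squares slope. The sketch below uses a small made-up sample and assumes the regression includes an intercept (equivalently, that the variables are mean-centered); the data and the helper `sample_cov` are illustrative only.

```python
# Hypothetical paired sample (illustrative values only).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

def sample_cov(a, b):
    """Sample covariance with divisor n - 1."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)

# Covariance-based estimate: gamma = COV(x, y) / VAR(x).
gamma_cov = sample_cov(x, y) / sample_cov(x, x)

# Case-based least squares slope from the normal equations (with intercept).
sx, sy = sum(x), sum(y)
sxx = sum(u * u for u in x)
sxy = sum(u * v for u, v in zip(x, y))
gamma_ls = (n * sxy - sx * sy) / (n * sxx - sx * sx)

print(gamma_cov, gamma_ls)
```

Both routes give the same slope, which is the sense in which regression is a special case of fitting $\Sigma(\theta)$ to the covariance matrix.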

Model Estimation
- In reality, neither the population covariances $\Sigma$ nor the parameters $\theta$ are known.
- What we have is a sample estimate of $\Sigma$: $S$.
- Goal: estimate $\theta$ based on $S$ by choosing the estimates of $\theta$ such that $\hat{\Sigma}$ is as close to $S$ as possible.
- But how close is close?
- Define and minimize objective functions, also called "fitting functions": $F(S, \hat{\Sigma})$, e.g. $F(S, \hat{\Sigma}) = S - \hat{\Sigma}$.

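Common Fitting Functions: Maximum Likelihood (ML)
- To minimize $F_{ML} = \frac{1}{2} \operatorname{tr}\left(\{[S - \Sigma(\theta)]\, \Sigma^{-1}(\theta)\}^2\right)$ or $F_{ML} = \log|\Sigma(\theta)| + \operatorname{tr}\left(S \Sigma^{-1}(\theta)\right) - \log|S| - (p + q)$
- Explicit solutions of $\theta$ may not exist.
- An iterative numeric procedure is needed.
- Asymptotic properties of ML estimators:
  - Consistent, i.e. $\hat{\theta} \to \theta$ as $n \to \infty$
  - Efficient, i.e. smallest asymptotic variance
  - Asymptotic normality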

Common Fitting Functions: Maximum Likelihood (ML)
Advantages
- Scale invariant
  - $F(S, \Sigma(\theta))$ is scale invariant if $F(S, \Sigma(\theta)) = F(DSD, D\Sigma(\theta)D)$, where $D$ is a diagonal matrix with positive elements.
  - E.g. if $D$ consists of the inverses of the standard deviations of the observed variables, $DSD$ becomes a correlation matrix.
  - More generally, the value of $F$ is the same for any change of scale (e.g. dollars to cents).
- Scale free
  - Knowing $D$, we can calculate $\hat{\theta}^*$ (based on transformed data) from $\hat{\theta}$ (based on non-transformed data) without actually rerunning the model.
- Test of overall model fit for an overidentified model, based on the fact that $(N - 1) F_{ML}$ has an asymptotic $\chi^2$ distribution with $\frac{1}{2}(p + q)(p + q + 1) - t$ degrees of freedom, where $t$ is the number of free parameters.
Disadvantage
- Assumption of multivariate normality
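The log-determinant form of $F_{ML}$ and its scale invariance can be illustrated numerically. This is a minimal sketch restricted to 2 x 2 matrices so that the determinant and inverse can be written out by hand; the matrices `S` and `Sigma` are made-up values, and the helper names are not from any SEM package.

```python
import math

def det2(m):
    """Determinant of a 2 x 2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def inv2(m):
    """Inverse of a 2 x 2 matrix."""
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]

def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def f_ml(S, Sigma):
    """F_ML = log|Sigma| + tr(S Sigma^{-1}) - log|S| - (p + q), 2 x 2 case."""
    k = len(S)  # number of observed variables, p + q
    prod = matmul(S, inv2(Sigma))
    trace = sum(prod[i][i] for i in range(k))
    return math.log(det2(Sigma)) + trace - math.log(det2(S)) - k

# Hypothetical sample and model-implied covariance matrices.
S = [[4.0, 1.2], [1.2, 9.0]]
Sigma = [[3.5, 1.0], [1.0, 9.5]]

# D = diag(1 / sd): DSD is then a correlation matrix.
D = [[1 / math.sqrt(S[0][0]), 0.0], [0.0, 1 / math.sqrt(S[1][1])]]
S_scaled = matmul(matmul(D, S), D)
Sigma_scaled = matmul(matmul(D, Sigma), D)

print(f_ml(S, S))                    # perfect fit
print(f_ml(S, Sigma))                # original scale
print(f_ml(S_scaled, Sigma_scaled))  # after rescaling
```

The function is zero (up to rounding) when $\Sigma(\theta) = S$, positive otherwise, and unchanged when both matrices are rescaled by the same $D$.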

Common Fitting Functions: Unweighted Least Squares (ULS)
- To minimize $F_{ULS} = \frac{1}{2} \operatorname{tr}\left[(S - \Sigma(\theta))^2\right]$
- Analogous to OLS: minimizes the sum of squares of each element in the residual matrix $(S - \Sigma(\theta))$.
- Gives greater weight to the off-diagonal (covariance) terms than to the variance terms.
- Explicit solutions of $\theta$ may not exist.
- An iterative numeric procedure is needed.
Advantages of ULS estimators
- Intuitive
- Consistent, i.e. $\hat{\theta} \to \theta$ as $n \to \infty$
- No distributional assumptions
Disadvantages
- Not the most efficient
- Not scale invariant, not scale free
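A small sketch of $F_{ULS}$ shows why it is not scale invariant. The matrices and the rescaling vector `d` are hypothetical; `rescale` computes $DMD$ for a diagonal $D$.

```python
def f_uls(S, Sigma):
    """F_ULS = (1/2) tr[(S - Sigma)^2] for symmetric matrices as nested lists."""
    k = len(S)
    R = [[S[i][j] - Sigma[i][j] for j in range(k)] for i in range(k)]
    # For symmetric R, tr(R^2) equals the sum of all squared elements of R,
    # so each off-diagonal residual is counted twice.
    return 0.5 * sum(R[i][j] ** 2 for i in range(k) for j in range(k))

def rescale(M, d):
    """Return D M D for D = diag(d)."""
    k = len(M)
    return [[d[i] * M[i][j] * d[j] for j in range(k)] for i in range(k)]

# Hypothetical sample and model-implied covariance matrices.
S = [[4.0, 1.2], [1.2, 9.0]]
Sigma = [[3.5, 1.0], [1.0, 9.5]]
d = [100.0, 1.0]  # e.g. first variable converted from dollars to cents

original = f_uls(S, Sigma)
rescaled = f_uls(rescale(S, d), rescale(Sigma, d))
print(original, rescaled)
```

Changing the units of one variable changes the value of the fitting function, so the minimizing $\hat{\theta}$ can change as well.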

Common Fitting Functions: Generalized Least Squares (GLS)
- To minimize $F_{GLS} = \frac{1}{2} \operatorname{tr}\left(\{[S - \Sigma(\theta)]\, S^{-1}\}^2\right)$
- Weights the elements of $(S - \Sigma(\theta))$ according to the variances and covariances.
- $F_{ULS}$ is a special case of $F_{GLS}$ in which the weight matrix $S^{-1}$ is replaced by $I$.
Advantages of GLS estimators
- Intuitive
- Consistent, i.e. $\hat{\theta} \to \theta$ as $n \to \infty$
- Asymptotic normality (availability of significance tests)
- Asymptotically efficient
- Scale invariant and scale free
- Test of overall model fit for an overidentified model, based on the fact that $(N - 1) F_{GLS}$ has an asymptotic $\chi^2$ distribution with $\frac{1}{2}(p + q)(p + q + 1) - t$ degrees of freedom.
Disadvantages
- Sensitive to "fat" or "thin" tails
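The relationship between the GLS and ULS fitting functions can be sketched with one weighted function, $\frac{1}{2} \operatorname{tr}(\{[S - \Sigma]W\}^2)$, where $W = S^{-1}$ gives $F_{GLS}$ and $W = I$ gives $F_{ULS}$. The 2 x 2 matrices and the name `f_weighted` are assumptions made for this illustration.

```python
def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def inv2(m):
    """Inverse of a 2 x 2 matrix."""
    d = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]

def f_weighted(S, Sigma, W):
    """(1/2) tr({[S - Sigma] W}^2); W = S^{-1} gives F_GLS, W = I gives F_ULS."""
    k = len(S)
    R = [[S[i][j] - Sigma[i][j] for j in range(k)] for i in range(k)]
    RW = matmul(R, W)
    RW2 = matmul(RW, RW)
    return 0.5 * sum(RW2[i][i] for i in range(k))

# Hypothetical sample and model-implied covariance matrices.
S = [[4.0, 1.2], [1.2, 9.0]]
Sigma = [[3.5, 1.0], [1.0, 9.5]]
identity = [[1.0, 0.0], [0.0, 1.0]]

f_gls = f_weighted(S, Sigma, inv2(S))
f_uls = f_weighted(S, Sigma, identity)
print(f_gls, f_uls)
```

With $W = S^{-1}$, residuals on variables with large variances are down-weighted, which is what makes the GLS function insensitive to the units of measurement.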

Simple Case of SEM with Latent Variables: Confirmatory Factor Analysis

Recap of Basic Characteristics of Exploratory Factor Analysis (EFA)
- Most EFA extract orthogonal factors, which is "boring" to SEM users.
- Distinction between common and unique variances.
- EFA is underidentified (i.e. no unique solution).
  - Remember rotation? Equally good fit with different rotations!
- All measures are related to each factor.

Confirmatory Factor Analysis (CFA)
- Takes factor analysis a step further.
- We can "test" or "confirm" or "implement" a highly constrained a priori structure that meets the conditions of model identification.
- But be careful, a model can never be confirmed!
- The CFA model is constructed in advance:
  - the number of latent variables ("factors") is pre-set by the analyst (usually not part of the modeling)
  - whether a latent variable influences an observed variable is specified
  - measurement errors may correlate
- Difference between CFA and the usual SEM:
  - SEM assumes causally interrelated latent variables
  - CFA assumes interrelated latent variables (i.e. exogenous)

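Exploratory Factor Analysis
Two-factor model: $x = \Lambda_x \xi + \delta$

$$\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{pmatrix} = \begin{pmatrix} \lambda_{11} & \lambda_{12} \\ \lambda_{21} & \lambda_{22} \\ \lambda_{31} & \lambda_{32} \\ \lambda_{41} & \lambda_{42} \\ \lambda_{51} & \lambda_{52} \\ \lambda_{61} & \lambda_{62} \end{pmatrix} \begin{pmatrix} \xi_1 \\ \xi_2 \end{pmatrix} + \begin{pmatrix} \delta_1 \\ \delta_2 \\ \delta_3 \\ \delta_4 \\ \delta_5 \\ \delta_6 \end{pmatrix}$$

[Figure: path diagram in which both $\xi_1$ and $\xi_2$ point to every indicator $x_1$ to $x_6$, each with its own error $\delta_1$ to $\delta_6$.]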

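CFA Notation
Two-factor model: $x = \Lambda_x \xi + \delta$

$$\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{pmatrix} = \begin{pmatrix} \lambda_{11} & 0 \\ \lambda_{21} & 0 \\ \lambda_{31} & 0 \\ 0 & \lambda_{42} \\ 0 & \lambda_{52} \\ 0 & \lambda_{62} \end{pmatrix} \begin{pmatrix} \xi_1 \\ \xi_2 \end{pmatrix} + \begin{pmatrix} \delta_1 \\ \delta_2 \\ \delta_3 \\ \delta_4 \\ \delta_5 \\ \delta_6 \end{pmatrix}$$

[Figure: path diagram in which $\xi_1$ points to $x_1$ to $x_3$ and $\xi_2$ points to $x_4$ to $x_6$, with errors $\delta_1$ to $\delta_6$.]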

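For the matrix-challenged

CFA:
$x_1 = \lambda_{11} \xi_1 + \delta_1$
$x_2 = \lambda_{21} \xi_1 + \delta_2$
$x_3 = \lambda_{31} \xi_1 + \delta_3$
$x_4 = \lambda_{42} \xi_2 + \delta_4$
$x_5 = \lambda_{52} \xi_2 + \delta_5$
$x_6 = \lambda_{62} \xi_2 + \delta_6$
$\operatorname{cov}(\xi_1, \xi_2) = \phi_{12}$

EFA:
$x_1 = \lambda_{11} \xi_1 + \lambda_{12} \xi_2 + \delta_1$
$x_2 = \lambda_{21} \xi_1 + \lambda_{22} \xi_2 + \delta_2$
$x_3 = \lambda_{31} \xi_1 + \lambda_{32} \xi_2 + \delta_3$
$x_4 = \lambda_{41} \xi_1 + \lambda_{42} \xi_2 + \delta_4$
$x_5 = \lambda_{51} \xi_1 + \lambda_{52} \xi_2 + \delta_5$
$x_6 = \lambda_{61} \xi_1 + \lambda_{62} \xi_2 + \delta_6$
$\operatorname{cov}(\xi_1, \xi_2) = 0$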

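More Important Notation
- $\Phi$ (capital of $\phi$): covariance matrix of the factors

$$\Phi = \operatorname{var}(\xi) = \begin{pmatrix} \phi_{11} & \phi_{12} \\ \phi_{12} & \phi_{22} \end{pmatrix}, \qquad \operatorname{var}(\xi_1) = \phi_{11}, \quad \operatorname{cov}(\xi_1, \xi_2) = \phi_{12}$$

- $\Psi$ (capital of $\psi$): covariance matrix of the errors (lower triangle shown)

$$\Psi = \operatorname{var}(\delta) = \begin{pmatrix} \psi_{11} & & & & & \\ \psi_{21} & \psi_{22} & & & & \\ \psi_{31} & \psi_{32} & \psi_{33} & & & \\ \psi_{41} & \psi_{42} & \psi_{43} & \psi_{44} & & \\ \psi_{51} & \psi_{52} & \psi_{53} & \psi_{54} & \psi_{55} & \\ \psi_{61} & \psi_{62} & \psi_{63} & \psi_{64} & \psi_{65} & \psi_{66} \end{pmatrix}, \qquad \operatorname{var}(\delta_1) = \psi_{11}, \quad \operatorname{cov}(\delta_1, \delta_2) = \psi_{21}$$

- NOTE: usually, $\psi_{ij} = 0$ if $i \neq j$.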

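Model Estimation
- In our previous CFA example, we have six equations and many more unknowns.
- In this form, there is not enough information to uniquely solve for the $\lambda$'s and the factor correlation.
- What if we multiply both sides of $x = \Lambda_x \xi + \delta$ by $x'$ and take expectations?

$$xx' = (\Lambda_x \xi + \delta)(\Lambda_x \xi + \delta)' = (\Lambda_x \xi)(\Lambda_x \xi)' + (\Lambda_x \xi)\delta' + \delta(\Lambda_x \xi)' + \delta\delta'$$

Because $\xi$ and $\delta$ are independent, the cross terms have expectation zero:

$$E(xx') = \Lambda_x E(\xi\xi') \Lambda_x' + E(\delta\delta')$$

$$\Sigma = \Lambda_x \Phi \Lambda_x' + \Psi = \Sigma(\theta)$$

where $\Phi$ is the covariance matrix of the factors $\xi$, and $\Psi$ is the error covariance matrix.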
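To make $\Sigma = \Lambda_x \Phi \Lambda_x' + \Psi$ concrete, the sketch below builds the implied covariance matrix for the two-factor CFA. All parameter values are hypothetical, and $\Psi$ is assumed diagonal (uncorrelated errors), so only its diagonal is stored.

```python
def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

# Hypothetical parameter values for the two-factor CFA.
Lambda = [[0.9, 0.0], [0.8, 0.0], [0.7, 0.0],
          [0.0, 0.6], [0.0, 0.5], [0.0, 0.4]]
Phi = [[1.0, 0.3], [0.3, 1.0]]               # factor covariance matrix
psi = [0.19, 0.36, 0.51, 0.64, 0.75, 0.84]   # error variances (diagonal of Psi)

# Sigma(theta) = Lambda Phi Lambda' + Psi
Sigma = matmul(matmul(Lambda, Phi), transpose(Lambda))
for i in range(6):
    Sigma[i][i] += psi[i]

for row in Sigma:
    print([round(v, 3) for v in row])
```

Indicators of the same factor covary through the product of their loadings (e.g. $\lambda_{11}\lambda_{21}\phi_{11}$), while indicators of different factors covary only through $\phi_{12}$ (e.g. $\lambda_{11}\phi_{12}\lambda_{42}$).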

Model Constraints
- Hallmark of CFA
- Purposes for setting constraints:
  - Test a priori theory
  - Ensure identifiability
  - Test reliability of measures

Model Constraints: Identifiability
- Latent variables (LVs) need some constraints.
  - Because factors are unmeasured, their variances can take different values.
- Recall EFA, where we constrained the factors: $F \sim N(0, 1)$.
- Otherwise, the model is not identifiable.
- Here we have two options:
  - Fix the variance of each latent variable (LV) to be 1 (or another constant)
  - Fix one path between each LV and an indicator

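Necessary Constraints
[Figure: two versions of the two-factor path diagram ($\xi_1 \to x_1$ to $x_3$, $\xi_2 \to x_4$ to $x_6$, errors $\delta_1$ to $\delta_6$). Left panel, "Fix variances": the variance of each factor is set to 1. Right panel, "Fix path": one path from each factor to an indicator is set to 1.]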

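Model Parametrization

Fix variances:
$x_1 = \lambda_{11} \xi_1 + \delta_1$
$x_2 = \lambda_{21} \xi_1 + \delta_2$
$x_3 = \lambda_{31} \xi_1 + \delta_3$
$x_4 = \lambda_{42} \xi_2 + \delta_4$
$x_5 = \lambda_{52} \xi_2 + \delta_5$
$x_6 = \lambda_{62} \xi_2 + \delta_6$
$\operatorname{cov}(\xi_1, \xi_2) = \phi_{12}$, $\operatorname{var}(\xi_1) = 1$, $\operatorname{var}(\xi_2) = 1$

Fix path:
$x_1 = \xi_1 + \delta_1$
$x_2 = \lambda_{21} \xi_1 + \delta_2$
$x_3 = \lambda_{31} \xi_1 + \delta_3$
$x_4 = \xi_2 + \delta_4$
$x_5 = \lambda_{52} \xi_2 + \delta_5$
$x_6 = \lambda_{62} \xi_2 + \delta_6$
$\operatorname{cov}(\xi_1, \xi_2) = \phi_{12}$, $\operatorname{var}(\xi_1) = \phi_{11}$, $\operatorname{var}(\xi_2) = \phi_{22}$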
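The two parametrizations describe the same covariance structure, only on different scales for the factors. The sketch below starts from hypothetical "fix variances" values, converts them to the "fix path" form by dividing each column of loadings by its reference loading ($\lambda_{11}$ and $\lambda_{42}$), and checks that the common part $\Lambda_x \Phi \Lambda_x'$ is unchanged. The error covariance $\Psi$ is the same in both forms and is left out.

```python
def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def common_part(Lambda, Phi):
    """Lambda Phi Lambda', the part of Sigma(theta) due to the factors."""
    return matmul(matmul(Lambda, Phi), transpose(Lambda))

# "Fix variances": factor variances are 1, all loadings free (hypothetical values).
L_var = [[0.9, 0.0], [0.8, 0.0], [0.7, 0.0],
         [0.0, 0.6], [0.0, 0.5], [0.0, 0.4]]
Phi_var = [[1.0, 0.3], [0.3, 1.0]]

# "Fix path": rescale so the loadings of x1 and x4 equal 1.
s = [L_var[0][0], L_var[3][1]]  # reference loadings lambda_11 and lambda_42
L_path = [[row[j] / s[j] for j in range(2)] for row in L_var]
Phi_path = [[s[i] * Phi_var[i][j] * s[j] for j in range(2)] for i in range(2)]

A = common_part(L_var, Phi_var)
B = common_part(L_path, Phi_path)
max_diff = max(abs(A[i][j] - B[i][j]) for i in range(6) for j in range(6))
print(L_path[0], L_path[3], Phi_path, max_diff)
```

In the "fix path" form the factor variances become free parameters ($\phi_{11} = \lambda_{11}^2$ and $\phi_{22} = \lambda_{42}^2$ in this example), so the number of free parameters is the same in both forms.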

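Model Constraints: Reliability Assessment
Parallel measures
- $T_1 = T_2$ [$= E(x)$] (true scores are equal)
- $T$ affects $x_1$ and $x_2$ equally
- $\operatorname{Cov}(\delta_1, \delta_2) = 0$ (errors not correlated)
- $\operatorname{Var}(\delta_1) = \operatorname{Var}(\delta_2)$ (equal error variances)
[Figure: $T$ points to $x_1$, $x_2$, $x_3$ with errors $\delta_1$, $\delta_2$, $\delta_3$; all three error variances carry the same label $e$.]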

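Model Constraints: Reliability Assessment
Tau-equivalent measures
- $T_1 = T_2$
- $T$ affects $x_1$ and $x_2$ equally
- $\operatorname{Var}(\delta_1) \neq \operatorname{Var}(\delta_2)$
- Note: for standardized measures, it makes no sense to constrain the loadings without also constraining the residuals, since $\operatorname{Var}(x) = 1.0$.
[Figure: $T$ points to $x_1$, $x_2$, $x_3$ with errors $\delta_1$, $\delta_2$, $\delta_3$; the error variances are labeled $e_1$, $e_2$, $e_3$.]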
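The difference between parallel and tau-equivalent measures shows up in the implied covariance matrix. The sketch below assumes three measures $x_i = T + \delta_i$ with loadings fixed to 1 and uncorrelated errors; the variance values and the function name `implied_cov` are made up for illustration.

```python
def implied_cov(var_T, error_vars):
    """Covariance matrix of x_i = T + delta_i with uncorrelated errors."""
    k = len(error_vars)
    return [[var_T + (error_vars[i] if i == j else 0.0) for j in range(k)]
            for i in range(k)]

# Parallel measures: equal error variances.
parallel = implied_cov(2.0, [0.5, 0.5, 0.5])

# Tau-equivalent measures: error variances may differ.
tau_equiv = implied_cov(2.0, [0.5, 0.8, 1.1])

for name, m in (("parallel", parallel), ("tau-equivalent", tau_equiv)):
    print(name)
    for row in m:
        print(row)
```

In both cases every covariance equals $\operatorname{Var}(T)$; parallel measures additionally have equal variances, while tau-equivalent measures do not. This is also why the note on standardized measures holds: once every $\operatorname{Var}(x_i)$ is forced to 1, equal loadings imply equal residual variances.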