Lecture 3: Asymptotic Normality of M-estimators

Lecture 3: Asymptotic Normality. Instructor: Department of Economics, Stanford University. Prepared by Wenbo Zhou, Renmin University.

References: Takeshi Amemiya, 1985, Advanced Econometrics, Harvard University Press; Newey and McFadden, 1994, Chapter 36, Volume 4, The Handbook of Econometrics.

Asymptotic Normality: The General Framework

Everything is just some form of first-order Taylor expansion. The estimator $\hat\theta$ solves the first-order condition $\partial Q_n(\hat\theta)/\partial\theta = 0$. Expanding around the true value $\theta_0$, with $\theta^*$ between $\theta_0$ and $\hat\theta$:

$$0 = \frac{\partial Q_n(\hat\theta)}{\partial\theta} = \frac{\partial Q_n(\theta_0)}{\partial\theta} + \frac{\partial^2 Q_n(\theta^*)}{\partial\theta\,\partial\theta'}(\hat\theta - \theta_0),$$

so that

$$\sqrt{n}(\hat\theta - \theta_0) = -\left[\frac{\partial^2 Q_n(\theta^*)}{\partial\theta\,\partial\theta'}\right]^{-1}\sqrt{n}\,\frac{\partial Q_n(\theta_0)}{\partial\theta} \xrightarrow{d} N\!\left(0, A^{-1}BA^{-1}\right),$$

where $A = E\left[\dfrac{\partial^2 Q(\theta_0)}{\partial\theta\,\partial\theta'}\right]$ and $B = \mathrm{Var}\left(\sqrt{n}\,\dfrac{\partial Q_n(\theta_0)}{\partial\theta}\right)$.
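The sandwich formula above can be checked numerically. The following is an illustrative sketch (not from the lecture): for the exponential-rate MLE, $Q_n(\theta) = \log\theta - \theta\bar z$, so $\hat\theta = 1/\bar z$, $A = -1/\theta_0^2$, $B = 1/\theta_0^2$, and the sandwich variance is $A^{-1}BA^{-1} = \theta_0^2$.

```python
import numpy as np

# Monte Carlo check of sqrt(n)(theta_hat - theta_0) -> N(0, A^{-1} B A^{-1})
# for the exponential-rate MLE: theta_hat = 1/mean(z), sandwich variance = theta_0^2.
rng = np.random.default_rng(0)
theta0, n, reps = 2.0, 2000, 3000

z = rng.exponential(scale=1.0 / theta0, size=(reps, n))
theta_hat = 1.0 / z.mean(axis=1)                     # MLE in each replication
mc_var = np.var(np.sqrt(n) * (theta_hat - theta0))   # Monte Carlo variance
sandwich = theta0**2                                 # A^{-1} B A^{-1}
print(mc_var, sandwich)
```

The Monte Carlo variance should be close to $\theta_0^2 = 4$ for moderate $n$.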

Asymptotic Normality for MLE

In MLE, $Q_n(\theta) = \frac{1}{n}\log L(\theta)$, so $\frac{\partial^2 Q_n(\theta)}{\partial\theta\,\partial\theta'} = \frac{1}{n}\frac{\partial^2\log L(\theta)}{\partial\theta\,\partial\theta'}$. The information matrix equality,

$$-E\left[\frac{\partial^2\log L(\theta_0)}{\partial\theta\,\partial\theta'}\right] = E\left[\frac{\partial\log L(\theta_0)}{\partial\theta}\frac{\partial\log L(\theta_0)}{\partial\theta'}\right],$$

follows by interchanging integration and differentiation. So $B = -A$, and

$$\sqrt{n}(\hat\theta - \theta_0) \xrightarrow{d} N\!\left(0, (-A)^{-1}\right) = N\!\left(0, \left(\lim_n -\frac{1}{n}E\frac{\partial^2\log L(\theta_0)}{\partial\theta\,\partial\theta'}\right)^{-1}\right),$$

the inverse Fisher information. What if interchanging integration and differentiation is not possible? Example: if the support of $f(y;\theta)$ depends on $\theta$ (e.g. $y\sim U(0,\theta)$), the equality fails, and the MLE need not be asymptotically normal.
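The information matrix equality can also be verified by simulation. A minimal sketch (not from the lecture), again using the exponential density $f(z;\theta) = \theta e^{-\theta z}$, where the score is $1/\theta - z$ and $-\partial^2\log f/\partial\theta^2 = 1/\theta^2$ exactly:

```python
import numpy as np

# Check E[score^2] = -E[Hessian of log f] for f(z; theta) = theta * exp(-theta z).
# Interchange of integration and differentiation is valid here (support is fixed).
rng = np.random.default_rng(1)
theta0, n = 2.0, 200_000

z = rng.exponential(scale=1.0 / theta0, size=n)
score = 1.0 / theta0 - z
outer = np.mean(score**2)       # sample analogue of E[score^2]
neg_hess = 1.0 / theta0**2      # -E[d^2 log f / d theta^2], exact for this family
print(outer, neg_hess)
```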

Asymptotic Normality for GMM

$Q_n(\theta) = g_n(\theta)'Wg_n(\theta)$, with $g_n(\theta) = \frac{1}{n}\sum_{t=1}^n g(z_t,\theta)$. Asymptotic normality holds when the moment functions only have first derivatives. Denote $G_n(\theta) = \frac{\partial g_n(\theta)}{\partial\theta'}$, $\theta^*\in[\theta_0,\hat\theta]$, $\hat G \equiv G_n(\hat\theta)$, $G^* \equiv G_n(\theta^*)$, $G = E\,G_n(\theta_0)$, $\Omega = E\left[g(z,\theta_0)g(z,\theta_0)'\right]$. The first-order condition gives

$$0 = \hat G'Wg_n(\hat\theta) = \hat G'W\left(g_n(\theta_0) + G^*(\hat\theta - \theta_0)\right)$$
$$\Rightarrow\quad \sqrt{n}(\hat\theta - \theta_0) = -(\hat G'WG^*)^{-1}\hat G'W\sqrt{n}\,g_n(\theta_0) \stackrel{LD}{=} -(G'WG)^{-1}G'W\sqrt{n}\,g_n(\theta_0)$$
$$\stackrel{LD}{=} -(G'WG)^{-1}G'W\,N(0,\Omega) = N\!\left(0, (G'WG)^{-1}G'W\Omega WG(G'WG)^{-1}\right).$$
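The GMM sandwich is a pure matrix computation, so it can be sketched directly (the matrices below are arbitrary illustrations, not from the lecture). Setting $W = \Omega^{-1}$ should collapse the sandwich to the efficient variance $(G'\Omega^{-1}G)^{-1}$, and any other $W$ should give a weakly larger variance:

```python
import numpy as np

# GMM sandwich (G'WG)^{-1} G'W Omega W G (G'WG)^{-1}.
def gmm_avar(G, W, Omega):
    """Asymptotic variance of sqrt(n)(theta_hat - theta_0) for GMM with weight W."""
    bread = np.linalg.inv(G.T @ W @ G) @ G.T @ W
    return bread @ Omega @ bread.T

# Toy dimensions: 3 moments, 2 parameters.
G = np.array([[1.0, 0.5], [0.2, 1.0], [0.3, 0.1]])
Omega = np.array([[2.0, 0.3, 0.1], [0.3, 1.5, 0.2], [0.1, 0.2, 1.0]])

V_eff = gmm_avar(G, np.linalg.inv(Omega), Omega)          # efficient W = Omega^{-1}
V_closed = np.linalg.inv(G.T @ np.linalg.inv(Omega) @ G)  # (G' Omega^{-1} G)^{-1}
V_identity = gmm_avar(G, np.eye(3), Omega)                # a generic (inefficient) W
print(V_eff, V_closed)
```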

Examples

Efficient choice of $W = \Omega^{-1}$ (or $W \xrightarrow{p} \Omega^{-1}$): $\sqrt{n}(\hat\theta - \theta_0) \xrightarrow{d} N\!\left(0, (G'\Omega^{-1}G)^{-1}\right)$. When $G$ is invertible, $W$ is irrelevant: $\sqrt{n}(\hat\theta - \theta_0) \xrightarrow{d} N\!\left(0, G^{-1}\Omega G^{-1\prime}\right) = N\!\left(0, (G'\Omega^{-1}G)^{-1}\right)$. When $\Omega = \alpha G$ (or $G \propto \Omega$): $\sqrt{n}(\hat\beta - \beta_0) \xrightarrow{d} N\!\left(0, \alpha G^{-1}\right)$.

Least squares (LS): $g(z,\beta) = x(y - x'\beta)$. $G = -Exx'$, $\Omega = E\varepsilon^2 xx'$, then

$$\sqrt{n}(\hat\beta - \beta_0) \xrightarrow{d} N\!\left(0, (Exx')^{-1}(E\varepsilon^2 xx')(Exx')^{-1}\right),$$

the so-called White heteroskedasticity-consistent standard error. If $E[\varepsilon^2|x] = \sigma^2$, then $\Omega = \sigma^2 Exx'$ and $\sqrt{n}(\hat\beta - \beta_0) \xrightarrow{d} N\!\left(0, \sigma^2(Exx')^{-1}\right)$.

Weighted LS: $g(z,\beta) = \frac{1}{E(\varepsilon^2|x)}x(y - x'\beta)$. Then $G = -E\frac{xx'}{E(\varepsilon^2|x)} = -\Omega$, so $\sqrt{n}(\hat\beta - \beta_0) \xrightarrow{d} N\!\left(0, \Omega^{-1}\right)$.
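In sample form, the White sandwich is easy to compute by hand. A minimal sketch (not from the lecture): with homoskedastic errors the sandwich and the conventional $\hat\sigma^2(X'X/n)^{-1}$ estimate should agree up to sampling noise.

```python
import numpy as np

# White's heteroskedasticity-consistent variance vs. the homoskedastic formula.
rng = np.random.default_rng(2)
n = 20_000
x = np.column_stack([np.ones(n), rng.normal(size=n)])
beta0 = np.array([1.0, -0.5])
eps = rng.normal(size=n)                 # homoskedastic errors, so the two should agree
y = x @ beta0 + eps

beta_hat = np.linalg.solve(x.T @ x, x.T @ y)
u = y - x @ beta_hat
Sxx_inv = np.linalg.inv(x.T @ x / n)
meat = (x * u[:, None]**2).T @ x / n     # (1/n) sum u_t^2 x_t x_t'
V_white = Sxx_inv @ meat @ Sxx_inv       # sandwich estimate of Avar(sqrt(n)(beta_hat - beta0))
V_homo = u.var() * Sxx_inv               # sigma_hat^2 (Exx')^{-1}
print(V_white, V_homo)
```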

Linear 2SLS: $g(z,\beta) = z(y - x'\beta)$. $G = -Ezx'$, $\Omega = E\varepsilon^2 zz'$, $W = (Ezz')^{-1}$, then $\sqrt{n}(\hat\beta - \beta_0) \xrightarrow{d} N(0, V)$ with the sandwich $V = (G'WG)^{-1}G'W\Omega WG(G'WG)^{-1}$. If $E\varepsilon^2 zz' = \sigma^2 Ezz'$, this collapses to $V = \sigma^2\left[Exz'(Ezz')^{-1}Ezx'\right]^{-1}$.

Linear 3SLS: $g(z,\beta) = z(y - x'\beta)$. $G = -Ezx'$, $\Omega = E\varepsilon^2 zz'$, $W = (E\varepsilon^2 zz')^{-1}$, then $\sqrt{n}(\hat\beta - \beta_0) \xrightarrow{d} N(0, V)$ for $V = \left[Exz'(E\varepsilon^2 zz')^{-1}Ezx'\right]^{-1}$.

MLE as GMM: $g(z,\theta) = \frac{\partial\log f(z,\theta)}{\partial\theta}$. $G = E\frac{\partial^2\log f(z,\theta_0)}{\partial\theta\,\partial\theta'} = -\Omega$ with $\Omega = E\left[\frac{\partial\log f}{\partial\theta}\frac{\partial\log f}{\partial\theta'}\right]$, then $\sqrt{n}(\hat\theta - \theta_0) \xrightarrow{d} N\!\left(0, G^{-1}\Omega G^{-1}\right) = N\!\left(0, \Omega^{-1}\right)$.
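Solving the 2SLS moment condition gives the familiar closed form $\hat\beta = [X'Z(Z'Z)^{-1}Z'X]^{-1}X'Z(Z'Z)^{-1}Z'y$. A small sketch (not from the lecture) with a built-in sanity check: when the instruments are the regressors themselves, 2SLS reduces to OLS.

```python
import numpy as np

# Linear 2SLS from g(z, beta) = z(y - x'beta) with W = (Ezz')^{-1}.
def tsls(Z, X, y):
    A = X.T @ Z @ np.linalg.inv(Z.T @ Z) @ Z.T
    return np.linalg.solve(A @ X, A @ y)

rng = np.random.default_rng(3)
n = 5000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
beta_2sls = tsls(X, X, y)   # exogenous case: instruments = regressors
print(beta_ols, beta_2sls)
```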

GMM again: take linear combinations of the moment conditions so that the number of moments equals the number of parameters. In particular, take $h(z,\theta) = G'Wg(z,\theta)$ and use $h(z,\theta)$ as the new moment conditions; then

$$\hat\theta = \arg\min_\theta\left[\frac{1}{n}\sum_{t=1}^n h(z_t,\theta)\right]'\left[\frac{1}{n}\sum_{t=1}^n h(z_t,\theta)\right]$$

is asymptotically equivalent to $\hat\theta = \arg\min_\theta g_n'Wg_n$, where $G_h = E\frac{\partial h(z,\theta_0)}{\partial\theta'} = G'WG$ and $\Omega_h = E\left[h(z,\theta_0)h(z,\theta_0)'\right] = G'W\Omega WG$.

Quantile regression as GMM: $g(z,\beta) = (\tau - 1(y \le x'\beta))x$, and $W$ is irrelevant. $G = E\frac{\partial g(z,\beta_0)}{\partial\beta'} = -E\frac{\partial 1(y\le x'\beta_0)x}{\partial\beta'}$. Proceed in a quick and dirty way -- take the expectation before taking the derivative:

$$G = -\frac{\partial}{\partial\beta'}E\left[1(y\le x'\beta)x\right] = -\frac{\partial}{\partial\beta'}E\left[xF(x'\beta|x)\right] = -Ef_y(x'\beta_0|x)xx' = -Ef_u(0|x)xx'.$$

Conditional on $x$, $\tau - 1(y\le x'\beta_0) = \tau - 1(u\le 0)$ is a centered Bernoulli r.v., so $E\left[(\tau - 1(y\le x'\beta_0))^2|x\right] = \tau(1-\tau)$ and

$$\Omega = E\left\{E\left[(\tau - 1(y\le x'\beta_0))^2|x\right]xx'\right\} = \tau(1-\tau)Exx'.$$

Quantile regression as GMM (continued):

$$\sqrt{n}(\hat\beta - \beta_0) \xrightarrow{d} N\!\left(0, \tau(1-\tau)\left[Ef_u(0|x)xx'\right]^{-1}Exx'\left[Ef_u(0|x)xx'\right]^{-1}\right).$$

If the errors are homoskedastic, $f_u(0|x) = f(0)$ and $V = \frac{\tau(1-\tau)}{f(0)^2}(Exx')^{-1}$.

Consistent estimation of $G$ and $\Omega$: estimate $\Omega$ by $\hat\Omega = \frac{1}{n}\sum_{t=1}^n g(z_t,\hat\theta)g(z_t,\hat\theta)'$. For nonsmooth problems such as quantile regression, approximate the second derivative numerically by

$$\frac{Q_n(\hat\theta + 2h_n) + Q_n(\hat\theta - 2h_n) - 2Q_n(\hat\theta)}{4h_n^2},$$

which requires $h_n = o(1)$ and $1/h_n = o(\sqrt{n})$. For stationary data, heteroskedasticity and dependence only affect estimation of $\Omega$: for independent data, use White's heteroskedasticity-consistent estimate; for dependent data, use Newey-West's autocorrelation-consistent estimate.
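The quantile-regression variance can be checked in its simplest special case. An illustrative sketch (not from the lecture): intercept-only median regression ($\tau = 1/2$) of $y = u$, $u\sim N(0,1)$, is just the sample median, whose asymptotic variance is $\tau(1-\tau)/f(0)^2 = \tfrac{1}{4}\cdot 2\pi = \pi/2$.

```python
import numpy as np

# Monte Carlo check of V = tau(1-tau)/f(0)^2 for the sample median of N(0,1) data.
rng = np.random.default_rng(4)
n, reps = 2000, 4000

u = rng.normal(size=(reps, n))
med = np.median(u, axis=1)               # the tau = 0.5 "regression" estimate
mc_var = np.var(np.sqrt(n) * med)        # variance of sqrt(n)(median - 0)
theory = np.pi / 2                       # tau(1-tau)/f(0)^2 with f(0) = 1/sqrt(2*pi)
print(mc_var, theory)
```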

Iteration and One-Step Estimation

From the initial guess $\tilde\theta$ to the next-round guess $\check\theta$: Newton-Raphson uses a quadratic approximation of $Q_n(\theta)$; Gauss-Newton uses a linear approximation of the first-order condition, e.g. in GMM. If the initial guess is a $\sqrt{n}$-consistent estimate, more iteration will not increase (first-order) asymptotic efficiency: e.g. if $\sqrt{n}(\tilde\theta - \theta_0) = O_p(1)$, then $\sqrt{n}(\check\theta - \theta_0) \stackrel{LD}{=} \sqrt{n}(\hat\theta - \theta_0)$ for $\hat\theta = \arg\max_\theta Q_n(\theta)$.

1. Newton-Raphson: use a quadratic approximation of $Q_n(\theta)$ around $\tilde\theta$,

$$Q_n(\theta) \approx Q_n(\tilde\theta) + \frac{\partial Q_n(\tilde\theta)}{\partial\theta'}(\theta - \tilde\theta) + \frac{1}{2}(\theta - \tilde\theta)'\frac{\partial^2 Q_n(\tilde\theta)}{\partial\theta\,\partial\theta'}(\theta - \tilde\theta).$$

The first-order condition $\frac{\partial Q_n(\tilde\theta)}{\partial\theta} + \frac{\partial^2 Q_n(\tilde\theta)}{\partial\theta\,\partial\theta'}(\check\theta - \tilde\theta) = 0$ gives

$$\check\theta = \tilde\theta - \left[\frac{\partial^2 Q_n(\tilde\theta)}{\partial\theta\,\partial\theta'}\right]^{-1}\frac{\partial Q_n(\tilde\theta)}{\partial\theta}.$$

2. Gauss-Newton: use a linear approximation of the first-order condition, e.g. in GMM, $g_n(\theta) \approx g_n(\tilde\theta) + \tilde G(\theta - \tilde\theta)$ with $\tilde G = G_n(\tilde\theta)$, so the condition $\tilde G'W\left(g_n(\tilde\theta) + \tilde G(\check\theta - \tilde\theta)\right) = 0$ gives

$$\check\theta = \tilde\theta - (\tilde G'W\tilde G)^{-1}\tilde G'Wg_n(\tilde\theta).$$
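The Newton-Raphson update is easy to run on a model with a known answer. A minimal sketch (not from the lecture): for the exponential log-likelihood $Q_n(\theta) = \log\theta - \theta\bar z$, we have $Q_n' = 1/\theta - \bar z$ and $Q_n'' = -1/\theta^2$, and the iteration should converge to the closed-form MLE $1/\bar z$.

```python
import numpy as np

# Newton-Raphson for Q_n(theta) = log(theta) - theta*zbar.
# Update: theta <- theta - Q''^{-1} Q' = 2*theta - theta^2*zbar; fixed point is 1/zbar.
rng = np.random.default_rng(5)
z = rng.exponential(scale=0.5, size=1000)   # true rate theta_0 = 2
zbar = z.mean()

theta = 1.0                                 # crude initial guess
for _ in range(20):
    grad = 1.0 / theta - zbar
    hess = -1.0 / theta**2
    theta = theta - grad / hess             # Newton step
print(theta, 1.0 / zbar)
```

Convergence is quadratic, so 20 iterations are far more than enough here.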

If the initial guess is a $\sqrt{n}$-consistent estimate, i.e. $\sqrt{n}(\tilde\theta - \theta_0) = O_p(1)$, then $\sqrt{n}(\check\theta - \theta_0) \stackrel{LD}{=} \sqrt{n}(\hat\theta - \theta_0)$ for $\hat\theta = \arg\max_\theta Q_n(\theta)$. More iteration will not increase (first-order) asymptotic efficiency:

1. For Newton-Raphson:

$$\sqrt{n}(\check\theta - \theta_0) = \sqrt{n}(\tilde\theta - \theta_0) - \left[\frac{\partial^2 Q_n(\tilde\theta)}{\partial\theta\,\partial\theta'}\right]^{-1}\sqrt{n}\,\frac{\partial Q_n(\tilde\theta)}{\partial\theta}$$
$$= \sqrt{n}(\tilde\theta - \theta_0) - \left[\frac{\partial^2 Q_n(\tilde\theta)}{\partial\theta\,\partial\theta'}\right]^{-1}\left[\sqrt{n}\,\frac{\partial Q_n(\theta_0)}{\partial\theta} + \frac{\partial^2 Q_n(\theta^*)}{\partial\theta\,\partial\theta'}\sqrt{n}(\tilde\theta - \theta_0)\right]$$
$$= \left(I - \left[\frac{\partial^2 Q_n(\tilde\theta)}{\partial\theta\,\partial\theta'}\right]^{-1}\frac{\partial^2 Q_n(\theta^*)}{\partial\theta\,\partial\theta'}\right)\sqrt{n}(\tilde\theta - \theta_0) - \left[\frac{\partial^2 Q_n(\tilde\theta)}{\partial\theta\,\partial\theta'}\right]^{-1}\sqrt{n}\,\frac{\partial Q_n(\theta_0)}{\partial\theta}$$
$$= o_p(1) + \sqrt{n}(\hat\theta - \theta_0).$$

2. For Gauss-Newton:

$$\sqrt{n}(\check\theta - \theta_0) = \sqrt{n}(\tilde\theta - \theta_0) - (\tilde G'W\tilde G)^{-1}\tilde G'W\left[\sqrt{n}\,g_n(\theta_0) + G^*\sqrt{n}(\tilde\theta - \theta_0)\right]$$
$$= \left(I - (\tilde G'W\tilde G)^{-1}\tilde G'WG^*\right)\sqrt{n}(\tilde\theta - \theta_0) - (\tilde G'W\tilde G)^{-1}\tilde G'W\sqrt{n}\,g_n(\theta_0)$$
$$= o_p(1) + \sqrt{n}(\hat\theta - \theta_0).$$
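The one-step result can be seen numerically. An illustrative sketch (not from the lecture): for the exponential rate, start from the consistent but inefficient estimator $\tilde\theta = \log 2/\mathrm{median}(z)$; a single Newton step lands within $O_p(1/n)$ of the full MLE $1/\bar z$.

```python
import numpy as np

# One Newton step from a root-n-consistent initial guess is first-order
# equivalent to the full MLE. Exponential example: step is 2*theta - theta^2*zbar.
rng = np.random.default_rng(6)
z = rng.exponential(scale=0.5, size=10_000)   # true rate theta_0 = 2
zbar = z.mean()

theta_mle = 1.0 / zbar
theta_tilde = np.log(2.0) / np.median(z)               # consistent initial estimate
theta_check = 2.0 * theta_tilde - theta_tilde**2 * zbar  # one Newton-Raphson step
print(theta_tilde, theta_check, theta_mle)
```

Algebraically, $|\check\theta - \hat\theta| = \bar z(\tilde\theta - \hat\theta)^2$: one step squares the initial error.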

Influence Function

$\phi(z_t)$ is called the influence function if

$$\sqrt{n}(\hat\theta - \theta_0) = \frac{1}{\sqrt{n}}\sum_{t=1}^n\phi(z_t) + o_p(1), \qquad E\phi(z_t) = 0, \qquad E\phi(z_t)\phi(z_t)' < \infty.$$

Think of $\sqrt{n}(\hat\theta - \theta_0)$ as distributed like a normalized average of the $\phi(z_t)$: $N(0, E\phi\phi')$. The representation is used for discussing asymptotic efficiency, two-step or multi-step estimation, etc.

Examples

For MLE,

$$\phi(z_t) = \left[-E\frac{\partial^2\ln f(y_t,\theta_0)}{\partial\theta\,\partial\theta'}\right]^{-1}\frac{\partial\ln f(y_t,\theta_0)}{\partial\theta} = \left[E\frac{\partial\ln f(y_t,\theta_0)}{\partial\theta}\frac{\partial\ln f(y_t,\theta_0)}{\partial\theta'}\right]^{-1}\frac{\partial\ln f(y_t,\theta_0)}{\partial\theta}.$$

For GMM, $\phi = -(G'WG)^{-1}G'Wg(z_t,\theta_0)$, or $\phi = -\left(E\frac{\partial h(z,\theta_0)}{\partial\theta'}\right)^{-1}h(z_t,\theta_0)$ for $h(z_t,\theta_0) = G'Wg(z_t,\theta_0)$.

Quantile regression: $\phi(z_t) = \left[Ef_u(0|x)xx'\right]^{-1}(\tau - 1(u_t \le 0))x_t$.

Asymptotic Efficiency

Is MLE efficient among all asymptotically normal estimators? Superefficient estimator: suppose $\sqrt{n}(\hat\theta - \theta_0) \xrightarrow{d} N(0,V)$ for all $\theta$. Now define

$$\bar\theta = \begin{cases}\hat\theta & \text{if } |\hat\theta| \ge n^{-1/4},\\ 0 & \text{if } |\hat\theta| < n^{-1/4}.\end{cases}$$

Then $\sqrt{n}(\bar\theta - \theta_0) \xrightarrow{d} N(0,0)$ if $\theta_0 = 0$, and $\sqrt{n}(\bar\theta - \theta_0) \stackrel{LD}{=} \sqrt{n}(\hat\theta - \theta_0) \xrightarrow{d} N(0,V)$ if $\theta_0 \ne 0$. To rule out such estimators, restrict attention to regular ones: $\hat\theta$ is regular if, for any data generated by $\theta_n = \theta_0 + \delta/\sqrt{n}$ with $\delta \ne 0$, $\sqrt{n}(\hat\theta - \theta_n)$ has a limit distribution that does not depend on $\delta$.
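The superefficiency phenomenon is easy to simulate. An illustrative sketch (not from the lecture), using the sample mean of $N(\theta,1)$ data as $\hat\theta$: the thresholded estimator has scaled risk near 0 at $\theta_0 = 0$ but matches the mean's risk of 1 elsewhere.

```python
import numpy as np

# Hodges-type superefficient estimator: theta_bar = xbar * 1(|xbar| >= n^{-1/4}).
rng = np.random.default_rng(7)
n, reps = 4096, 2000

def hodges_risk(theta0):
    x = theta0 + rng.normal(size=(reps, n))
    xbar = x.mean(axis=1)
    theta_bar = np.where(np.abs(xbar) >= n**-0.25, xbar, 0.0)  # threshold at n^{-1/4}
    return np.mean(n * (theta_bar - theta0)**2)                # n * E(theta_bar - theta0)^2

risk_at_zero = hodges_risk(0.0)   # superefficiency: essentially 0
risk_at_one = hodges_risk(1.0)    # regular point: close to Var = 1
print(risk_at_zero, risk_at_one)
```

(The flip side, not visible at fixed $\theta_0$, is that the risk blows up along local sequences $\theta_n = \delta/\sqrt{n}$, which is exactly what the regularity condition excludes.)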

For regular estimators, consider influence function representations indexed by $\tau$:

$$\sqrt{n}(\hat\theta(\tau) - \theta_0) \stackrel{LD}{=} \frac{1}{\sqrt{n}}\sum_t\phi(z_t,\tau) \xrightarrow{d} N\!\left(0, E\phi(\tau)\phi(\tau)'\right).$$

$\hat\theta(\bar\tau)$ is more efficient than $\hat\theta(\tau)$ if it has a smaller variance-covariance matrix. A necessary condition is that $\mathrm{Cov}\left(\phi(z,\tau) - \phi(z,\bar\tau),\, \phi(z,\bar\tau)\right) = 0$ for all $\tau$, including $\bar\tau$. The following are equivalent:

$\mathrm{Cov}\left(\phi(z,\tau) - \phi(z,\bar\tau),\, \phi(z,\bar\tau)\right) = 0$;
$\mathrm{Cov}\left(\phi(z,\tau),\, \phi(z,\bar\tau)\right) = \mathrm{Var}\left(\phi(z,\bar\tau)\right)$;
$E\phi(z,\tau)\phi(z,\bar\tau)' = E\phi(z,\bar\tau)\phi(z,\bar\tau)'$.

Newey's efficiency framework: classify estimators into the GMM framework with $\phi(z,\tau) = -D(\tau)^{-1}m(z,\tau)$. For the class indexed by $\tau = W$, given a vector of moments $g(z,\theta_0)$: $D(\tau) \equiv D(W) = G'WG$ and $m(z,\tau) \equiv m(z,W) = G'Wg(z,\theta_0)$. To consider MLE among the class of GMM estimators, let $\tau$ index any vector of moment functions $h$ having the same dimension as $\theta$; in this case $D(\tau) \equiv D(h) = E\frac{\partial h(z,\theta_0)}{\partial\theta'}$ and $m(z,\tau) = h(z,\theta_0)$.

For this particular case where $\phi(z,\tau) = -D(\tau)^{-1}m(z,\tau)$, the condition $E\phi(z,\tau)\phi(z,\bar\tau)' = E\phi(z,\bar\tau)\phi(z,\bar\tau)'$ becomes

$$D(\tau)^{-1}E\left[m(z,\tau)m(z,\bar\tau)'\right]D(\bar\tau)^{-1\prime} = D(\bar\tau)^{-1}E\left[m(z,\bar\tau)m(z,\bar\tau)'\right]D(\bar\tau)^{-1\prime}.$$

If $\bar\tau$ satisfies $E\left[m(z,\tau)m(z,\bar\tau)'\right] = cD(\tau)$ for all $\tau$ (with the same constant $c$), then both sides above equal $cD(\bar\tau)^{-1\prime}$, and $\hat\theta(\bar\tau)$ is efficient.

Examples: check $E\left[m(z,\tau)m(z,\bar\tau)'\right] = cD(\tau)$. GMM with the optimal weighting matrix: $D(\tau) = G'WG$, $m(z,\tau) = m(z,W) = G'Wg(z,\theta_0)$, so $E\left[m(z,W)m(z,\bar W)'\right] = G'W\Omega\bar W'G$. The condition $G'W\Omega\bar W'G = G'WG$ for all $W$ requires $\Omega\bar W' = I$, i.e. $\bar W = \Omega^{-1}$.

MLE is better than any GMM: here $D(\tau) = E\frac{\partial h(z,\theta_0)}{\partial\theta'}$ and $m(z,\tau) = h(z,\theta_0)$. To verify the condition against the score, use the generalized information matrix equality: differentiating $0 = Eh(z,\theta) = \int h(z,\theta)f(z,\theta)\,dz$ with respect to $\theta$,

$$0 = \int\frac{\partial h(z,\theta)}{\partial\theta'}f(z,\theta)\,dz + \int h(z,\theta)\frac{\partial\ln f(z,\theta)}{\partial\theta'}f(z,\theta)\,dz = E\frac{\partial h(z,\theta_0)}{\partial\theta'} + E\left[h(z,\theta_0)\frac{\partial\ln f(z,\theta_0)}{\partial\theta'}\right].$$

So $E\left[h(z,\theta_0)s(z,\theta_0)'\right] = -E\frac{\partial h(z,\theta_0)}{\partial\theta'} = -D(h)$ for every candidate $h$, where $s(z,\theta_0) = \frac{\partial\ln f(z,\theta_0)}{\partial\theta}$ is the score. Up to sign, the efficiency condition therefore holds with $\bar m = s$, the score function -- whose moment condition is exactly the MLE.

Two-Step Estimators

General framework: the first-step estimator $\hat\gamma$ admits $\sqrt{n}(\hat\gamma - \gamma_0) = \frac{1}{\sqrt{n}}\sum_{t=1}^n\phi(z_t) + o_p(1)$. The second-step estimator $\hat\theta$ solves the moment condition

$$h_n(\hat\theta,\hat\gamma) = \frac{1}{n}\sum_{t=1}^n h(z_t,\hat\theta,\hat\gamma) = 0.$$

Let $H(z,\theta,\gamma) = \frac{\partial h(z,\theta,\gamma)}{\partial\theta'}$ and $\Gamma(z,\theta,\gamma) = \frac{\partial h(z,\theta,\gamma)}{\partial\gamma'}$, and write $H = EH(z,\theta_0,\gamma_0)$, $\Gamma = E\Gamma(z,\theta_0,\gamma_0)$, $h = h(z,\theta_0,\gamma_0)$.

1. Then just Taylor expand:

$$0 = \frac{1}{\sqrt{n}}\sum_t h(z_t,\hat\theta,\hat\gamma) = \sqrt{n}\,h_n(\theta_0,\hat\gamma) + H_n(\theta^*,\hat\gamma)\sqrt{n}(\hat\theta - \theta_0)$$
$$\Rightarrow\quad \sqrt{n}(\hat\theta - \theta_0) = -\left[H_n(\theta^*,\hat\gamma)\right]^{-1}\sqrt{n}\,h_n(\theta_0,\hat\gamma)$$
$$\stackrel{LD}{=} -H^{-1}\left[\sqrt{n}\,h_n(\theta_0,\gamma_0) + \Gamma_n(\theta_0,\gamma^*)\sqrt{n}(\hat\gamma - \gamma_0)\right]$$
$$\stackrel{LD}{=} -H^{-1}\left[\frac{1}{\sqrt{n}}\sum_t h(z_t) + \Gamma\frac{1}{\sqrt{n}}\sum_t\phi(z_t)\right] + o_p(1) = -H^{-1}\frac{1}{\sqrt{n}}\sum_t\left[h(z_t) + \Gamma\phi(z_t)\right] + o_p(1).$$

So $\sqrt{n}(\hat\theta - \theta_0) \xrightarrow{d} N(0,V)$ for $V = H^{-1}E\left[(h + \Gamma\phi)(h + \Gamma\phi)'\right]H^{-1\prime}$.
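The first-step correction term $\Gamma\phi$ matters in practice. An illustrative sketch (not from the lecture): estimate $\theta_0 = E(z - \gamma_0)^3$ with second-step moment $h(z,\theta,\gamma) = (z - \gamma)^3 - \theta$ and first-step $\hat\gamma = \bar z$, so $\phi(z) = z - \gamma_0$. For $z\sim N(0,1)$: $H = -1$, $\Gamma = -3E(z-\gamma_0)^2 = -3$, and the corrected variance is $E[(z^3 - 3z)^2] = 6$, while naively ignoring the first step would give $E[(z^3)^2] = 15$.

```python
import numpy as np

# Two-step variance correction: Var(sqrt(n)*theta_hat) matches V = 6, not the naive 15.
rng = np.random.default_rng(8)
n, reps = 500, 4000

z = rng.normal(size=(reps, n))
gamma_hat = z.mean(axis=1, keepdims=True)
theta_hat = np.mean((z - gamma_hat)**3, axis=1)   # two-step estimate of E(z - gamma_0)^3 = 0
mc_var = np.var(np.sqrt(n) * theta_hat)
V_corrected = 6.0    # H^{-1} E[(h + Gamma*phi)^2] H^{-1} = E[(z^3 - 3z)^2]
V_naive = 15.0       # ignores the first step: E[(z^3)^2]
print(mc_var, V_corrected, V_naive)
```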

2. GMM in both the first stage ($\hat\gamma$) and the second stage ($\hat\theta$): $\phi = -M^{-1}m(z)$ for some first-stage moment condition $m(z,\gamma)$, and $h(z,\theta,\gamma) = G'Wg(z,\theta,\gamma)$, so that $H = G'WG$ and $\Gamma = G'WG_\gamma$ for $G_\gamma = E\frac{\partial g}{\partial\gamma'}$. Plug these into the general formula above. If $W = I$ and $G$ is invertible, this simplifies to

$$V = G^{-1}\left[\Omega + (Eg\phi')G_\gamma' + G_\gamma(E\phi g') + G_\gamma(E\phi\phi')G_\gamma'\right]G^{-1\prime}.$$

Again, if you have trouble differentiating $\frac{\partial g(\theta,\gamma)}{\partial\theta'}$ or $\frac{\partial g(\theta,\gamma)}{\partial\gamma'}$, simply take the expectation before differentiating: replace $H$ and $\Gamma$ by $\frac{\partial Eg(\theta,\gamma)}{\partial\theta'}$ and $\frac{\partial Eg(\theta,\gamma)}{\partial\gamma'}$.