6. MAXIMUM LIKELIHOOD ESTIMATION

[1] Maximum Likelihood Estimator

(1) Cases in which $\theta$ (unknown parameter) is a scalar

Notational Clarification: From now on, we denote the true value of $\theta$ by $\theta_o$; $\theta$ itself is then viewed as a variable.

Definition: (Likelihood function) Let $\{x_1, \ldots, x_T\}$ be a sample from a population. It does not have to be a random sample; $x_t$ is a scalar. Let $f(x_1, x_2, \ldots, x_T, \theta_o)$ be the joint density function of $x_1, \ldots, x_T$. The functional form of $f$ is known, but $\theta_o$ is not. Then,
$$L_T(\theta) \equiv f(x_1, \ldots, x_T, \theta)$$
is called the likelihood function. $L_T(\theta)$ is a function of $\theta$ given $x_1, \ldots, x_T$.

Definition: (Log-likelihood function) $l_T(\theta) = \ln[f(x_1, \ldots, x_T, \theta)]$.
Example: $\{x_1, \ldots, x_T\}$: a random sample from a population distributed with $f(x, \theta_o)$. Then,
$$f(x_1, \ldots, x_T, \theta_o) = \prod_{t=1}^{T} f(x_t, \theta_o);$$
$$L_T(\theta) = f(x_1, \ldots, x_T, \theta) = \prod_{t=1}^{T} f(x_t, \theta);$$
$$l_T(\theta) = \ln\left(\prod_{t=1}^{T} f(x_t, \theta)\right) = \Sigma_t \ln f(x_t, \theta).$$

Definition: (Maximum Likelihood Estimator (MLE)) The MLE $\hat{\theta}_{MLE}$ maximizes $l_T(\theta)$ given data points $x_1, \ldots, x_T$.

Example: $\{x_1, \ldots, x_T\}$ is a random sample from a population following a Poisson distribution [i.e., $f(x, \theta) = e^{-\theta}\theta^x/x!$ (suppressing the subscript $o$ from $\theta$)]. Note that $E(x) = \mathrm{var}(x) = \theta_o$ for the Poisson distribution.
$$l_T(\theta) = \Sigma_t \ln[f(x_t, \theta)] = -T\theta + (\ln\theta)\,\Sigma_t x_t - \Sigma_t \ln(x_t!).$$
FOC of maximization: $\partial l_T/\partial\theta = -T + \frac{1}{\theta}\Sigma_t x_t = 0$.
Solving this, $\hat{\theta}_{MLE} = \frac{1}{T}\Sigma_t x_t = \bar{x}$.
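The closed-form result $\hat{\theta}_{MLE} = \bar{x}$ can be checked numerically. Below is a minimal Python sketch (the simulated sample, its size, and the true value $\theta_o = 3$ are illustrative assumptions, not part of the notes) that maximizes the Poisson log-likelihood directly and compares the maximizer with the sample mean.

```python
# Minimal sketch: verify that the Poisson MLE equals the sample mean
# by maximizing the log-likelihood numerically (data are simulated).
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
x = rng.poisson(lam=3.0, size=500)          # assumed true theta_o = 3

def neg_loglik(theta):
    # l_T(theta) = -T*theta + ln(theta)*sum(x_t) - sum(ln(x_t!)); the last
    # term is constant in theta, so it is dropped for the optimization.
    return len(x) * theta - np.log(theta) * x.sum()

res = minimize_scalar(neg_loglik, bounds=(1e-6, 20.0), method="bounded")
print(res.x, x.mean())                      # the two should agree closely
```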
(2) Extension to the Cases with Multiple Parameters

Definition: $\theta = [\theta_1, \theta_2, \ldots, \theta_p]'$;
$$L_T(\theta) = f(x_1, \ldots, x_T, \theta) = f(x_1, \ldots, x_T, \theta_1, \ldots, \theta_p);$$
$$l_T(\theta) = \ln[f(x_1, \ldots, x_T, \theta)] = \ln[f(x_1, \ldots, x_T, \theta_1, \ldots, \theta_p)].$$
Here $x_t$ could be a vector. If $\{x_1, \ldots, x_T\}$ is a random sample from a population with $f(x, \theta_o)$,
$$l_T(\theta) = \ln\left(\prod_{t=1}^{T} f(x_t, \theta)\right) = \Sigma_t \ln f(x_t, \theta).$$

Definition: (MLE) The MLE $\hat{\theta}_{MLE}$ maximizes $l_T(\theta)$ given data (vector) points $x_1, \ldots, x_T$. That is, $\hat{\theta}_{MLE}$ solves
$$\frac{\partial l_T(\theta)}{\partial\theta} = \begin{pmatrix} \partial l_T(\theta)/\partial\theta_1 \\ \partial l_T(\theta)/\partial\theta_2 \\ \vdots \\ \partial l_T(\theta)/\partial\theta_p \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}_{p\times 1}.$$

Example: Let $\{x_1, \ldots, x_T\}$ be a random sample from $N(\mu, \sigma^2)$ [suppressing the subscript $o$]. Since $\{x_1, \ldots, x_T\}$ is a random sample, $E(x_t) = \mu_o$ and $\mathrm{var}(x_t) = \sigma_o^2$. Let $\theta = (\mu, v)'$, where $v = \sigma^2$.
$$f(x_t, \theta) = \frac{1}{\sqrt{2\pi v}}\exp\left(-\frac{(x_t-\mu)^2}{2v}\right) = (2\pi)^{-1/2}(v)^{-1/2}\exp\left(-\frac{(x_t-\mu)^2}{2v}\right);$$
$$\ln[f(x_t, \theta)] = -\frac{1}{2}\ln(2\pi) - \frac{1}{2}\ln(v) - \frac{(x_t-\mu)^2}{2v};$$
$$l_T(\theta) = -\frac{T}{2}\ln(2\pi) - \frac{T}{2}\ln(v) - \frac{\Sigma_t(x_t-\mu)^2}{2v}.$$
The MLE solves the FOC:
$$(1)\quad \frac{\partial l_T(\theta)}{\partial\mu} = -\frac{1}{2v}\Sigma_t\, 2(x_t-\mu)(-1) = \frac{\Sigma_t(x_t-\mu)}{v} = 0;$$
$$(2)\quad \frac{\partial l_T(\theta)}{\partial v} = -\frac{T}{2v} + \frac{\Sigma_t(x_t-\mu)^2}{2v^2} = 0.$$
From (1):
$$(3)\quad \Sigma_t(x_t-\mu) = 0 \;\Rightarrow\; \Sigma_t x_t - T\mu = 0 \;\Rightarrow\; \hat{\mu}_{MLE} = \frac{\Sigma_t x_t}{T} = \bar{x}.$$
Substituting (3) into (2):
$$(4)\quad -Tv + \Sigma_t(x_t-\hat{\mu}_{MLE})^2 = 0.$$
Thus, $\hat{v}_{MLE} = \frac{1}{T}\Sigma_t(x_t-\bar{x})^2$, and
$$\hat{\theta}_{MLE} = \begin{pmatrix} \hat{\mu}_{MLE} \\ \hat{v}_{MLE} \end{pmatrix} = \begin{pmatrix} \bar{x} \\ \frac{1}{T}\Sigma_t(x_t-\bar{x})^2 \end{pmatrix}.$$
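As a check on this derivation, here is a minimal sketch (simulated data with assumed $\mu_o = 2$ and $v_o = 2.25$) that maximizes $l_T(\mu, v)$ numerically and compares the result with the closed-form MLE.

```python
# Minimal sketch: numerically maximize the N(mu, v) log-likelihood and
# compare with the closed-form MLE (x_bar, (1/T)*sum((x_t - x_bar)^2)).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=1.5, size=1000)   # assumed mu_o = 2, v_o = 2.25
T = len(x)

def neg_loglik(theta):
    mu, v = theta
    return 0.5 * T * np.log(2 * np.pi) + 0.5 * T * np.log(v) \
        + ((x - mu) ** 2).sum() / (2 * v)

res = minimize(neg_loglik, x0=[0.0, 1.0], bounds=[(None, None), (1e-8, None)])
print(res.x)                                    # numerical MLE (mu, v)
print(x.mean(), ((x - x.mean()) ** 2).mean())   # closed-form MLE
```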
[2] Large Sample Properties of the ML Estimator

Definition:
1) Let $g(\theta) = g(\theta_1, \ldots, \theta_p)$ be a scalar function of $\theta$. Let $g_j = \partial g/\partial\theta_j$. Then,
$$\frac{\partial g}{\partial\theta} = \begin{pmatrix} g_1 \\ g_2 \\ \vdots \\ g_p \end{pmatrix}.$$
2) Let $w(\theta) = (w_1(\theta), \ldots, w_m(\theta))'$ be an $m\times 1$ vector of functions of $\theta$. Let $w_{ij} = \partial w_i(\theta)/\partial\theta_j$. Then,
$$\frac{\partial w(\theta)}{\partial\theta'} = \begin{pmatrix} w_{11} & w_{12} & \cdots & w_{1p} \\ w_{21} & w_{22} & \cdots & w_{2p} \\ \vdots & \vdots & & \vdots \\ w_{m1} & w_{m2} & \cdots & w_{mp} \end{pmatrix}_{m\times p}.$$
3) Let $g(\theta)$ be a scalar function of $\theta$, where $g_{ij} = \partial^2 g(\theta)/\partial\theta_i\,\partial\theta_j$. Then,
$$\frac{\partial^2 g(\theta)}{\partial\theta\,\partial\theta'} = \begin{pmatrix} g_{11} & g_{12} & \cdots & g_{1p} \\ g_{21} & g_{22} & \cdots & g_{2p} \\ \vdots & \vdots & & \vdots \\ g_{p1} & g_{p2} & \cdots & g_{pp} \end{pmatrix}_{p\times p},$$
which is called the Hessian matrix of $g(\theta)$.
Example 1: Let $g(\theta) = \theta_1^2 + \theta_2^2 + \theta_1\theta_2$. Find $\partial g(\theta)/\partial\theta$:
$$\frac{\partial g(\theta)}{\partial\theta} = \begin{pmatrix} 2\theta_1 + \theta_2 \\ 2\theta_2 + \theta_1 \end{pmatrix}.$$
Example 2: Let
$$w(\theta) = \begin{pmatrix} \theta_1 + \theta_2^2 \\ \theta_1^2 + \theta_2 \end{pmatrix}; \qquad \frac{\partial w(\theta)}{\partial\theta'} = \begin{pmatrix} 1 & 2\theta_2 \\ 2\theta_1 & 1 \end{pmatrix}.$$
Example 3: Let $g(\theta) = \theta_1^2 + \theta_2^2 + \theta_1\theta_2$. Find the Hessian matrix of $g(\theta)$:
$$\frac{\partial^2 g(\theta)}{\partial\theta\,\partial\theta'} = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}.$$
Some useful results:
1) $c$: $1\times p$, $\theta$: $p\times 1$ ($c\theta$ is a scalar): $\partial(c\theta)/\partial\theta = c'$; $\partial(c\theta)/\partial\theta' = c$.
2) $R$: $m\times p$, $\theta$: $p\times 1$ ($R\theta$ is $m\times 1$): $\partial(R\theta)/\partial\theta' = R$.
3) $A$: $p\times p$ symmetric, $\theta$: $p\times 1$ ($\theta'A\theta$ is a scalar): $\partial(\theta'A\theta)/\partial\theta = 2A\theta$; $\partial(\theta'A\theta)/\partial\theta' = 2\theta'A$; $\partial^2(\theta'A\theta)/\partial\theta\,\partial\theta' = 2A$.
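These matrix-calculus examples are easy to verify with symbolic differentiation. The sketch below (using the sympy library, an assumption about available tooling) reproduces Examples 1-3.

```python
# Minimal sketch: check Examples 1-3 with symbolic differentiation.
import sympy as sp

t1, t2 = sp.symbols("theta1 theta2")
g = t1**2 + t2**2 + t1 * t2

grad = sp.Matrix([sp.diff(g, t1), sp.diff(g, t2)])   # Example 1: gradient
print(grad)                     # [2*theta1 + theta2, 2*theta2 + theta1]

w = sp.Matrix([t1 + t2**2, t1**2 + t2])              # Example 2: Jacobian
print(w.jacobian([t1, t2]))     # [[1, 2*theta2], [2*theta1, 1]]

print(sp.hessian(g, (t1, t2)))  # Example 3: [[2, 1], [1, 2]]
```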
Definition: (Hessian matrix of the log-likelihood function)
$$H_T(\theta) = \frac{\partial^2 l_T(\theta)}{\partial\theta\,\partial\theta'} = \left[\frac{\partial^2 l_T}{\partial\theta_i\,\partial\theta_j}\right]_{p\times p}.$$

Theorem: Let $\hat{\theta}$ be the MLE. Then, under suitable regularity conditions, $\hat{\theta}$ is consistent, and
$$\sqrt{T}(\hat{\theta} - \theta_o) \to_d N\left(0_{p\times 1},\; \left[-\,\mathrm{plim}\,\frac{1}{T}H_T(\theta_o)\right]^{-1}\right).$$
Further, $\hat{\theta}$ is asymptotically efficient.

Implication: $\hat{\theta} \approx N(\theta_o, [-H_T(\theta_o)]^{-1}) \approx N(\theta_o, [-H_T(\hat{\theta})]^{-1})$.

Example: $\{x_1, \ldots, x_T\}$ is a random sample from $N(\mu_o, \sigma_o^2)$. Let $\theta = [\mu, v]'$ and $v = \sigma^2$:
$$l_T(\theta) = -\frac{T}{2}\ln(2\pi) - \frac{T}{2}\ln(v) - \frac{1}{2v}\Sigma_t(x_t-\mu)^2.$$
The first derivatives:
$$\frac{\partial l_T(\theta)}{\partial\mu} = \frac{\Sigma_t(x_t-\mu)}{v}; \qquad \frac{\partial l_T(\theta)}{\partial v} = -\frac{T}{2v} + \frac{1}{2v^2}\Sigma_t(x_t-\mu)^2.$$
The second derivatives:
$$\frac{\partial^2 l_T(\theta)}{\partial\mu^2} = \frac{1}{v}\Sigma_t(-1) = -\frac{T}{v}; \qquad \frac{\partial^2 l_T(\theta)}{\partial\mu\,\partial v} = -\frac{\Sigma_t(x_t-\mu)}{v^2};$$
$$\frac{\partial^2 l_T(\theta)}{\partial v^2} = \frac{T}{2v^2} - \frac{1}{v^3}\Sigma_t(x_t-\mu)^2.$$
Therefore,
$$H_T(\theta) = \begin{pmatrix} -\frac{T}{v} & -\frac{\Sigma_t(x_t-\mu)}{v^2} \\ -\frac{\Sigma_t(x_t-\mu)}{v^2} & \frac{T}{2v^2} - \frac{\Sigma_t(x_t-\mu)^2}{v^3} \end{pmatrix}.$$
Hence,
$$-H_T(\hat{\theta}_{ML}) = \begin{pmatrix} \frac{T}{\hat{v}_{ML}} & 0 \\ 0 & \frac{T}{2\hat{v}_{ML}^2} \end{pmatrix};$$
$$\hat{\theta} = \begin{pmatrix} \hat{\mu}_{ML} \\ \hat{v}_{ML} \end{pmatrix} \approx N\left(\begin{pmatrix} \mu_o \\ v_o \end{pmatrix},\; \begin{pmatrix} \frac{\hat{v}_{ML}}{T} & 0 \\ 0 & \frac{2\hat{v}_{ML}^2}{T} \end{pmatrix}\right).$$
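A small Monte Carlo experiment illustrates the implication: the sampling variances of $\hat{\mu}_{ML}$ and $\hat{v}_{ML}$ should be close to the asymptotic values $v_o/T$ and $2v_o^2/T$. The design below ($T = 200$, 5000 replications, $\mu_o = 1$, $v_o = 4$) is an illustrative assumption.

```python
# Minimal sketch: check the MLE's sampling variances against the
# asymptotic variances v_o/T and 2*v_o^2/T by simulation.
import numpy as np

rng = np.random.default_rng(2)
T, reps, mu_o, v_o = 200, 5000, 1.0, 4.0

x = rng.normal(mu_o, np.sqrt(v_o), size=(reps, T))
mu_hat = x.mean(axis=1)                           # MLE of mu, per replication
v_hat = ((x - mu_hat[:, None]) ** 2).mean(axis=1) # MLE of v, per replication

print(mu_hat.var(), v_o / T)          # should be close to v_o/T
print(v_hat.var(), 2 * v_o**2 / T)    # should be close to 2*v_o^2/T
```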
[3] Testing Hypotheses Based on MLE

General form of hypotheses: Let $w(\theta) = [w_1(\theta), w_2(\theta), \ldots, w_m(\theta)]'$, where $w_j(\theta) = w_j(\theta_1, \theta_2, \ldots, \theta_p)$ is a function of $\theta_1, \ldots, \theta_p$.
$$H_o: \text{the true } \theta\ (\theta_o) \text{ satisfies the } m \text{ restrictions } w(\theta) = 0_{m\times 1} \quad (m \le p).$$

Definition: (Restricted MLE) Let $\bar{\theta}$ be the restricted ML estimator which maximizes $l_T(\theta)$ s.t. $w(\theta) = 0$.

Wald Test: Let $W(\theta) = \partial w(\theta)/\partial\theta'$. Then,
$$W_T = w(\hat{\theta})'[W(\hat{\theta})\,\mathrm{Cov}(\hat{\theta})\,W(\hat{\theta})']^{-1}w(\hat{\theta}).$$
If $\hat{\theta}$ is the (unrestricted) ML estimator,
$$W_T = w(\hat{\theta})'[W(\hat{\theta})\{-H_T(\hat{\theta})\}^{-1}W(\hat{\theta})']^{-1}w(\hat{\theta}).$$
Note: $W_T$ can be computed with any consistent estimator $\hat{\theta}$ and $\mathrm{Cov}(\hat{\theta})$.

Likelihood Ratio Test (LR):
$$LR = 2[l_T(\hat{\theta}) - l_T(\bar{\theta})].$$

Lagrangean Multiplier (LM) Test: Define $s_T(\theta) = \frac{\partial l_T(\theta)}{\partial\theta}$. Then,
$$LM = s_T(\bar{\theta})'[-H_T(\bar{\theta})]^{-1}s_T(\bar{\theta}).$$
Theorem: Under $H_o: w(\theta) = 0$,
$$W_T,\ LR,\ LM \;\to_d\; \chi^2(m).$$

Implication: Given a significance level ($\alpha$), find a critical value $c$ from the $\chi^2$ table. Usually, $\alpha = 0.05$ or $\alpha = 0.01$. If $W_T > c$, reject $H_o$; otherwise, do not reject $H_o$.

Comments:
1) Wald needs only $\hat{\theta}$; LR needs both $\hat{\theta}$ and $\bar{\theta}$; and LM needs $\bar{\theta}$ only.
2) In general, $W_T \ge LR \ge LM$.
3) $W_T$ is not invariant to how the restrictions are written. That is, $W_T$ for $H_o: \theta_1 = \theta_2$ may not be equal to $W_T$ for $H_o: \theta_1/\theta_2 = 1$.

Example: (1) $\{x_1, \ldots, x_T\}$: a random sample from $N(\mu_o, v_o)$ with $v_o$ known. So, $\theta = \mu$.
$$H_o: \mu = 0 \;\Rightarrow\; w(\mu) = \mu;$$
$$l_T(\mu) = -\frac{T}{2}\ln(2\pi) - \frac{T}{2}\ln(v_o) - \frac{1}{2v_o}\Sigma_t(x_t-\mu)^2;$$
$$s_T(\mu) = \frac{1}{v_o}\Sigma_t(x_t-\mu); \qquad H_T(\mu) = -\frac{T}{v_o}.$$
[Wald Test] Unrestricted MLE: FOC: $\partial l_T(\mu)/\partial\mu = (1/v_o)\Sigma_t(x_t-\mu) = 0 \Rightarrow \hat{\mu} = \bar{x}$.
$$W(\mu) = 1 \;\Rightarrow\; W(\hat{\mu}) = 1; \qquad -H_T(\hat{\mu}) = T/v_o.$$

[LR Test] Restricted MLE: $\bar{\mu} = 0$.
$$l_T(\hat{\mu}) = -\frac{T}{2}\ln(2\pi) - \frac{T}{2}\ln(v_o) - \frac{1}{2v_o}\Sigma_t(x_t-\bar{x})^2;$$
$$l_T(\bar{\mu}) = -\frac{T}{2}\ln(2\pi) - \frac{T}{2}\ln(v_o) - \frac{1}{2v_o}\Sigma_t x_t^2.$$

[LM Test] $s_T(\bar{\mu}) = \frac{1}{v_o}\Sigma_t x_t = \frac{T}{v_o}\bar{x}$; $-H_T(\bar{\mu}) = T/v_o$.

With this information, one can show that
$$W_T = LR = LM = \frac{T\bar{x}^2}{v_o}.$$
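A numerical illustration of case (1) follows; the simulated data (assumed true mean 0.3, so $H_o$ is false by a small margin) are not part of the notes. It computes the three statistics and confirms that all equal $T\bar{x}^2/v_o$ here.

```python
# Minimal sketch: W, LR and LM for H_o: mu = 0 in the N(mu, v_o) model
# with v_o known; all three equal T*x_bar^2/v_o.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(3)
v_o = 2.0
x = rng.normal(0.3, np.sqrt(v_o), size=100)   # assumed true mean 0.3
T, x_bar = len(x), x.mean()

def loglik(mu):
    return -T / 2 * np.log(2 * np.pi * v_o) - ((x - mu) ** 2).sum() / (2 * v_o)

W = x_bar * (T / v_o) * x_bar                 # w'[W {-H}^{-1} W']^{-1} w
LR = 2 * (loglik(x_bar) - loglik(0.0))        # 2[l_T(mu_hat) - l_T(0)]
s = x.sum() / v_o                             # score at the restricted MLE
LM = s * (v_o / T) * s                        # s'[-H]^{-1}s
print(W, LR, LM)                              # identical in this model
print(chi2.ppf(0.95, df=1))                   # 5% critical value, m = 1
```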
(2) Both $\mu$ and $v$ unknown: $\theta = (\mu, v)'$.
$$H_o: \mu = 0 \;\Rightarrow\; w(\theta) = \mu; \qquad W(\theta) = \frac{\partial w(\theta)}{\partial\theta'} = \left[\frac{\partial\mu}{\partial\mu},\ \frac{\partial\mu}{\partial v}\right] = [1,\ 0];$$
$$l_T(\theta) = -\frac{T}{2}\ln(2\pi) - \frac{T}{2}\ln(v) - \frac{1}{2v}\Sigma_t(x_t-\mu)^2;$$
$$s_T(\theta) = \begin{pmatrix} \frac{1}{v}\Sigma_t(x_t-\mu) \\ -\frac{T}{2v} + \frac{1}{2v^2}\Sigma_t(x_t-\mu)^2 \end{pmatrix}; \qquad H_T(\theta) = \begin{pmatrix} -\frac{T}{v} & -\frac{\Sigma_t(x_t-\mu)}{v^2} \\ -\frac{\Sigma_t(x_t-\mu)}{v^2} & \frac{T}{2v^2} - \frac{\Sigma_t(x_t-\mu)^2}{v^3} \end{pmatrix}.$$
Unrestricted MLE: $\hat{\mu} = \bar{x}$ and $\hat{v} = \frac{1}{T}\Sigma_t(x_t-\bar{x})^2$.
Restricted MLE: $\bar{\mu} = 0$, but we need to compute $\bar{v}$:
$$l_T(\bar{\mu}, v) = l_T(0, v) = -\frac{T}{2}\ln(2\pi) - \frac{T}{2}\ln(v) - \frac{1}{2v}\Sigma_t x_t^2.$$
FOC: $\partial l_T(0, v)/\partial v = -\frac{T}{2v} + \frac{1}{2v^2}\Sigma_t x_t^2 = 0 \;\Rightarrow\; \bar{v} = \frac{1}{T}\Sigma_t x_t^2$.

[Wald Test]
$$w(\hat{\theta}) = \hat{\mu} = \bar{x}; \qquad W(\hat{\theta}) = (1\ \ 0); \qquad -H_T(\hat{\theta}) = \begin{pmatrix} \frac{T}{\hat{v}} & 0 \\ 0 & \frac{T}{2\hat{v}^2} \end{pmatrix};$$
$$W_T = w(\hat{\theta})'[W(\hat{\theta})\{-H_T(\hat{\theta})\}^{-1}W(\hat{\theta})']^{-1}w(\hat{\theta}) = \frac{T\bar{x}^2}{\hat{v}}.$$
[LR Test]
$$l_T(\hat{\theta}) = -\frac{T}{2}\ln(2\pi) - \frac{T}{2}\ln(\hat{v}) - \frac{1}{2\hat{v}}\Sigma_t(x_t-\bar{x})^2;$$
$$l_T(\bar{\theta}) = -\frac{T}{2}\ln(2\pi) - \frac{T}{2}\ln(\bar{v}) - \frac{1}{2\bar{v}}\Sigma_t x_t^2.$$
Since $\Sigma_t(x_t-\bar{x})^2 = T\hat{v}$ and $\Sigma_t x_t^2 = T\bar{v}$, it follows that $LR = 2[l_T(\hat{\theta}) - l_T(\bar{\theta})] = T\ln(\bar{v}/\hat{v})$.

[LM Test]
$$s_T(\bar{\theta}) = \begin{pmatrix} \frac{1}{\bar{v}}\Sigma_t x_t \\ -\frac{T}{2\bar{v}} + \frac{1}{2\bar{v}^2}\Sigma_t x_t^2 \end{pmatrix} = \begin{pmatrix} \frac{T\bar{x}}{\bar{v}} \\ 0 \end{pmatrix},$$
where the second entry is zero because $\bar{v}$ solves the FOC for $v$. Using the information matrix $I_T(\bar{\theta}) = E[-H_T(\theta)]\big|_{\theta=\bar{\theta}}$, which is diagonal under $H_o$ since $E(x_t) = 0$:
$$I_T(\bar{\theta}) = \begin{pmatrix} \frac{T}{\bar{v}} & 0 \\ 0 & \frac{T}{2\bar{v}^2} \end{pmatrix};$$
$$LM = s_T(\bar{\theta})'[I_T(\bar{\theta})]^{-1}s_T(\bar{\theta}) = \frac{T\bar{x}^2}{\bar{v}}.$$
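The same computation for case (2), where the three statistics no longer coincide in finite samples. A minimal sketch (simulated data with an assumed true mean of 0.3):

```python
# Minimal sketch: W, LR and LM for H_o: mu = 0 when v is also estimated;
# in finite samples they typically satisfy W >= LR >= LM.
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(0.3, 1.0, size=100)      # assumed true mean 0.3
T, x_bar = len(x), x.mean()

v_hat = ((x - x_bar) ** 2).mean()       # unrestricted MLE of v
v_bar = (x ** 2).mean()                 # restricted MLE of v (mu = 0)

def loglik(mu, v):
    return -T / 2 * np.log(2 * np.pi * v) - ((x - mu) ** 2).sum() / (2 * v)

W = T * x_bar ** 2 / v_hat
LR = 2 * (loglik(x_bar, v_hat) - loglik(0.0, v_bar))  # equals T*ln(v_bar/v_hat)
LM = T * x_bar ** 2 / v_bar
print(W, LR, LM)                        # typically W >= LR >= LM
```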
[4] Efficiency of the OLS Estimator under Ideal Conditions

Assume that $y_t$ is iid $N(x_{t\cdot}\beta, v)$ conditional on $x_{t\cdot}$:
$$f(y_t\,|\,x_{t\cdot}, \beta, v) = \frac{1}{\sqrt{2\pi v}}\exp\left(-\frac{(y_t - x_{t\cdot}\beta)^2}{2v}\right).$$
Therefore, we have the following log-likelihood function of $y$:
$$l_T(\beta, v) = \Sigma_t \ln f(y_t\,|\,x_{t\cdot}, \beta, v) = -\frac{T}{2}\ln(2\pi) - \frac{T}{2}\ln(v) - \frac{1}{2v}\Sigma_t(y_t - x_{t\cdot}\beta)^2$$
$$= -\frac{T}{2}\ln(2\pi) - \frac{T}{2}\ln(v) - \frac{1}{2v}(y - X\beta)'(y - X\beta).$$
FOC:
(i) $\partial l_T(\beta, v)/\partial\beta = -\frac{1}{2v}[-2X'y + 2X'X\beta] = 0_{k\times 1}$;
(ii) $\partial l_T(\beta, v)/\partial v = -\frac{T}{2v} + \frac{1}{2v^2}(y - X\beta)'(y - X\beta) = 0$.
From (i), $X'y - X'X\beta = 0_{k\times 1}$, so $\hat{\beta}_{MLE} = (X'X)^{-1}X'y = \hat{\beta}_{OLS}$. From (ii), $\hat{v}_{MLE} = SSE/T$.
Thus, we can conclude that $\hat{\beta}$ and $s^2 = SSE/(T-k)$ are asymptotically efficient.
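The coincidence of OLS and ML under normality can be confirmed numerically. A minimal sketch (the regressor design, coefficients, and error variance below are illustrative assumptions):

```python
# Minimal sketch: in the normal regression model, the numerical MLE of
# beta matches OLS, and the MLE of v equals SSE/T.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
T, k = 200, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])
beta_o = np.array([1.0, 2.0, -0.5])             # assumed true coefficients
y = X @ beta_o + rng.normal(scale=1.5, size=T)

b_ols = np.linalg.solve(X.T @ X, X.T @ y)       # OLS estimator
sse = ((y - X @ b_ols) ** 2).sum()

def neg_loglik(theta):
    beta, v = theta[:k], theta[k]
    e = y - X @ beta
    return T / 2 * np.log(2 * np.pi * v) + (e @ e) / (2 * v)

res = minimize(neg_loglik, x0=np.r_[np.zeros(k), 1.0],
               bounds=[(None, None)] * k + [(1e-8, None)])
print(res.x[:k], b_ols)      # beta_hat_MLE ~= beta_hat_OLS
print(res.x[k], sse / T)     # v_hat_MLE ~= SSE/T
```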