Ulrich Müller
Economics 2110, Fall 2008

More Notes on Testing

Large Sample Properties of the Likelihood Ratio Statistic

Let $X_i$ be i.i.d. with density $f(x,\theta)$. We are interested in testing $H_0: \theta = \theta_0$ against $H_1: \theta \neq \theta_0$, where $\theta$ is $k \times 1$, using a likelihood ratio test. To carry out the test, we need to determine the appropriate critical value $c$. Recall that $c$ is determined by the requirement that $P(LR > c \mid H_0) = \alpha$. In order to determine the critical value, we thus need to determine the distribution of $LR$ when the null hypothesis is true. We now develop a large sample approximation to solve this problem.

Let $\hat\theta = \arg\max_\theta L(\theta)$ denote the MLE, and write the maximized likelihood ratio statistic as
$$LR = \frac{L(\hat\theta)}{L(\theta_0)}.$$
Define the statistic
$$\xi_{LR} = 2\ln(LR) = 2\bigl(L_n(\hat\theta) - L_n(\theta_0)\bigr)$$
where $L_n(\theta) = \ln L(\theta)$. Since $\xi_{LR}$ is a monotonic transformation of $LR$, the LR test can be implemented by rejecting the null hypothesis when $\xi_{LR}$ is large.

To find the approximate distribution of $\xi_{LR}$ under the null hypothesis, write
$$L_n(\theta_0) = L_n(\hat\theta) + (\theta_0-\hat\theta)'\frac{\partial L_n(\hat\theta)}{\partial\theta} + \tfrac{1}{2}(\theta_0-\hat\theta)'\frac{\partial^2 L_n(\bar\theta)}{\partial\theta\,\partial\theta'}(\theta_0-\hat\theta)$$
where $\bar\theta(\omega)$ is between $\theta_0$ and $\hat\theta(\omega)$. Since $\partial L_n(\hat\theta)/\partial\theta = 0$ by the FOC of the MLE, we have
$$\xi_{LR} = -(\theta_0-\hat\theta)'\frac{\partial^2 L_n(\bar\theta)}{\partial\theta\,\partial\theta'}(\theta_0-\hat\theta) = \sqrt{n}(\hat\theta-\theta_0)'\left(-\frac{1}{n}\frac{\partial^2 L_n(\bar\theta)}{\partial\theta\,\partial\theta'}\right)\sqrt{n}(\hat\theta-\theta_0).$$
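As a numerical illustration (not part of the notes), the following sketch computes $\xi_{LR}$ for an i.i.d. Exponential($\theta$) sample, where $f(x,\theta) = \theta e^{-\theta x}$, so $L_n(\theta) = n\ln\theta - \theta\sum_i x_i$ and the MLE is $\hat\theta = 1/\bar{x}$. Simulating under the null, the rejection frequency of the level-$\alpha$ test should be close to $\alpha$ in large samples:

```python
# Hedged sketch: LR statistic for the Exponential(theta) model (this specific
# model is our illustration, not an example from the notes).
import numpy as np
from scipy import stats

def xi_lr(x, theta0):
    """2*(L_n(theta_hat) - L_n(theta0)), with L_n(t) = n*ln(t) - t*sum(x)."""
    n = len(x)
    theta_hat = 1.0 / np.mean(x)          # MLE in the exponential model
    logL = lambda t: n * np.log(t) - t * np.sum(x)
    return 2.0 * (logL(theta_hat) - logL(theta0))

rng = np.random.default_rng(0)
theta0, n, alpha = 2.0, 200, 0.05
c = stats.chi2.ppf(1 - alpha, df=1)       # critical value, k = 1 here
rejections = np.mean([xi_lr(rng.exponential(1 / theta0, n), theta0) > c
                      for _ in range(2000)])
print(rejections)                          # should be close to alpha = 0.05
```

The critical value comes from the $\chi^2_k$ limit derived below; with $k = 1$ it is about $3.84$.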
Proceeding as in our derivations of the properties of the maximum likelihood estimator,
$$\sqrt{n}(\hat\theta-\theta_0) \Rightarrow N\bigl(0, \bar I(\theta_0)^{-1}\bigr)$$
so that by Slutsky and the CMT,
$$-\frac{1}{n}\frac{\partial^2 L_n(\bar\theta)}{\partial\theta\,\partial\theta'} = -\frac{1}{n}\frac{\partial S(\bar\theta)}{\partial\theta'} = -\frac{1}{n}\sum_{i=1}^n \frac{\partial s_i(\bar\theta)}{\partial\theta'} \xrightarrow{p} \bar I(\theta_0)$$
and hence
$$\xi_{LR} \xrightarrow{d, H_0} \chi^2_k.$$

An asymptotically justified level $1-\alpha$ confidence set based on the LR statistic is hence of the form
$$\bigl\{\theta : n(\hat\theta-\theta)'\hat V^{-1}(\hat\theta-\theta) < c\bigr\}$$
where $\hat V^{-1} = -\frac{1}{n}\frac{\partial^2 L_n(\bar\theta)}{\partial\theta\,\partial\theta'}$ and $c$ solves $P(\chi^2_k > c) = \alpha$. In large samples, where $\hat V \simeq V = \bar I(\theta_0)^{-1}$, this confidence set may be recognized as the interior of an ellipse centered at $\theta = \hat\theta$. In the one-dimensional case, we obtain a confidence interval $[\hat\theta - c^*(\hat V/n)^{1/2},\ \hat\theta + c^*(\hat V/n)^{1/2}]$, where $c^*$ is the positive number that solves $P(N(0,1) > c^*) = \alpha/2$.

Large Sample Properties of the Wald Statistic

A close cousin of the LR statistic is the Wald statistic
$$\xi_W = \sqrt{n}(\hat\theta-\theta_0)'\left(-\frac{1}{n}\frac{\partial^2 L_n(\hat\theta)}{\partial\theta\,\partial\theta'}\right)\sqrt{n}(\hat\theta-\theta_0)$$
which differs from $\xi_{LR}$ only because the estimated information matrix is evaluated at $\hat\theta$ rather than $\bar\theta$. Note that we can compute the Wald statistic without doing any computations under the null hypothesis. Since both $\hat\theta$ and $\bar\theta$ converge in probability to $\theta_0$ under the null hypothesis,
$$\xi_W - \xi_{LR} \xrightarrow{p, H_0} 0.$$
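Continuing the exponential illustration (an assumption of ours, not from the notes): there $-\frac{1}{n}\partial^2 L_n(\hat\theta)/\partial\theta^2 = 1/\hat\theta^2$, so $\hat V = \hat\theta^2$, and both the Wald statistic and the one-dimensional confidence interval are available in closed form:

```python
# Hedged sketch: Wald statistic and confidence interval in the exponential
# model, where V_hat = theta_hat^2 (our worked example, not from the notes).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
theta0, n, alpha = 2.0, 500, 0.05
x = rng.exponential(1 / theta0, n)        # data generated under the null
theta_hat = 1.0 / np.mean(x)

V_hat = theta_hat ** 2                    # inverse of estimated information
xi_w = n * (theta_hat - theta0) ** 2 / V_hat
reject = xi_w > stats.chi2.ppf(1 - alpha, df=1)

# one-dimensional interval [theta_hat +- c* (V_hat/n)^(1/2)]
c_star = stats.norm.ppf(1 - alpha / 2)
half_width = c_star * np.sqrt(V_hat / n)
ci = (theta_hat - half_width, theta_hat + half_width)
print(xi_w, reject, ci)
```

Note that no quantity in this computation is evaluated at $\theta_0$ except the hypothesized value itself, which is the practical appeal of the Wald form.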
The motivation of the Wald statistic is that under the null hypothesis, the difference between the estimator $\hat\theta$ and $\theta_0$ satisfies $\sqrt{n}(\hat\theta-\theta_0) \stackrel{a}{\sim} N(0, \bar I(\theta_0)^{-1})$, and $-\frac{1}{n}\partial^2 L_n(\hat\theta)/\partial\theta\,\partial\theta'$ consistently estimates $\bar I(\theta_0)$. Under the alternative, $\hat\theta - \theta_0$ is large, and we reject. (Note that the Wald statistic is not invariant to reparametrizations of the parameters, unlike the LR statistic.)

Power considerations: Let's consider power against alternatives of the form $\theta_{1n} = \theta_0 + g/\sqrt{n}$ for some nonzero $k \times 1$ vector $g$. (Strictly speaking, a true value of $\theta$ that depends on $n$ makes the data a double array with observations $X_{n,j}$, where for any $n$, $X_{n,1}, X_{n,2}, \ldots, X_{n,n}$ are i.i.d. with density $f(x,\theta_{1n})$. We haven't talked about that before, but our proofs of LLNs and CLTs work equally well for double arrays, so this ends up being only a notational nuisance.) Proceeding as above, we now find $\sqrt{n}(\hat\theta-\theta_{1n}) \Rightarrow N(0, \bar I(\theta_0)^{-1})$, which implies
$$\sqrt{n}(\hat\theta-\theta_0) \Rightarrow N\bigl(g, \bar I(\theta_0)^{-1}\bigr),$$
so that $\xi_W$ converges to a quadratic form in multivariate normals with a nonzero mean. The factor $n^{-1/2}$ reflects that in larger samples, there is more information about the parameter $\theta$, so that we have to let alternatives shrink to $\theta_0$ to obtain nontrivial asymptotic results. The appropriate rate for this to happen is sometimes called Pitman drift.

Large Sample Properties of the LM Statistic

Another approximation to $\xi_{LR}$ is given by the Lagrange Multiplier test statistic
$$\xi_{LM} = n^{-1/2} S_n(\theta_0)'\left(-\frac{1}{n}\frac{\partial^2 L_n(\theta_0)}{\partial\theta\,\partial\theta'}\right)^{-1} n^{-1/2} S_n(\theta_0) = \left(n^{-1/2}\sum_{i=1}^n s_i(\theta_0)\right)'\left(-\frac{1}{n}\frac{\partial^2 L_n(\theta_0)}{\partial\theta\,\partial\theta'}\right)^{-1}\left(n^{-1/2}\sum_{i=1}^n s_i(\theta_0)\right)$$
with the advantage that we do not need to compute $\hat\theta$ in order to compute $\xi_{LM}$. Since under the null hypothesis
$$n^{-1/2}\sum_{i=1}^n s_i(\theta_0) \Rightarrow N\bigl(0, \bar I(\theta_0)\bigr)$$
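In the exponential illustration (again our assumption, not a model from the notes), $s_i(\theta) = 1/\theta - x_i$ and $-\frac{1}{n}\partial^2 L_n(\theta_0)/\partial\theta^2 = 1/\theta_0^2$, so the LM statistic needs only the hypothesized value $\theta_0$, and it should be numerically close to $\xi_{LR}$ in large samples:

```python
# Hedged sketch: LM (score) statistic in the exponential model; note that no
# MLE is required, everything is evaluated at theta0.
import numpy as np

rng = np.random.default_rng(2)
theta0, n = 2.0, 500
x = rng.exponential(1 / theta0, n)        # data generated under the null

score = np.sum(1.0 / theta0 - x)          # S_n(theta0) = sum of s_i(theta0)
info = 1.0 / theta0 ** 2                  # estimated information at theta0
xi_lm = (score / np.sqrt(n)) ** 2 / info

# compare with the LR statistic from the same data: asymptotically equal
theta_hat = 1.0 / np.mean(x)
logL = lambda t: n * np.log(t) - t * np.sum(x)
xi_lr = 2.0 * (logL(theta_hat) - logL(theta0))
print(xi_lm, xi_lr)
```

The printed values differ only by an $o_p(1)$ term, consistent with $\xi_{LM} - \xi_{LR} \xrightarrow{p,H_0} 0$.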
and
$$-\frac{1}{n}\frac{\partial^2 L_n(\theta_0)}{\partial\theta\,\partial\theta'} \xrightarrow{p} \bar I(\theta_0),$$
we also find $\xi_{LM} \xrightarrow{d, H_0} \chi^2_k$. (In fact, $\xi_{LM} - \xi_{LR} \xrightarrow{p, H_0} 0$, as you should check for yourself.)

Nuisance Parameters in LR, Wald and LM Tests

In many applications, a null hypothesis specifies values for some of the parameters, but leaves others unrestricted. How does this affect the LR, Wald and LM testing procedures? Let $\theta = (\beta', \gamma')'$, where $\beta$ is $p \times 1$ and $\theta$ is $k \times 1$, $p < k$. Suppose we are interested in testing $H_0: \beta = \beta_0$ against $H_1: \beta \neq \beta_0$ with $\gamma$ unrestricted. Let $\hat\theta = (\hat\beta', \hat\gamma')'$ be the MLE, and let $\hat\theta_r = (\beta_0', \hat\gamma_r')'$ be the estimator that maximizes $L_n(\theta)$ under the restriction that $\beta = \beta_0$.

The Wald test is entirely straightforward to implement, based on the test statistic
$$\xi_W = \sqrt{n}(\hat\beta-\beta_0)'\hat V_{\beta\beta}^{-1}\sqrt{n}(\hat\beta-\beta_0)$$
with $\hat V_{\beta\beta}$ the upper left $p \times p$ block of $\left(-\frac{1}{n}\frac{\partial^2 L_n(\hat\theta)}{\partial\theta\,\partial\theta'}\right)^{-1}$. Under the null hypothesis, $\sqrt{n}(\hat\beta-\beta_0) \Rightarrow N(0, V_{\beta\beta})$ with $V_{\beta\beta}$ the $p \times p$ upper left block of $\bar I(\theta_0)^{-1}$, and $\hat V_{\beta\beta} \xrightarrow{p} V_{\beta\beta}$, so that $\xi_W \Rightarrow \chi^2_p$ under the null hypothesis.

We now consider the properties of the generalized likelihood ratio statistic
$$\xi_{LR} = 2\ln(LR_{gen}) = 2\bigl(\sup_\theta L_n(\theta) - \sup_\gamma L_n(\theta)\bigr) = 2\bigl(L_n(\hat\theta) - L_n(\hat\theta_r)\bigr).$$
We have
$$L_n(\hat\theta_r) = L_n(\hat\theta) + (\hat\theta_r-\hat\theta)'\frac{\partial L_n(\hat\theta)}{\partial\theta} + \tfrac{1}{2}(\hat\theta_r-\hat\theta)'\frac{\partial^2 L_n(\bar\theta)}{\partial\theta\,\partial\theta'}(\hat\theta_r-\hat\theta)$$
so that with $-\frac{1}{n}\frac{\partial^2 L_n(\bar\theta)}{\partial\theta\,\partial\theta'} \xrightarrow{p} \bar I(\theta_0)$ and $\partial L_n(\hat\theta)/\partial\theta = 0$,
$$\xi_{LR} = \sqrt{n}(\hat\theta-\hat\theta_r)'\bar I(\theta_0)\sqrt{n}(\hat\theta-\hat\theta_r) + o_p(1).$$
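A concrete case of the generalized LR statistic (our illustration, not an example from the notes) is testing $H_0: \mu = 0$ in an i.i.d. $N(\mu, \sigma^2)$ model, with $\sigma^2$ playing the role of the nuisance parameter $\gamma$. Concentrating $\sigma^2$ out of both the restricted and unrestricted likelihoods gives $\xi_{LR} = n\ln(\hat\sigma_r^2/\hat\sigma^2)$, which is asymptotically $\chi^2_1$ since $p = 1$:

```python
# Hedged sketch: generalized LR test of H0: mu = 0 in N(mu, sigma^2), with
# sigma^2 as the unrestricted nuisance parameter.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 400
x = rng.normal(0.0, 1.5, n)               # data generated under H0

sigma2_hat = np.mean((x - x.mean()) ** 2) # unrestricted MLE of sigma^2
sigma2_r = np.mean(x ** 2)                # restricted MLE (mu fixed at 0)
xi_lr = n * np.log(sigma2_r / sigma2_hat) # 2*(L_n(theta_hat) - L_n(theta_hat_r))

reject = xi_lr > stats.chi2.ppf(0.95, df=1)  # p = 1 restricted parameter
print(xi_lr, reject)
```

Since $\hat\sigma_r^2 \geq \hat\sigma^2$ by construction (the sample mean minimizes the sum of squares), $\xi_{LR}$ is always nonnegative, as a log likelihood ratio must be.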
Also, from familiar arguments,
$$n^{-1/2} S_n(\hat\theta) = 0 = n^{-1/2} S_n(\hat\theta_r) - \bar I(\theta_0)\sqrt{n}(\hat\theta-\hat\theta_r) + o_p(1)$$
so that we only need to figure out what the properties of $n^{-1/2} S_n(\hat\theta_r)$ are. Let subscripts $\beta$ and $\gamma$ denote the appropriate subvectors and submatrices corresponding to $\beta$ and $\gamma$ in the score and the information matrix. By another Taylor expansion,
$$n^{-1/2} S_n(\hat\theta_r) = n^{-1/2} S_n(\theta_0) - \bar I(\theta_0)\sqrt{n}(\hat\theta_r-\theta_0) + o_p(1).$$
Note that by the FOC of $\hat\theta_r$, the last $k-p$ elements of $S_n(\hat\theta_r)$ are zero ($S_{\gamma n}(\hat\theta_r) = 0$), so that we only need to compute $S_{\beta n}(\hat\theta_r)$. Note also that the first $p$ elements of $(\hat\theta_r - \theta_0)$ are zero. Define the $k \times k$ matrix
$$W = \begin{pmatrix} I_p & -\bar I_{\beta\gamma}(\theta_0)\bar I_{\gamma\gamma}(\theta_0)^{-1} \\ 0 & 0 \end{pmatrix}.$$
Then
$$n^{-1/2} W S_n(\hat\theta_r) = n^{-1/2} S_n(\hat\theta_r) = n^{-1/2} W S_n(\theta_0) - W\bar I(\theta_0)\, n^{1/2}(\hat\theta_r-\theta_0) + o_p(1) = n^{-1/2} W S_n(\theta_0) + o_p(1)$$
since the upper right block of $W\bar I(\theta_0)$ is zero. Note that
$$n^{-1/2} W S_n(\theta_0) \Rightarrow N\bigl(0, W\bar I(\theta_0)W'\bigr) = N\left(0, \begin{pmatrix} \bar I_{\beta\beta}(\theta_0) - \bar I_{\beta\gamma}(\theta_0)\bar I_{\gamma\gamma}(\theta_0)^{-1}\bar I_{\gamma\beta}(\theta_0) & 0 \\ 0 & 0 \end{pmatrix}\right).$$
We conclude that
$$\xi_{LR} = \sqrt{n}(\hat\theta-\hat\theta_r)'\bar I(\theta_0)\bar I(\theta_0)^{-1}\bar I(\theta_0)\sqrt{n}(\hat\theta-\hat\theta_r) + o_p(1) = n^{-1/2} S_n(\theta_0)' W'\bar I(\theta_0)^{-1} W\, n^{-1/2} S_n(\theta_0) + o_p(1) \Rightarrow \chi^2_p$$
under the null hypothesis, since the upper left $p \times p$ block of $\bar I(\theta_0)^{-1}$ is equal to $\bigl(\bar I_{\beta\beta}(\theta_0) - \bar I_{\beta\gamma}(\theta_0)\bar I_{\gamma\gamma}(\theta_0)^{-1}\bar I_{\gamma\beta}(\theta_0)\bigr)^{-1}$.
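The last step rests on a standard block-inversion (Schur complement) identity. A quick numerical check of it, on an arbitrary symmetric positive definite matrix standing in for $\bar I(\theta_0)$ (the partition sizes $p = 2$, $k = 5$ are our arbitrary choice):

```python
# Hedged numerical check: the upper left p x p block of inv(I) equals
# inv(I_bb - I_bg @ inv(I_gg) @ I_gb), the Schur complement identity.
import numpy as np

rng = np.random.default_rng(4)
p, k = 2, 5
A = rng.normal(size=(k, k))
I_theta = A @ A.T + k * np.eye(k)         # symmetric positive definite

I_bb, I_bg = I_theta[:p, :p], I_theta[:p, p:]
I_gb, I_gg = I_theta[p:, :p], I_theta[p:, p:]

upper_left = np.linalg.inv(I_theta)[:p, :p]
schur = np.linalg.inv(I_bb - I_bg @ np.linalg.inv(I_gg) @ I_gb)
print(np.allclose(upper_left, schur))     # the two p x p blocks coincide
```

This is exactly why the quadratic form in $W n^{-1/2} S_n(\theta_0)$, whose only nonzero block has variance $\bar I_{\beta\beta} - \bar I_{\beta\gamma}\bar I_{\gamma\gamma}^{-1}\bar I_{\gamma\beta}$, is $\chi^2_p$ and not some other quadratic form in normals.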