Lecture 21: Properties and robustness of the LSE

BLUE: robustness of the LSE against non-normality

We now study properties of l^τ β̂ and σ̂² under assumption A2, i.e., without the normality assumption on ε. From Theorem 3.6 and the proof of Theorem 3.7(ii), l^τ β̂ (with l ∈ R(Z)) and σ̂² are still unbiased without normality. In what sense are l^τ β̂ and σ̂² optimal beyond being unbiased? Some discussion about σ̂² can be found in Rao (1973, p. 228).

In general,
Var(l^τ β̂) = l^τ (Z^τ Z)^− Z^τ Var(ε) Z (Z^τ Z)^− l.
If l ∈ R(Z) and Var(ε) = σ² I_n (assumption A2), then by the property of the generalized inverse matrix,
Var(l^τ β̂) = σ² l^τ (Z^τ Z)^− l,
which attains the Cramér–Rao lower bound under assumption A1.

We have the following result for the LSE l^τ β̂.

UW-Madison (Statistics) Stat 709 Lecture 21 2018 1 / 17
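As a quick numerical check (a sketch, not part of the lecture), the sandwich formula above collapses to σ² l^τ (Z^τ Z)^− l when Var(ε) = σ² I_n; the Moore–Penrose pseudoinverse serves as one choice of generalized inverse:

```python
import numpy as np

# Sketch: under Var(eps) = sigma^2 I_n, the sandwich formula
#   l' (Z'Z)^- Z' Var(eps) Z (Z'Z)^- l
# reduces to sigma^2 l' (Z'Z)^- l for l in R(Z).
rng = np.random.default_rng(0)
n, p, sigma2 = 20, 3, 2.5
Z = rng.standard_normal((n, p))
l = Z.T @ rng.standard_normal(n)        # guarantees l lies in R(Z)

G = np.linalg.pinv(Z.T @ Z)             # a generalized inverse of Z'Z
V = sigma2 * np.eye(n)                  # Var(eps) under assumption A2
sandwich = l @ G @ Z.T @ V @ Z @ G @ l  # general variance formula
reduced = sigma2 * (l @ G @ l)          # simplified form under A2
print(np.isclose(sandwich, reduced))    # -> True
```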
Theorem 3.9
Consider the linear model
X = Z β + ε    (1)
with assumption A2.
(i) A necessary and sufficient condition for the existence of a linear unbiased estimator of l^τ β (i.e., an unbiased estimator that is linear in X) is l ∈ R(Z).
(ii) (Gauss–Markov theorem) If l ∈ R(Z), then the LSE l^τ β̂ is the best linear unbiased estimator (BLUE) of l^τ β in the sense that it has the minimum variance in the class of linear unbiased estimators of l^τ β.

Proof of (i)
The sufficiency has been established in Theorem 3.6. Suppose now that a linear function of X, c^τ X with c ∈ ℝ^n, is unbiased for l^τ β. Then
l^τ β = E(c^τ X) = c^τ E(X) = c^τ Z β.
Since this equality holds for all β, l = Z^τ c, i.e., l ∈ R(Z).
Proof of (ii)
Let l ∈ R(Z) = R(Z^τ Z). Then l = (Z^τ Z) ζ for some ζ and
l^τ β̂ = ζ^τ (Z^τ Z) β̂ = ζ^τ Z^τ X.
Let c^τ X be any linear unbiased estimator of l^τ β. From the proof of (i), Z^τ c = l, and hence
Cov(ζ^τ Z^τ X, c^τ X − ζ^τ Z^τ X)
  = E(X^τ Z ζ c^τ X) − E(X^τ Z ζ ζ^τ Z^τ X)
  = σ² tr(Z ζ c^τ) + β^τ Z^τ Z ζ c^τ Z β − σ² tr(Z ζ ζ^τ Z^τ) − β^τ Z^τ Z ζ ζ^τ Z^τ Z β
  = σ² ζ^τ l + (l^τ β)² − σ² ζ^τ l − (l^τ β)²
  = 0.
Therefore,
Var(c^τ X) = Var(c^τ X − ζ^τ Z^τ X + ζ^τ Z^τ X)
  = Var(c^τ X − ζ^τ Z^τ X) + Var(ζ^τ Z^τ X) + 2 Cov(ζ^τ Z^τ X, c^τ X − ζ^τ Z^τ X)
  = Var(c^τ X − ζ^τ Z^τ X) + Var(l^τ β̂)
  ≥ Var(l^τ β̂).
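The Gauss–Markov argument can also be checked numerically: under Var(ε) = σ² I, any linear unbiased weight vector c satisfies Z^τ c = l and decomposes as c = a + η with Z^τ η = 0, where a^τ X is the LSE, so Var(c^τ X) = σ²‖c‖² ≥ σ²‖a‖². A sketch (variable names a, c, eta are illustrative, not from the text):

```python
import numpy as np

# Sketch of the Gauss-Markov theorem: the LSE weight vector
# a = Z (Z'Z)^- l has the smallest norm among all c with Z'c = l,
# and Var(c'X) = sigma^2 ||c||^2 under Var(eps) = sigma^2 I.
rng = np.random.default_rng(1)
n, p = 30, 4
Z = rng.standard_normal((n, p))
zeta = rng.standard_normal(p)
l = Z.T @ Z @ zeta                       # l = (Z'Z) zeta, so l in R(Z)

a = Z @ np.linalg.pinv(Z.T @ Z) @ l      # LSE weights: l' beta_hat = a'X
eta = rng.standard_normal(n)
eta -= Z @ np.linalg.lstsq(Z, eta, rcond=None)[0]  # make eta orthogonal to col(Z)
c = a + eta                              # a competing linear unbiased estimator

print(np.allclose(Z.T @ c, l))           # unbiasedness preserved: Z'c = l
print(a @ a <= c @ c)                    # LSE variance is the smallest
```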
Robustness of the LSE against violation of Var(ε) = σ² I_n

Consider linear model (5) under assumption A3 (E(ε) = 0 and Var(ε) is an unknown matrix). An interesting question is: under what conditions on Var(ε) is the LSE of l^τ β with l ∈ R(Z) still the BLUE?
If l^τ β̂ is still the BLUE, then we say that l^τ β̂, considered as a BLUE, is robust against violation of assumption A2.
A procedure having certain properties under an assumption is said to be robust against violation of the assumption iff the procedure still has the same properties when the assumption is (slightly) violated.
For example, the LSE of l^τ β with l ∈ R(Z), as an unbiased estimator, is robust against violation of assumption A1 or A2, since the LSE is unbiased as long as E(ε) = 0, which can always be assumed. On the other hand, the LSE as a UMVUE may not be robust against violation of assumption A1.
Theorem 3.10
Consider model (5) with assumption A3. The following are equivalent.
(a) l^τ β̂ is the BLUE of l^τ β for any l ∈ R(Z).
(b) E(l^τ β̂ η^τ X) = 0 for any l ∈ R(Z) and any η such that E(η^τ X) = 0.
(c) Z^τ Var(ε) U = 0, where U is a matrix such that Z^τ U = 0 and R(U^τ) + R(Z^τ) = ℝ^n.
(d) Var(ε) = Z Λ₁ Z^τ + U Λ₂ U^τ for some Λ₁ and Λ₂.
(e) The matrix Z (Z^τ Z)^− Z^τ Var(ε) is symmetric.

Proof
We first show that (a) and (b) are equivalent, which is an analogue of Theorem 3.2(i). Suppose that (b) holds. Let l ∈ R(Z). If c^τ X is unbiased for l^τ β, then E(η^τ X) = 0 with η = c − Z (Z^τ Z)^− l. Hence, (b) implies (a) because
Proof (continued)
Var(c^τ X) = Var(c^τ X − l^τ β̂ + l^τ β̂)
  = Var(c^τ X − l^τ (Z^τ Z)^− Z^τ X + l^τ β̂)
  = Var(η^τ X + l^τ β̂)
  = Var(η^τ X) + Var(l^τ β̂) + 2 Cov(η^τ X, l^τ β̂)
  = Var(η^τ X) + Var(l^τ β̂) + 2 E(l^τ β̂ η^τ X)
  = Var(η^τ X) + Var(l^τ β̂)
  ≥ Var(l^τ β̂).
Suppose now that there exist l ∈ R(Z) and η such that E(η^τ X) = 0 but δ = E(l^τ β̂ η^τ X) ≠ 0. Let c_t = t η + Z (Z^τ Z)^− l. From the previous derivation,
Var(c_t^τ X) = t² Var(η^τ X) + Var(l^τ β̂) + 2δt.
As long as δ ≠ 0, there exists a t such that Var(c_t^τ X) < Var(l^τ β̂).
Proof (continued)
This shows that l^τ β̂ cannot be a BLUE and, therefore, (a) implies (b).
Next, we show that (b) implies (c). Suppose that (b) holds. Since l ∈ R(Z), l = Z^τ γ for some γ. Let η ∈ R(U^τ). Then E(η^τ X) = η^τ Z β = 0 and, hence,
0 = E(l^τ β̂ η^τ X) = E[γ^τ Z (Z^τ Z)^− Z^τ X X^τ η] = γ^τ Z (Z^τ Z)^− Z^τ Var(ε) η
(since Z^τ η = 0, E(X X^τ) η = Var(ε) η). Since this equality holds for all l ∈ R(Z), it holds for all γ. Thus,
Z (Z^τ Z)^− Z^τ Var(ε) U = 0,
which implies
Z^τ Z (Z^τ Z)^− Z^τ Var(ε) U = Z^τ Var(ε) U = 0,
since Z^τ Z (Z^τ Z)^− Z^τ = Z^τ. Thus, (c) holds.
Proof (continued)
Next, we show that (c) implies (d). We need the following facts from linear algebra: there exists a nonsingular matrix C such that Var(ε) = C C^τ, and C = Z C₁ + U C₂ for some matrices C_j (since R(U^τ) + R(Z^τ) = ℝ^n). Let Λ₁ = C₁ C₁^τ, Λ₂ = C₂ C₂^τ, and Λ₃ = C₁ C₂^τ. Then
Var(ε) = Z Λ₁ Z^τ + U Λ₂ U^τ + Z Λ₃ U^τ + U Λ₃^τ Z^τ    (2)
and Z^τ Var(ε) U = Z^τ Z Λ₃ U^τ U, which is 0 if (c) holds. Hence, (c) implies
0 = Z (Z^τ Z)^− Z^τ Z Λ₃ U^τ U (U^τ U)^− U^τ = Z Λ₃ U^τ,
which with (2) implies (d).
We now show that (d) implies (e). If (d) holds, then
Z (Z^τ Z)^− Z^τ Var(ε) = Z Λ₁ Z^τ,
which is symmetric. Hence (d) implies (e).
To complete the proof, we need to show that (e) implies (b), which is left as an exercise.
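The equivalences can be illustrated on a concrete design: if we build Var(ε) in the form (d), then conditions (c) and (e) hold numerically. A sketch (the construction of U via the SVD is an assumption about how to realize the orthogonal complement, not from the text):

```python
import numpy as np

# Sketch of Theorem 3.10: Var(eps) of form (d), Var(eps) = Z L1 Z' + U L2 U',
# satisfies conditions (c) and (e).
rng = np.random.default_rng(2)
n, p = 10, 3
Z = rng.standard_normal((n, p))

# U: columns spanning the orthogonal complement of col(Z), so Z'U = 0
# and R(U') + R(Z') = R^n.
Q, _, _ = np.linalg.svd(Z, full_matrices=True)
U = Q[:, p:]

A1 = rng.standard_normal((p, p))
A2 = rng.standard_normal((n - p, n - p))
L1, L2 = A1 @ A1.T, A2 @ A2.T                  # arbitrary psd Lambda_1, Lambda_2
V = Z @ L1 @ Z.T + U @ L2 @ U.T                # Var(eps) of form (d)

P = Z @ np.linalg.pinv(Z.T @ Z) @ Z.T          # Z (Z'Z)^- Z'
print(np.allclose(Z.T @ V @ U, 0))             # condition (c): Z' Var(eps) U = 0
print(np.allclose(P @ V, (P @ V).T))           # condition (e): symmetry
```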
As a corollary of this theorem, the following result shows when the UMVUE's in model (5) with assumption A1 are robust against the violation of Var(ε) = σ² I_n.

Corollary 3.3
Consider model (5) with a full rank Z, ε ~ N_n(0, Σ), and an unknown positive definite matrix Σ. Then l^τ β̂ is a UMVUE of l^τ β for any l ∈ ℝ^p iff one of (b)–(e) in Theorem 3.10 holds.

Example 3.16
Consider model (5) with β replaced by a random vector β̃ that is independent of ε. Such a model is called a linear model with random coefficients. Suppose that Var(ε) = σ² I_n and E(β̃) = β. Then
X = Z β̃ + ε = Z β + Z (β̃ − β) + ε = Z β + e,    (3)
where e = Z (β̃ − β) + ε satisfies E(e) = 0 and
Var(e) = Z Var(β̃) Z^τ + σ² I_n.
Since
Z (Z^τ Z)^− Z^τ Var(e) = Z Var(β̃) Z^τ + σ² Z (Z^τ Z)^− Z^τ
is symmetric, by Theorem 3.10 the LSE l^τ β̂ under model (3) is the BLUE of l^τ β for any l ∈ R(Z).
If Z is of full rank and ε is normal, then, by Corollary 3.3, l^τ β̂ is the UMVUE of l^τ β for any l ∈ ℝ^p.

Example 3.17 (Random effects models)
Suppose that
X_ij = μ + A_i + e_ij,  j = 1, ..., n_i,  i = 1, ..., m,    (4)
where μ ∈ ℝ is an unknown parameter, the A_i's are i.i.d. random variables having mean 0 and variance σ_a², the e_ij's are i.i.d. random errors with mean 0 and variance σ², and the A_i's and e_ij's are independent. Model (4) is called a one-way random effects model and the A_i's are unobserved random effects.
Model (4) is a special case of model (5) with ε_ij = A_i + e_ij and
Var(ε) = σ_a² Σ + σ² I_n,
where Σ is a block diagonal matrix whose ith block is J_{n_i} J_{n_i}^τ and J_k is the k-vector of ones. Under this model, Z = J_n, n = Σ_{i=1}^m n_i, and Z (Z^τ Z)^− Z^τ = n^{−1} J_n J_n^τ.
Note that
J_n J_n^τ Σ =
[ n₁ J_{n₁} J_{n₁}^τ    n₂ J_{n₁} J_{n₂}^τ    ⋯    n_m J_{n₁} J_{n_m}^τ ]
[ n₁ J_{n₂} J_{n₁}^τ    n₂ J_{n₂} J_{n₂}^τ    ⋯    n_m J_{n₂} J_{n_m}^τ ]
[        ⋮                      ⋮                          ⋮         ]
[ n₁ J_{n_m} J_{n₁}^τ   n₂ J_{n_m} J_{n₂}^τ   ⋯    n_m J_{n_m} J_{n_m}^τ ]
which is symmetric if and only if n₁ = n₂ = ⋯ = n_m.
Since J_n J_n^τ Var(ε) is symmetric if and only if J_n J_n^τ Σ is symmetric, a necessary and sufficient condition for the LSE of μ to be the BLUE is that all n_i's are the same. This condition is also necessary and sufficient for the LSE of μ to be the UMVUE when the ε_ij's are normal.
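The balanced-versus-unbalanced dichotomy is easy to see numerically: a sketch (the helper name jjt_sigma and the chosen group sizes are illustrative) forming J_n J_n^τ Σ for equal and unequal n_i:

```python
import numpy as np

# Sketch for Example 3.17: J_n J_n' Sigma is symmetric iff all group
# sizes n_i are equal, i.e., the LSE of mu is the BLUE exactly in the
# balanced case.
def jjt_sigma(sizes):
    """Return J_n J_n' Sigma, with Sigma block diagonal in J_{n_i} J_{n_i}'."""
    n = sum(sizes)
    Sigma = np.zeros((n, n))
    start = 0
    for k in sizes:
        Sigma[start:start + k, start:start + k] = 1.0  # i-th block of ones
        start += k
    J = np.ones((n, 1))
    return (J @ J.T) @ Sigma

M_eq = jjt_sigma([3, 3, 3])        # balanced design
M_ne = jjt_sigma([2, 3, 4])        # unbalanced design
print(np.allclose(M_eq, M_eq.T))   # -> True
print(np.allclose(M_ne, M_ne.T))   # -> False
```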
In some cases, we are interested in some (not all) linear functions of β. For example, consider l^τ β with l ∈ R(H), where H is an n × p matrix such that R(H) ⊂ R(Z).

Proposition 3.4
Consider model (5) with assumption A3. Suppose that H is a matrix such that R(H) ⊂ R(Z). A necessary and sufficient condition for the LSE l^τ β̂ to be the BLUE of l^τ β for any l ∈ R(H) is
H (Z^τ Z)^− Z^τ Var(ε) U = 0,
where U is the same as that in (c) of Theorem 3.10.

Example 3.18
Consider model (5) with assumption A3 and Z = (H₁ H₂), where H₁^τ H₂ = 0. Suppose that under the reduced model X = H₁ β₁ + ε, l^τ β̂₁ is the BLUE for any l^τ β₁, l ∈ R(H₁),
Example 3.18 (continued)
and that under the reduced model X = H₂ β₂ + ε, l^τ β̂₂ is not a BLUE for some l^τ β₂, l ∈ R(H₂), where β = (β₁, β₂) and the β̂_j's are the LSE's under the reduced models.
Let H = (H₁ 0) be n × p. Note that
H (Z^τ Z)^− Z^τ Var(ε) U = H₁ (H₁^τ H₁)^− H₁^τ Var(ε) U,
which is 0 by Theorem 3.10 for the U given in (c) of Theorem 3.10, and
Z (Z^τ Z)^− Z^τ Var(ε) U = H₂ (H₂^τ H₂)^− H₂^τ Var(ε) U,
which is not 0 by Theorem 3.10.
This implies that some LSE l^τ β̂ is not a BLUE of l^τ β, but l^τ β̂ is the BLUE of l^τ β if l ∈ R(H).
Asymptotic properties of the LSE

Without normality of ε or Var(ε) = σ² I_n, properties of the LSE may be derived under an asymptotic framework.

Theorem 3.11 (Consistency)
Consider model
X = Z β + ε    (5)
under assumption A3 (E(ε) = 0 and Var(ε) is an unknown matrix). Consider the LSE l^τ β̂ with l ∈ R(Z) for every n. Suppose that sup_n λ₊[Var(ε)] < ∞, where λ₊[A] is the largest eigenvalue of the matrix A, and that lim_{n→∞} λ₊[(Z^τ Z)^−] = 0. Then l^τ β̂ is consistent in mse for any l ∈ R(Z).

Proof
The result follows from the fact that l^τ β̂ is unbiased and
Var(l^τ β̂) = l^τ (Z^τ Z)^− Z^τ Var(ε) Z (Z^τ Z)^− l ≤ λ₊[Var(ε)] l^τ (Z^τ Z)^− l.
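To see the consistency in action, a sketch for simple linear regression (the design, sample sizes, and variable names are illustrative): with Var(ε) = σ² I, Var(l^τ β̂) = σ² l^τ (Z^τ Z)^− l, which shrinks as n grows since λ₊[(Z^τ Z)^−] → 0.

```python
import numpy as np

# Sketch of Theorem 3.11: the exact LSE variance sigma^2 l'(Z'Z)^- l
# under A2 shrinks to 0 as n grows for a simple regression design.
rng = np.random.default_rng(3)
sigma2 = 1.0
l = np.array([1.0, 0.5])

def lse_variance(n):
    Z = np.column_stack([np.ones(n), rng.uniform(0.0, 1.0, n)])
    G = np.linalg.pinv(Z.T @ Z)
    return sigma2 * (l @ G @ l)   # Var(l' beta_hat) under Var(eps) = sigma^2 I

v_small, v_large = lse_variance(50), lse_variance(50_000)
print(v_large < v_small)          # -> True: mse-consistency as n grows
```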
Without the normality assumption on ε, the exact distribution of l^τ β̂ is very hard to obtain. The asymptotic distribution of l^τ β̂ is derived in the following result.

Theorem 3.12
Consider model (5) with assumption A3. Suppose that 0 < inf_n λ₋[Var(ε)], where λ₋[A] is the smallest eigenvalue of the matrix A, and that
lim_{n→∞} max_{1≤i≤n} Z_i^τ (Z^τ Z)^− Z_i = 0.    (6)
Suppose further that n = Σ_{j=1}^k m_j for some integers k, m_j, j = 1, ..., k, with the m_j's bounded by a fixed integer m, ε = (ξ₁, ..., ξ_k), ξ_j ∈ ℝ^{m_j}, and the ξ_j's independent.
(i) If sup_i E|ε_i|^{2+δ} < ∞ for some δ > 0, then for any l ∈ R(Z),
l^τ (β̂ − β) / √Var(l^τ β̂) →_d N(0, 1).    (7)
(ii) Result (7) holds for any l ∈ R(Z) if, whenever m_i = m_j, 1 ≤ i < j ≤ k, ξ_i and ξ_j have the same distribution.
Proof
For l ∈ R(Z), l^τ (Z^τ Z)^− Z^τ Z β − l^τ β = 0 and
l^τ (β̂ − β) = l^τ (Z^τ Z)^− Z^τ ε = Σ_{j=1}^k c_nj^τ ξ_j,
where c_nj is the m_j-vector whose components are l^τ (Z^τ Z)^− Z_i, i = k_{j−1} + 1, ..., k_j, with k₀ = 0 and k_j = Σ_{t=1}^j m_t, j = 1, ..., k.
Note that
Σ_{j=1}^k ‖c_nj‖² = l^τ (Z^τ Z)^− Z^τ Z (Z^τ Z)^− l = l^τ (Z^τ Z)^− l.    (8)
Also,
max_{1≤j≤k} ‖c_nj‖² ≤ m max_{1≤i≤n} [l^τ (Z^τ Z)^− Z_i]² ≤ m l^τ (Z^τ Z)^− l max_{1≤i≤n} Z_i^τ (Z^τ Z)^− Z_i,
which, together with (8) and condition (6), implies that
Proof (continued)
lim_{n→∞} ( max_{1≤j≤k} ‖c_nj‖² / Σ_{j=1}^k ‖c_nj‖² ) = 0.
The results then follow from Corollary 1.3.

Remarks
Under the conditions of Theorem 3.12, Var(ε) is a block diagonal matrix with Var(ξ_j) as the jth diagonal block, which includes the case of independent ε_i's as a special case.
Exercise 80 shows that condition (6) is almost a necessary condition for the consistency of the LSE.

Lemma 3.3
The following are sufficient conditions for (6).
(a) λ₊[(Z^τ Z)^−] → 0 and Z_n^τ (Z^τ Z)^− Z_n → 0, as n → ∞.
(b) There is an increasing sequence {a_n} such that a_n → ∞, a_n / a_{n+1} → 1, and Z^τ Z / a_n converges to a positive definite matrix.
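Condition (6) is the statement that the maximum leverage of the design vanishes. A sketch (the design and helper name max_leverage are illustrative) checking it for a bounded i.i.d. covariate, consistent with Lemma 3.3(b) with a_n = n:

```python
import numpy as np

# Sketch of condition (6): the maximum leverage max_i Z_i'(Z'Z)^- Z_i
# shrinks as n grows for a well-behaved design.
rng = np.random.default_rng(4)

def max_leverage(n):
    Z = np.column_stack([np.ones(n), rng.uniform(-1.0, 1.0, n)])
    G = np.linalg.pinv(Z.T @ Z)
    h = np.einsum('ij,jk,ik->i', Z, G, Z)   # h_i = Z_i'(Z'Z)^- Z_i
    return h.max()

m50, m5000 = max_leverage(50), max_leverage(5000)
print(m5000 < m50)   # -> True: condition (6) holds along this design
```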