Lecture 21: Properties and robustness of LSE

BLUE: Robustness of LSE against non-normality

We now study properties of $l^\tau\hat\beta$ and $\hat\sigma^2$ under assumption A2, i.e., without the normality assumption on $\varepsilon$. From Theorem 3.6 and the proof of Theorem 3.7(ii), $l^\tau\hat\beta$ (with an $l \in R(Z)$) and $\hat\sigma^2$ are still unbiased without normality. In what sense are $l^\tau\hat\beta$ and $\hat\sigma^2$ optimal beyond being unbiased? Some discussion about $\hat\sigma^2$ can be found in Rao (1973, p. 228). In general,

$$\mathrm{Var}(l^\tau\hat\beta) = l^\tau (Z^\tau Z)^- Z^\tau \mathrm{Var}(\varepsilon)\, Z (Z^\tau Z)^- l.$$

If $l \in R(Z)$ and $\mathrm{Var}(\varepsilon) = \sigma^2 I_n$ (assumption A2), then by the properties of the generalized inverse,

$$\mathrm{Var}(l^\tau\hat\beta) = \sigma^2 l^\tau (Z^\tau Z)^- l,$$

which attains the Cramér-Rao lower bound under assumption A1. We have the following result for the LSE $l^\tau\hat\beta$.

UW-Madison (Statistics), Stat 709, Lecture 21, 2018
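The two variance formulas above are easy to sanity-check numerically: with $\mathrm{Var}(\varepsilon) = \sigma^2 I_n$ and $l \in R(Z)$, the sandwich expression collapses to $\sigma^2 l^\tau(Z^\tau Z)^- l$. A minimal NumPy sketch (the design matrix, $\sigma^2$, and $l$ below are my own toy choices, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma2 = 50, 3, 2.0
Z = rng.standard_normal((n, p))
ZtZ_inv = np.linalg.pinv(Z.T @ Z)          # generalized inverse (Z'Z)^-
l = Z.T @ rng.standard_normal(n)           # any l = Z'c lies in R(Z)

V = sigma2 * np.eye(n)                     # assumption A2: Var(eps) = sigma^2 I
sandwich = l @ ZtZ_inv @ Z.T @ V @ Z @ ZtZ_inv @ l
short = sigma2 * l @ ZtZ_inv @ l
assert np.isclose(sandwich, short)
```

The agreement holds for any $l$ of the form $Z^\tau c$; for $l \notin R(Z)$ the two expressions need not coincide.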

Theorem 3.9
Consider the linear model

$$X = Z\beta + \varepsilon \tag{1}$$

with assumption A2.
(i) A necessary and sufficient condition for the existence of a linear unbiased estimator of $l^\tau\beta$ (i.e., an unbiased estimator that is linear in $X$) is $l \in R(Z)$.
(ii) (Gauss-Markov theorem) If $l \in R(Z)$, then the LSE $l^\tau\hat\beta$ is the best linear unbiased estimator (BLUE) of $l^\tau\beta$ in the sense that it has the minimum variance in the class of linear unbiased estimators of $l^\tau\beta$.

Proof of (i)
The sufficiency has been established in Theorem 3.6. Suppose now that a linear function of $X$, $c^\tau X$ with $c \in \mathbb{R}^n$, is unbiased for $l^\tau\beta$. Then

$$l^\tau\beta = E(c^\tau X) = c^\tau E(X) = c^\tau Z\beta.$$

Since this equality holds for all $\beta$, $l = Z^\tau c$, i.e., $l \in R(Z)$.

Proof of (ii)
Let $l \in R(Z) = R(Z^\tau Z)$. Then $l = (Z^\tau Z)\zeta$ for some $\zeta$ and

$$l^\tau\hat\beta = \zeta^\tau (Z^\tau Z)\hat\beta = \zeta^\tau Z^\tau X.$$

Let $c^\tau X$ be any linear unbiased estimator of $l^\tau\beta$. From the proof of (i), $Z^\tau c = l$, and hence

$$\begin{aligned}
\mathrm{Cov}(\zeta^\tau Z^\tau X,\, c^\tau X - \zeta^\tau Z^\tau X)
&= E(X^\tau Z\zeta c^\tau X) - E(X^\tau Z\zeta\zeta^\tau Z^\tau X)\\
&= \sigma^2 \mathrm{tr}(Z\zeta c^\tau) + \beta^\tau Z^\tau Z\zeta c^\tau Z\beta - \sigma^2 \mathrm{tr}(Z\zeta\zeta^\tau Z^\tau) - \beta^\tau Z^\tau Z\zeta\zeta^\tau Z^\tau Z\beta\\
&= \sigma^2 \zeta^\tau l + (l^\tau\beta)^2 - \sigma^2 \zeta^\tau l - (l^\tau\beta)^2 = 0.
\end{aligned}$$

Therefore,

$$\begin{aligned}
\mathrm{Var}(c^\tau X) &= \mathrm{Var}(c^\tau X - \zeta^\tau Z^\tau X + \zeta^\tau Z^\tau X)\\
&= \mathrm{Var}(c^\tau X - \zeta^\tau Z^\tau X) + \mathrm{Var}(\zeta^\tau Z^\tau X) + 2\,\mathrm{Cov}(\zeta^\tau Z^\tau X,\, c^\tau X - \zeta^\tau Z^\tau X)\\
&= \mathrm{Var}(c^\tau X - \zeta^\tau Z^\tau X) + \mathrm{Var}(l^\tau\hat\beta)\\
&\ge \mathrm{Var}(l^\tau\hat\beta).
\end{aligned}$$
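The Gauss-Markov inequality in this proof can be illustrated directly: under $\mathrm{Var}(\varepsilon) = \sigma^2 I$, a linear unbiased $c^\tau X$ has $\mathrm{Var}(c^\tau X) = \sigma^2\|c\|^2$, and perturbing the LSE's coefficient vector by anything in the null space of $Z^\tau$ keeps $c^\tau X$ unbiased but only adds variance. A toy sketch (all matrices below are my own):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 40, 4
Z = rng.standard_normal((n, p))
zeta = rng.standard_normal(p)
l = Z.T @ Z @ zeta                          # l = (Z'Z) zeta, so l in R(Z)

c_lse = Z @ np.linalg.pinv(Z.T @ Z) @ l     # coefficient vector of the LSE l'beta_hat
# perturb by a vector orthogonal to the columns of Z (still unbiased: Z'c = l)
u = rng.standard_normal(n)
u -= Z @ np.linalg.pinv(Z.T @ Z) @ (Z.T @ u)
c = c_lse + u
assert np.allclose(Z.T @ c, l)              # c'X is unbiased for l'beta
assert c @ c >= c_lse @ c_lse               # Var(c'X) >= Var(l'beta_hat)
```

The second assertion is exactly the orthogonal decomposition used in the proof: $c$ splits into the LSE part (in the column space of $Z$) plus an orthogonal remainder that can only increase $\|c\|^2$.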

Robustness of LSE against violation of $\mathrm{Var}(\varepsilon) = \sigma^2 I_n$

Consider linear model (5) under assumption A3 ($E(\varepsilon) = 0$ and $\mathrm{Var}(\varepsilon)$ is an unknown matrix). An interesting question is: under what conditions on $\mathrm{Var}(\varepsilon)$ is the LSE of $l^\tau\beta$ with $l \in R(Z)$ still the BLUE? If $l^\tau\hat\beta$ is still the BLUE, then we say that $l^\tau\hat\beta$, considered as a BLUE, is robust against violation of assumption A2.

A procedure having certain properties under an assumption is said to be robust against violation of the assumption iff the procedure still has the same properties when the assumption is (slightly) violated. For example, the LSE of $l^\tau\beta$ with $l \in R(Z)$, as an unbiased estimator, is robust against violation of assumption A1 or A2, since the LSE is unbiased as long as $E(\varepsilon) = 0$, which can always be assumed. On the other hand, the LSE as a UMVUE may not be robust against violation of assumption A1.

Theorem 3.10
Consider model (5) with assumption A3. The following are equivalent.
(a) $l^\tau\hat\beta$ is the BLUE of $l^\tau\beta$ for any $l \in R(Z)$.
(b) $E(l^\tau\hat\beta\,\eta^\tau X) = 0$ for any $l \in R(Z)$ and any $\eta$ such that $E(\eta^\tau X) = 0$.
(c) $Z^\tau \mathrm{Var}(\varepsilon)\, U = 0$, where $U$ is a matrix such that $Z^\tau U = 0$ and $R(U^\tau) + R(Z^\tau) = \mathbb{R}^n$.
(d) $\mathrm{Var}(\varepsilon) = Z\Lambda_1 Z^\tau + U\Lambda_2 U^\tau$ for some $\Lambda_1$ and $\Lambda_2$.
(e) The matrix $Z(Z^\tau Z)^- Z^\tau \mathrm{Var}(\varepsilon)$ is symmetric.

Proof
We first show that (a) and (b) are equivalent, which is an analogue of Theorem 3.2(i). Suppose that (b) holds. Let $l \in R(Z)$. If $c^\tau X$ is unbiased for $l^\tau\beta$, then $E(\eta^\tau X) = 0$ with $\eta = c - Z(Z^\tau Z)^- l$. Hence, (b) implies (a) because

Proof (continued)

$$\begin{aligned}
\mathrm{Var}(c^\tau X) &= \mathrm{Var}(c^\tau X - l^\tau\hat\beta + l^\tau\hat\beta)\\
&= \mathrm{Var}(c^\tau X - l^\tau (Z^\tau Z)^- Z^\tau X + l^\tau\hat\beta)\\
&= \mathrm{Var}(\eta^\tau X + l^\tau\hat\beta)\\
&= \mathrm{Var}(\eta^\tau X) + \mathrm{Var}(l^\tau\hat\beta) + 2\,\mathrm{Cov}(\eta^\tau X,\, l^\tau\hat\beta)\\
&= \mathrm{Var}(\eta^\tau X) + \mathrm{Var}(l^\tau\hat\beta) + 2E(l^\tau\hat\beta\,\eta^\tau X)\\
&= \mathrm{Var}(\eta^\tau X) + \mathrm{Var}(l^\tau\hat\beta)\\
&\ge \mathrm{Var}(l^\tau\hat\beta).
\end{aligned}$$

Suppose now that there are $l \in R(Z)$ and $\eta$ such that $E(\eta^\tau X) = 0$ but $\delta = E(l^\tau\hat\beta\,\eta^\tau X) \ne 0$. Let $c_t = t\eta + Z(Z^\tau Z)^- l$. From the previous derivation,

$$\mathrm{Var}(c_t^\tau X) = t^2\,\mathrm{Var}(\eta^\tau X) + \mathrm{Var}(l^\tau\hat\beta) + 2\delta t.$$

As long as $\delta \ne 0$, there exists a $t$ such that $\mathrm{Var}(c_t^\tau X) < \mathrm{Var}(l^\tau\hat\beta)$.

Proof (continued)
This shows that $l^\tau\hat\beta$ cannot be a BLUE and, therefore, (a) implies (b).

Next, we show that (b) implies (c). Suppose that (b) holds. Since $l \in R(Z)$, $l = Z^\tau\gamma$ for some $\gamma$. Let $\eta \in R(U^\tau)$. Then $E(\eta^\tau X) = \eta^\tau Z\beta = 0$ and, hence,

$$0 = E(l^\tau\hat\beta\,\eta^\tau X) = E[\gamma^\tau Z(Z^\tau Z)^- Z^\tau X X^\tau \eta] = \gamma^\tau Z(Z^\tau Z)^- Z^\tau \mathrm{Var}(\varepsilon)\,\eta.$$

Since this equality holds for all $l \in R(Z)$, it holds for all $\gamma$. Thus,

$$Z(Z^\tau Z)^- Z^\tau \mathrm{Var}(\varepsilon)\, U = 0,$$

which implies

$$Z^\tau Z(Z^\tau Z)^- Z^\tau \mathrm{Var}(\varepsilon)\, U = Z^\tau \mathrm{Var}(\varepsilon)\, U = 0,$$

since $Z^\tau Z(Z^\tau Z)^- Z^\tau = Z^\tau$. Thus, (c) holds.

Proof (continued)
Next, we show that (c) implies (d). We use the following facts from linear algebra: there exists a nonsingular matrix $C$ such that $\mathrm{Var}(\varepsilon) = CC^\tau$, and $C = ZC_1 + UC_2$ for some matrices $C_j$ (since $R(U^\tau) + R(Z^\tau) = \mathbb{R}^n$). Let $\Lambda_1 = C_1 C_1^\tau$, $\Lambda_2 = C_2 C_2^\tau$, and $\Lambda_3 = C_1 C_2^\tau$. Then

$$\mathrm{Var}(\varepsilon) = Z\Lambda_1 Z^\tau + U\Lambda_2 U^\tau + Z\Lambda_3 U^\tau + U\Lambda_3^\tau Z^\tau \tag{2}$$

and $Z^\tau \mathrm{Var}(\varepsilon)\, U = Z^\tau Z\Lambda_3 U^\tau U$, which is 0 if (c) holds. Hence, (c) implies

$$0 = Z(Z^\tau Z)^- Z^\tau Z\Lambda_3 U^\tau U (U^\tau U)^- U^\tau = Z\Lambda_3 U^\tau,$$

which together with (2) implies (d).

We now show that (d) implies (e). If (d) holds, then $Z(Z^\tau Z)^- Z^\tau \mathrm{Var}(\varepsilon) = Z\Lambda_1 Z^\tau$, which is symmetric. Hence (d) implies (e).

To complete the proof, we need to show that (e) implies (b), which is left as an exercise.
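The implications just proved can be checked on a random instance: build $\mathrm{Var}(\varepsilon)$ in form (d) and verify that conditions (c) and (e) of Theorem 3.10 then hold. A sketch with toy sizes of my own choosing:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 8, 3
Z = rng.standard_normal((n, p))
# U: columns spanning the orthogonal complement of the column space of Z
q, _ = np.linalg.qr(Z, mode="complete")
U = q[:, p:]
A1 = rng.standard_normal((p, p)); Lam1 = A1 @ A1.T          # psd Lambda_1
A2 = rng.standard_normal((n - p, n - p)); Lam2 = A2 @ A2.T  # psd Lambda_2
V = Z @ Lam1 @ Z.T + U @ Lam2 @ U.T                         # form (d)

P = Z @ np.linalg.pinv(Z.T @ Z) @ Z.T       # projection Z(Z'Z)^- Z'
assert np.allclose(Z.T @ V @ U, 0)          # condition (c)
assert np.allclose(P @ V, (P @ V).T)        # condition (e): symmetric
```

Here $P V = Z\Lambda_1 Z^\tau$ exactly, which is the symmetry observed in the (d) $\Rightarrow$ (e) step.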

As a corollary of this theorem, the following result shows when the UMVUE's in model (5) with assumption A1 are robust against violation of $\mathrm{Var}(\varepsilon) = \sigma^2 I_n$.

Corollary 3.3
Consider model (5) with a full rank $Z$, $\varepsilon \sim N_n(0, \Sigma)$, and an unknown positive definite matrix $\Sigma$. Then $l^\tau\hat\beta$ is a UMVUE of $l^\tau\beta$ for any $l \in \mathbb{R}^p$ iff one of (b)-(e) in Theorem 3.10 holds.

Example 3.16
Consider model (5) with $\beta$ replaced by a random vector $\tilde\beta$ that is independent of $\varepsilon$. Such a model is called a linear model with random coefficients. Suppose that $\mathrm{Var}(\varepsilon) = \sigma^2 I_n$ and $E(\tilde\beta) = \beta$. Then

$$X = Z\tilde\beta + \varepsilon = Z\beta + Z(\tilde\beta - \beta) + \varepsilon = Z\beta + e, \tag{3}$$

where $e = Z(\tilde\beta - \beta) + \varepsilon$ satisfies $E(e) = 0$ and

Example 3.16 (continued)

$$\mathrm{Var}(e) = Z\,\mathrm{Var}(\tilde\beta)\,Z^\tau + \sigma^2 I_n.$$

Since

$$Z(Z^\tau Z)^- Z^\tau \mathrm{Var}(e) = Z\,\mathrm{Var}(\tilde\beta)\,Z^\tau + \sigma^2 Z(Z^\tau Z)^- Z^\tau$$

is symmetric, by Theorem 3.10, the LSE $l^\tau\hat\beta$ under model (3) is the BLUE of $l^\tau\beta$ for any $l \in R(Z)$. If $Z$ is of full rank and $\varepsilon$ is normal, then, by Corollary 3.3, $l^\tau\hat\beta$ is the UMVUE of $l^\tau\beta$ for any $l \in \mathbb{R}^p$.

Example 3.17 (Random effects models)
Suppose that

$$X_{ij} = \mu + A_i + e_{ij}, \quad j = 1, \ldots, n_i,\ i = 1, \ldots, m, \tag{4}$$

where $\mu \in \mathbb{R}$ is an unknown parameter, the $A_i$'s are i.i.d. random variables having mean 0 and variance $\sigma_a^2$, the $e_{ij}$'s are i.i.d. random errors with mean 0 and variance $\sigma^2$, and the $A_i$'s and $e_{ij}$'s are independent. Model (4) is called a one-way random effects model and the $A_i$'s are unobserved random effects.
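The symmetry argument of Example 3.16 in code (the dimensions and $\mathrm{Var}(\tilde\beta)$ below are my own picks): any psd $\mathrm{Var}(\tilde\beta)$ makes $Z(Z^\tau Z)^- Z^\tau \mathrm{Var}(e)$ symmetric, so condition (e) of Theorem 3.10 holds and the LSE stays BLUE.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, sigma2 = 20, 3, 1.5
Z = rng.standard_normal((n, p))
B = rng.standard_normal((p, p)); Vb = B @ B.T   # Var(beta_tilde), any psd matrix
Ve = Z @ Vb @ Z.T + sigma2 * np.eye(n)          # Var(e) from Example 3.16

P = Z @ np.linalg.pinv(Z.T @ Z) @ Z.T           # Z(Z'Z)^- Z'
M = P @ Ve
assert np.allclose(M, M.T)                      # condition (e) of Theorem 3.10
```

The symmetry is structural, not numerical luck: $P Z \mathrm{Var}(\tilde\beta) Z^\tau = Z \mathrm{Var}(\tilde\beta) Z^\tau$ and $P$ itself is symmetric.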

Example 3.17 (continued)
Model (4) is a special case of model (5) with $\varepsilon_{ij} = A_i + e_{ij}$ and

$$\mathrm{Var}(\varepsilon) = \sigma_a^2 \Sigma + \sigma^2 I_n,$$

where $\Sigma$ is a block diagonal matrix whose $i$th block is $J_{n_i} J_{n_i}^\tau$ and $J_k$ is the $k$-vector of ones. Under this model, $Z = J_n$, $n = \sum_{i=1}^m n_i$, and $Z(Z^\tau Z)^- Z^\tau = n^{-1} J_n J_n^\tau$. Note that

$$J_n J_n^\tau \Sigma = \begin{pmatrix}
n_1 J_{n_1} J_{n_1}^\tau & n_2 J_{n_1} J_{n_2}^\tau & \cdots & n_m J_{n_1} J_{n_m}^\tau \\
n_1 J_{n_2} J_{n_1}^\tau & n_2 J_{n_2} J_{n_2}^\tau & \cdots & n_m J_{n_2} J_{n_m}^\tau \\
\vdots & \vdots & \ddots & \vdots \\
n_1 J_{n_m} J_{n_1}^\tau & n_2 J_{n_m} J_{n_2}^\tau & \cdots & n_m J_{n_m} J_{n_m}^\tau
\end{pmatrix},$$

which is symmetric if and only if $n_1 = n_2 = \cdots = n_m$. Since $J_n J_n^\tau \mathrm{Var}(\varepsilon)$ is symmetric if and only if $J_n J_n^\tau \Sigma$ is symmetric, a necessary and sufficient condition for the LSE of $\mu$ to be the BLUE is that all $n_i$'s are the same. This condition is also necessary and sufficient for the LSE of $\mu$ to be the UMVUE when the $\varepsilon_{ij}$'s are normal.
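The balanced-versus-unbalanced dichotomy is easy to see numerically; the group sizes below are my own examples, not from the lecture:

```python
import numpy as np

def JJt_Sigma(n_list):
    """Build J_n J_n' @ Sigma for the one-way random effects model,
    where Sigma is block diagonal with blocks J_{n_i} J_{n_i}'."""
    n = sum(n_list)
    Sigma = np.zeros((n, n))
    start = 0
    for ni in n_list:
        Sigma[start:start + ni, start:start + ni] = 1.0
        start += ni
    return np.ones((n, n)) @ Sigma

M_bal = JJt_Sigma([3, 3, 3])      # balanced design: symmetric
M_unbal = JJt_Sigma([2, 3, 4])    # unbalanced design: not symmetric
assert np.allclose(M_bal, M_bal.T)
assert not np.allclose(M_unbal, M_unbal.T)
```

The $(i, j)$ block of the product has every entry equal to $n_j$, which matches the displayed matrix: symmetry forces all the $n_i$'s to be equal.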

In some cases, we are interested in some (not all) linear functions of $\beta$. For example, consider $l^\tau\beta$ with $l \in R(H)$, where $H$ is an $n \times p$ matrix such that $R(H) \subset R(Z)$.

Proposition 3.4
Consider model (5) with assumption A3. Suppose that $H$ is a matrix such that $R(H) \subset R(Z)$. A necessary and sufficient condition for the LSE $l^\tau\hat\beta$ to be the BLUE of $l^\tau\beta$ for any $l \in R(H)$ is

$$H(Z^\tau Z)^- Z^\tau \mathrm{Var}(\varepsilon)\, U = 0,$$

where $U$ is the same as that in (c) of Theorem 3.10.

Example 3.18
Consider model (5) with assumption A3 and $Z = (H_1\ H_2)$, where $H_1^\tau H_2 = 0$. Suppose that under the reduced model $X = H_1\beta_1 + \varepsilon$, $l^\tau\hat\beta_1$ is the BLUE of $l^\tau\beta_1$ for any $l \in R(H_1)$,

Example 3.18 (continued)
and that under the reduced model $X = H_2\beta_2 + \varepsilon$, $l^\tau\hat\beta_2$ is not a BLUE of $l^\tau\beta_2$ for some $l \in R(H_2)$, where $\beta = (\beta_1, \beta_2)$ and the $\hat\beta_j$'s are LSE's under the reduced models. Let $H = (H_1\ 0)$ be $n \times p$. Note that

$$H(Z^\tau Z)^- Z^\tau \mathrm{Var}(\varepsilon)\, U = H_1(H_1^\tau H_1)^- H_1^\tau \mathrm{Var}(\varepsilon)\, U,$$

which is 0 by Theorem 3.10 for the $U$ given in (c) of Theorem 3.10, and

$$Z(Z^\tau Z)^- Z^\tau \mathrm{Var}(\varepsilon)\, U = H_2(H_2^\tau H_2)^- H_2^\tau \mathrm{Var}(\varepsilon)\, U,$$

which is not 0 by Theorem 3.10. This implies that some LSE $l^\tau\hat\beta$ is not a BLUE of $l^\tau\beta$, but $l^\tau\hat\beta$ is the BLUE of $l^\tau\beta$ if $l \in R(H)$.
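A concrete instance of this situation can be built with orthonormal blocks (everything below is my own toy construction, not from the lecture): take $\mathrm{Var}(\varepsilon) = I + vv^\tau$ with $v$ mixing an $H_2$ direction and a $U$ direction, so Proposition 3.4's condition holds for $H = (H_1\ 0)$ while condition (c) of Theorem 3.10 fails for the full $Z$.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 10
A = rng.standard_normal((n, 4))
q, _ = np.linalg.qr(A, mode="complete")
H1, H2, U = q[:, :2], q[:, 2:4], q[:, 4:]   # orthonormal blocks, H1'H2 = 0
Z = np.hstack([H1, H2])

v = H2[:, 0] + U[:, 0]                      # mixes H2 and U directions
V = np.eye(n) + np.outer(v, v)              # a valid (psd) covariance matrix

G = np.linalg.pinv(Z.T @ Z)                 # here Z'Z = I, so G = I
H = np.hstack([H1, np.zeros((n, 2))])       # H = (H1 0)
assert np.allclose(H @ G @ Z.T @ V @ U, 0)  # Proposition 3.4: H1-part is BLUE
assert not np.allclose(Z.T @ V @ U, 0)      # Theorem 3.10(c) fails for full Z
```

Because $H_1^\tau v = 0$, the $H_1$ coordinates never "see" the $vv^\tau$ contamination, while $Z^\tau v \ne 0$ does.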

Asymptotic properties of LSE
Without normality on $\varepsilon$ or $\mathrm{Var}(\varepsilon) = \sigma^2 I_n$, properties of the LSE may be derived under an asymptotic framework.

Theorem 3.11 (Consistency)
Consider model

$$X = Z\beta + \varepsilon \tag{5}$$

under assumption A3 ($E(\varepsilon) = 0$ and $\mathrm{Var}(\varepsilon)$ is an unknown matrix). Consider the LSE $l^\tau\hat\beta$ with $l \in R(Z)$ for every $n$. Suppose that $\sup_n \lambda_+[\mathrm{Var}(\varepsilon)] < \infty$, where $\lambda_+[A]$ is the largest eigenvalue of the matrix $A$, and that $\lim_{n\to\infty} \lambda_+[(Z^\tau Z)^-] = 0$. Then $l^\tau\hat\beta$ is consistent in mse for any $l \in R(Z)$.

Proof
The result follows from the fact that $l^\tau\hat\beta$ is unbiased and

$$\mathrm{Var}(l^\tau\hat\beta) = l^\tau (Z^\tau Z)^- Z^\tau \mathrm{Var}(\varepsilon)\, Z (Z^\tau Z)^- l \le \lambda_+[\mathrm{Var}(\varepsilon)]\, l^\tau (Z^\tau Z)^- l,$$

and the right-hand side is bounded by $\lambda_+[\mathrm{Var}(\varepsilon)]\,\lambda_+[(Z^\tau Z)^-]\,\|l\|^2 \to 0$ under the stated conditions.
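For a concrete design, the bound in this proof can be watched going to 0. In the sketch below (the design $Z_i = (1, i/n)$, the vector $l$, and the assumed cap $\lambda_+[\mathrm{Var}(\varepsilon)] = 2$ are all my own choices):

```python
import numpy as np

def var_bound(n, lam_plus=2.0):
    """Upper bound lam_plus * l'(Z'Z)^{-1} l on Var(l'beta_hat)."""
    t = np.arange(1, n + 1) / n
    Z = np.column_stack([np.ones(n), t])      # Z_i = (1, i/n)
    l = np.array([1.0, 0.5])                  # Z has full rank, so l in R(Z)
    return lam_plus * l @ np.linalg.inv(Z.T @ Z) @ l

bounds = [var_bound(n) for n in (10, 100, 1000)]
assert bounds[0] > bounds[1] > bounds[2]      # shrinks as n grows
assert bounds[2] < 0.05                       # consistency in mse
```

Here $Z^\tau Z$ grows like $n$ times a fixed positive definite matrix, so $l^\tau(Z^\tau Z)^{-1}l = O(1/n)$, exactly the mechanism the theorem exploits.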

Without the normality assumption on $\varepsilon$, the exact distribution of $l^\tau\hat\beta$ is very hard to obtain. The asymptotic distribution of $l^\tau\hat\beta$ is derived in the following result.

Theorem 3.12
Consider model (5) with assumption A3. Suppose that $0 < \inf_n \lambda_-[\mathrm{Var}(\varepsilon)]$, where $\lambda_-[A]$ is the smallest eigenvalue of the matrix $A$, and that

$$\lim_{n\to\infty} \max_{1 \le i \le n} Z_i^\tau (Z^\tau Z)^- Z_i = 0. \tag{6}$$

Suppose further that $n = \sum_{j=1}^k m_j$ for some integers $k$, $m_j$, $j = 1, \ldots, k$, with the $m_j$'s bounded by a fixed integer $m$, $\varepsilon = (\xi_1, \ldots, \xi_k)$, $\xi_j \in \mathbb{R}^{m_j}$, and the $\xi_j$'s are independent.
(i) If $\sup_i E|\varepsilon_i|^{2+\delta} < \infty$ for some $\delta > 0$, then for any $l \in R(Z)$,

$$l^\tau(\hat\beta - \beta)\big/\sqrt{\mathrm{Var}(l^\tau\hat\beta)} \to_d N(0, 1). \tag{7}$$

(ii) Result (7) holds for any $l \in R(Z)$ if, when $m_i = m_j$, $1 \le i < j \le k$, $\xi_i$ and $\xi_j$ have the same distribution.
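A Monte Carlo sketch of (7) with skewed, non-normal errors (all settings below are mine): the standardized LSE has mean 0 and variance exactly 1 by construction, and Theorem 3.12 says it is approximately $N(0,1)$ for large $n$.

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 200, 2000
Z = np.column_stack([np.ones(n), np.arange(1, n + 1) / n])
beta = np.array([1.0, -2.0])
l = np.array([0.0, 1.0])                    # slope coefficient
G = np.linalg.inv(Z.T @ Z)
var_l = l @ G @ l                           # Var(l'beta_hat) when sigma^2 = 1

stats = []
for _ in range(reps):
    eps = rng.exponential(1.0, n) - 1.0     # mean 0, variance 1, heavily skewed
    beta_hat = G @ Z.T @ (Z @ beta + eps)   # LSE
    stats.append(l @ (beta_hat - beta) / np.sqrt(var_l))
stats = np.asarray(stats)
assert abs(stats.mean()) < 0.1              # consistent with mean 0
assert 0.85 < stats.std() < 1.15            # consistent with variance 1
```

A histogram of `stats` would be close to the standard normal density even though the raw errors are exponential; the assertions only check the first two moments.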

Proof
For $l \in R(Z)$, $l^\tau (Z^\tau Z)^- Z^\tau Z\beta - l^\tau\beta = 0$ and

$$l^\tau(\hat\beta - \beta) = l^\tau (Z^\tau Z)^- Z^\tau \varepsilon = \sum_{j=1}^k c_{nj}^\tau \xi_j,$$

where $c_{nj}$ is the $m_j$-vector whose components are $l^\tau (Z^\tau Z)^- Z_i$, $i = k_{j-1} + 1, \ldots, k_j$, with $k_0 = 0$ and $k_j = \sum_{t=1}^j m_t$, $j = 1, \ldots, k$. Note that

$$\sum_{j=1}^k \|c_{nj}\|^2 = l^\tau (Z^\tau Z)^- Z^\tau Z (Z^\tau Z)^- l = l^\tau (Z^\tau Z)^- l. \tag{8}$$

Also,

$$\max_{1 \le j \le k} \|c_{nj}\|^2 \le m \max_{1 \le i \le n} [l^\tau (Z^\tau Z)^- Z_i]^2 \le m\, l^\tau (Z^\tau Z)^- l\, \max_{1 \le i \le n} Z_i^\tau (Z^\tau Z)^- Z_i,$$

which, together with (8) and condition (6), implies that

Proof (continued)

$$\lim_{n\to\infty} \left( \max_{1 \le j \le k} \|c_{nj}\|^2 \Big/ \sum_{j=1}^k \|c_{nj}\|^2 \right) = 0.$$

The results then follow from Corollary 1.3.

Remarks
Under the conditions of Theorem 3.12, $\mathrm{Var}(\varepsilon)$ is a block diagonal matrix with $\mathrm{Var}(\xi_j)$ as the $j$th diagonal block, which includes the case of independent $\varepsilon_i$'s as a special case.
Exercise 80 shows that condition (6) is almost a necessary condition for the consistency of the LSE.

Lemma 3.3
The following are sufficient conditions for (6).
(a) $\lambda_+[(Z^\tau Z)^-] \to 0$ and $Z_n^\tau (Z^\tau Z)^- Z_n \to 0$, as $n \to \infty$.
(b) There is an increasing sequence $\{a_n\}$ such that $a_n \to \infty$, $a_n/a_{n+1} \to 1$, and $Z^\tau Z / a_n$ converges to a positive definite matrix.
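Lemma 3.3(b) can be illustrated with the simple regression design $Z_i = (1, i/n)$ (my own choice): $Z^\tau Z/n$ converges to a positive definite matrix, and the quantity in condition (6) is the maximum diagonal entry of the hat matrix, which goes to 0.

```python
import numpy as np

def max_leverage(n):
    """max_i Z_i'(Z'Z)^- Z_i, i.e. the largest hat-matrix diagonal."""
    t = np.arange(1, n + 1) / n
    Z = np.column_stack([np.ones(n), t])      # Z_i = (1, i/n)
    H = Z @ np.linalg.inv(Z.T @ Z) @ Z.T      # hat matrix
    return H.diagonal().max()

levs = [max_leverage(n) for n in (10, 100, 1000)]
assert levs[0] > levs[1] > levs[2]            # condition (6) kicking in
assert levs[2] < 0.01
```

For this design the maximum leverage behaves like $4/n$, so condition (6), and hence the CLT of Theorem 3.12, applies.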