Lecture 12: Pseudo likelihood approach


Pseudo MLE

Let $X_1,\ldots,X_n$ be a random sample from a pdf in a family indexed by two parameters $\theta$ and $\pi$, with likelihood $\ell(\theta,\pi)$. The method of pseudo MLE may be viewed as follows. Based on the sample, an estimate $\hat\pi$ of $\pi$ is obtained using some technique other than MLE. The pseudo MLE of $\theta$ is then obtained by maximizing the likelihood $\ell(\theta,\hat\pi)$.

Advantages of Pseudo MLE

$\pi$ is viewed as a nuisance parameter. Pseudo MLE consists of replacing $\pi$ by an estimate and solving a reduced system of likelihood equations; this works when the full, higher-dimensional MLE is intractable but the lower-dimensional MLE is feasible. Consistency and asymptotic normality hold under fairly standard regularity conditions.

UW-Madison (Statistics) Math-Stat Lecture 12 Jan 2018 1 / 9
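As a concrete sketch of the two-step recipe (my own illustration, not from the lecture): for a Gamma$(\theta,\pi)$ sample, estimate the scale $\pi$ by moments ($\hat\pi = S^2/\bar X$, since mean $=\theta\pi$ and variance $=\theta\pi^2$), then maximize the likelihood over the shape $\theta$ alone. The function name and parameter values are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import gammaln

def pseudo_mle_gamma(x):
    """Pseudo MLE of the Gamma shape theta, treating the scale pi as a
    nuisance parameter estimated by moments (pi_hat = S^2 / Xbar)."""
    xbar, s2 = x.mean(), x.var(ddof=1)
    pi_hat = s2 / xbar                          # step 1: non-MLE estimate of pi
    def neg_loglik(theta):                      # -l(theta, pi_hat), pi held fixed
        return -np.sum((theta - 1.0) * np.log(x) - x / pi_hat
                       - theta * np.log(pi_hat) - gammaln(theta))
    # step 2: maximize the likelihood over theta alone
    res = minimize_scalar(neg_loglik, bounds=(1e-6, 100.0), method="bounded")
    return res.x, pi_hat

rng = np.random.default_rng(0)
sample = rng.gamma(shape=3.0, scale=2.0, size=20000)
theta_hat, pi_hat = pseudo_mle_gamma(sample)
```

With $\theta=3$, $\pi=2$ and $n=20000$, both estimates land close to the truth; the point is that only a one-dimensional maximization is ever performed.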


Lemma

Let $X_1,\ldots,X_n$ be iid random variables from a distribution $F_\pi$ on the real line, with $\pi \in \Pi \subset \mathbb{R}$. Let $\pi_0 \in \Pi$ be the true value of the parameter, and let $\hat\pi$ be a sample estimator such that $\hat\pi \to_p \pi_0$. Let $\psi(x,\pi)$ be a differentiable function of $\pi$ for $\pi \in B$, an open neighborhood of $\pi_0$, and for almost all $x$ in the sample space. Suppose $E|\psi(X,\pi_0)| < \infty$. If $\left|\frac{\partial}{\partial\pi}\psi(x,\pi)\right| \le M(x)$ for all $\pi \in B$, where $E[M(X)] < \infty$, then
$$\frac{1}{n}\sum_{i=1}^n \psi(X_i,\hat\pi) \to_p E\,\psi(X,\pi_0).$$

Proof. Consider the Taylor series expansion of $\frac{1}{n}\sum_{i=1}^n \psi(X_i,\hat\pi)$.

Notation

$f(x\mid\theta,\pi)$: the pdf of $X_1$
$\Phi(x\mid\theta,\pi) = \log f(x\mid\theta,\pi)$
$\ell_n(\theta,\pi) = n^{-1}\sum_{i=1}^n \Phi(x_i\mid\theta,\pi)$
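A quick numerical illustration of the lemma (my own example, not the lecture's): take $\psi(x,\pi) = e^{-\pi x}$ with $X \sim \text{Exponential}(1)$ and $\hat\pi = \bar X$, so $\pi_0 = 1$, $|\partial\psi/\partial\pi| = x e^{-\pi x} \le x = M(x)$ with $E[M(X)] = 1 < \infty$, and the target is $E\,\psi(X,\pi_0) = E\,e^{-X} = 1/2$.

```python
import numpy as np

# psi(x, pi) = exp(-pi * x); |d psi / d pi| = x * exp(-pi*x) <= x = M(x),
# and E[M(X)] = E[X] = 1 < infinity for X ~ Exponential(1).
psi = lambda x, p: np.exp(-p * x)

rng = np.random.default_rng(1)
for n in (100, 10_000, 1_000_000):
    x = rng.exponential(1.0, size=n)
    pi_hat = x.mean()                     # consistent: pi_hat ->p pi0 = 1
    avg = psi(x, pi_hat).mean()           # (1/n) sum_i psi(X_i, pi_hat)
    # target: E psi(X, pi0) = E exp(-X) = 1/2
```

As $n$ grows the plug-in average approaches $1/2$ even though $\hat\pi$, not $\pi_0$, is used inside $\psi$, exactly as the lemma asserts.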


Notation

$\Phi_\theta = \partial\Phi/\partial\theta$, $\Phi_{\theta\pi} = \partial^2\Phi/\partial\theta\,\partial\pi$, $\Phi_{\theta\theta\pi} = \partial^3\Phi/\partial\theta^2\,\partial\pi$, etc.

Regularity Conditions

(A1) The following partial derivatives exist: $\Phi_\theta, \Phi_{\theta\theta}, \Phi_{\theta\theta\theta}, \Phi_\pi, \Phi_{\theta\pi}, \Phi_{\theta\theta\pi}, \Phi_{\theta\pi\pi}$.
(A2) Interchange of differentiation and integration of $f$ is valid for first and second (partial) derivatives.
(A3) $\phi_{11}$ and $\phi_{12}$ exist, where $\phi_{11} = E_{\theta_0,\pi_0}[\Phi_\theta]^2 > 0$ and $\phi_{12} = E_{\theta_0,\pi_0}[\Phi_\theta\Phi_\pi]$.
(A4) $\left|\frac{\partial}{\partial\pi}\log f(x\mid\theta_0,\pi)\right| f(x\mid\theta_0,\pi) \le M(x,\theta)$, where $E\,M(X,\theta) < \infty$.
(A5) The third partial derivatives $\Phi_{\theta\theta\theta}, \Phi_{\theta\theta\pi}, \Phi_{\theta\pi\pi}$ are bounded by integrable functions $M(x)$.
(A6) For any $(\theta,\pi) \ne (\theta_0,\pi_0)$, $P_{\theta_0,\pi_0}\{f(X\mid\theta,\pi) = f(X\mid\theta_0,\pi_0)\} < 1$.

Theorem (Consistency of Pseudo MLE)

Let $X_1,\ldots,X_n$ be an iid sample with pdf $f(x\mid\theta_0,\pi_0)$, and let $\hat\pi$ be a consistent estimator of $\pi_0$. Suppose (A1), (A4) and (A6) hold. Then $\hat\theta \to_p \theta_0$, where $\hat\theta$ is a solution of $\frac{\partial}{\partial\theta}\ell_n(\theta,\hat\pi) = 0$.

Proof. By the lemma,
$$\ell_n(\theta,\hat\pi) - \ell_n(\theta_0,\hat\pi) \to_p E_{\theta_0,\pi_0}\log\frac{f(X\mid\theta,\pi_0)}{f(X\mid\theta_0,\pi_0)} < \log E_{\theta_0,\pi_0}\frac{f(X\mid\theta,\pi_0)}{f(X\mid\theta_0,\pi_0)} = 0,$$
where the strict inequality is Jensen's inequality, strict by (A6). Hence, with probability tending to one, $\ell_n(\theta,\hat\pi)$ has a local maximum in $(\theta_0-\varepsilon,\theta_0+\varepsilon)$, so there is a $\hat\theta$ satisfying $\frac{\partial}{\partial\theta}\ell_n(\theta,\hat\pi) = 0$, which completes the proof.

Remark

The above theorem establishes only that the pseudo likelihood equation has a consistent root. In many applications the pseudo maximum likelihood equation has a unique solution, and the pseudo MLE is then indeed consistent, because in most cases the logarithmic derivative of the pseudo likelihood is a decreasing function of the structural parameter.
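The key step $E\log(f/f_0) < \log E(f/f_0) = 0$ can be checked by Monte Carlo in a simple case (an illustrative $N(\theta,1)$ family of my choosing, not from the lecture): for $X \sim N(0,1)$ and $\theta = 0.5$, the left side equals $-\theta^2/2 = -0.125$ while the right side equals $\log 1 = 0$.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(0.0, 1.0, size=1_000_000)    # truth: theta0 = 0, i.e. N(0, 1)
theta = 0.5                                  # a wrong parameter value

# log f(x | theta) - log f(x | theta0) for the N(theta, 1) family
log_ratio = -0.5 * (x - theta) ** 2 + 0.5 * x ** 2

mean_log = log_ratio.mean()                  # E log(f/f0) = -theta^2/2 = -0.125
log_mean = np.log(np.exp(log_ratio).mean())  # log E(f/f0) = log 1 = 0
```

The strict gap `mean_log < log_mean` is exactly the Jensen inequality that forces the limiting pseudo likelihood to peak at $\theta_0$.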


Theorem (Asymptotic Normality of Pseudo MLE)

Let $X_1,\ldots,X_n$ be an iid sample with pdf $f(x\mid\theta_0,\pi_0)$, and let $\hat\pi$ be a sample estimator such that $\hat\pi - \pi_0 = O_p(1/\sqrt{n})$. Suppose further that
$$\sqrt{n}\,\big(\ell_\theta(\theta_0,\pi_0),\; \hat\pi - \pi_0\big) \to_d N(0,\Sigma), \qquad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}.$$
Then, under regularity conditions (A1)-(A6), a consistent pseudo MLE $\hat\theta$ is asymptotically normal,
$$\sqrt{n}\,(\hat\theta - \theta_0) \to_d N(0,\sigma^2), \qquad \sigma^2 = \frac{1}{\phi_{11}} + \frac{\phi_{12}\,(\Sigma_{22}\phi_{12} - 2\Sigma_{12})}{\phi_{11}^2}.$$

Remark

If $\phi_{22} = E_{\theta_0,\pi_0}[\Phi_\pi]^2$ exists and is positive, and if $\Sigma_{12} = 0$ ($\hat\pi$ is asymptotically equivalent to the MLE of $\pi_0$), then $\Sigma_{22} = \phi_{11}/(\phi_{11}\phi_{22} - \phi_{12}^2)$ and $\sigma^2 = \phi_{22}/(\phi_{11}\phi_{22} - \phi_{12}^2)$. Thus, in this case $\hat\theta$ is asymptotically efficient if $\phi_{12} = 0$.


Proof. By Taylor's expansion,
$$0 = \sqrt{n}\,\ell_\theta(\hat\theta,\hat\pi) = \sqrt{n}\,\ell_\theta(\theta_0,\hat\pi) + \sqrt{n}\,(\hat\theta-\theta_0)\,\ell_{\theta\theta}(\theta_0,\hat\pi) + \tfrac{1}{2}\sqrt{n}\,(\hat\theta-\theta_0)^2\,\ell_{\theta\theta\theta}(\theta^*,\hat\pi) + o_p(1),$$
where $\theta^*$ lies between $\theta_0$ and $\hat\theta$. Then
$$\sqrt{n}\,\ell_\theta(\theta_0,\hat\pi) = -\sqrt{n}\,(\hat\theta-\theta_0)\Big(\ell_{\theta\theta}(\theta_0,\hat\pi) + \tfrac{1}{2}(\hat\theta-\theta_0)\,\ell_{\theta\theta\theta}(\theta^*,\hat\pi)\Big) + o_p(1).$$
Note that
$\sqrt{n}\,\ell_\theta(\theta_0,\hat\pi) = \sqrt{n}\,\ell_\theta(\theta_0,\pi_0) - \sqrt{n}\,(\hat\pi-\pi_0)\,\phi_{12} + o_p(1)$,
$\ell_{\theta\theta}(\theta_0,\hat\pi) + \phi_{11} = o_p(1)$, and
$\tfrac{1}{2}(\hat\theta-\theta_0)\,\ell_{\theta\theta\theta}(\theta^*,\hat\pi) = o_p(1)$.
We then see that $\sqrt{n}\,(\hat\theta-\theta_0)$ is asymptotically equivalent to
$$\big(\sqrt{n}\,\ell_\theta(\theta_0,\pi_0) - \sqrt{n}\,(\hat\pi-\pi_0)\,\phi_{12}\big)/\phi_{11}.$$
The conclusion follows from the fact that $\Sigma_{11} = \phi_{11}$.

Example: Signal Plus Noise Model

Let $X$ be a random variable whose distribution is that of the sum of independent variables $Y$ and $Z$. The distribution of $X$ is thus the convolution of $F_Y$ and $F_Z$, and may be thought of as a model for signals in additive noise. Let $X_1,\ldots,X_n$ be a random sample from $Y+Z$, where $Y \sim \text{Poisson}(\theta_0)$ and $Z \sim B(N,p_0)$. The moment estimators of $p_0$ and $\theta_0$ are
$$\hat p = \sqrt{(\bar X - S^2)/N} \qquad\text{and}\qquad \hat\theta = \bar X - N\hat p,$$
where $\bar X$ and $S^2$ are the sample mean and variance, provided $\bar X \ge S^2$, which occurs with probability tending to 1 as $n \to \infty$. Moreover, by the $\delta$-method we can derive that
$$\sqrt{n}\,\big(\ell_\theta(\theta_0,p_0),\; \hat p - p_0\big) \to_d N(0,\Sigma), \qquad \Sigma = \begin{pmatrix} \phi_{11} & 0 \\ 0 & t^2(\Gamma_{22} - 2\Gamma_{23} + \Gamma_{33}) \end{pmatrix},$$
with $t = 1/(2Np_0)$, $\Gamma_{22} = \theta_0 + Np_0(1-p_0)$, $\Gamma_{23} = \theta_0 + Np_0(1-p_0)(1-2p_0)$ and
$$\Gamma_{33} = \theta_0 + 2\big(\theta_0 + Np_0(1-p_0)\big)^2 + Np_0(1-p_0)\big(1 - 6p_0(1-p_0)\big);$$
here $\Gamma_{22}$, $\Gamma_{33}$ and $\Gamma_{23}$ are the asymptotic variances and covariance of $\sqrt{n}\,\bar X$ and $\sqrt{n}\,S^2$.
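The example can be simulated directly. The sketch below uses hypothetical values $\theta_0 = 2$, $p_0 = 0.4$, $N = 5$ (my choice, not the lecture's) and a brute-force convolution likelihood: it computes the moment estimators, then plugs $\hat p$ into the Poisson-binomial convolution pmf and maximizes over $\theta$ alone.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import binom, poisson

theta0, p0, N = 2.0, 0.4, 5               # hypothetical true values
rng = np.random.default_rng(3)
n = 50_000
x = rng.poisson(theta0, n) + rng.binomial(N, p0, n)   # X = Y + Z

# Moment estimators: E X = theta + N p, and E X - Var X = N p^2
xbar, s2 = x.mean(), x.var(ddof=1)
p_mme = np.sqrt(max(xbar - s2, 0.0) / N)
theta_mme = xbar - N * p_mme

# Pseudo MLE: fix p at p_mme and maximize the convolution likelihood over theta
ks, counts = np.unique(x, return_counts=True)
j = np.arange(N + 1)

def neg_loglik(theta):
    # P(X = k) = sum_j Binom(j; N, p_mme) * Poisson(k - j; theta)
    pmf = np.array([np.sum(binom.pmf(j, N, p_mme) * poisson.pmf(k - j, theta))
                    for k in ks])
    return -np.sum(counts * np.log(pmf))

theta_pmle = minimize_scalar(neg_loglik, bounds=(1e-6, 20.0), method="bounded").x
```

Both `theta_mme` and `theta_pmle` land near $\theta_0 = 2$; the pseudo MLE uses the full convolution likelihood in $\theta$ while avoiding the awkward joint maximization over $(\theta, p)$.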

Comparison of the asymptotic variances

The asymptotic variances of the MLE, pseudo MLE and moment estimator of the signal parameter $\theta_0$ are
$$\sigma^2_{MLE} = \frac{\phi_{22}}{\phi_{11}\phi_{22} - \phi_{12}^2}, \qquad \sigma^2_{PMLE} = \frac{1}{\phi_{11}} + \frac{\phi_{12}^2\, t^2(\Gamma_{22} - 2\Gamma_{23} + \Gamma_{33})}{\phi_{11}^2},$$
$$\sigma^2_{MME} = (1-Nt)^2\,\Gamma_{22} + 2Nt(1-Nt)\,\Gamma_{23} + (Nt)^2\,\Gamma_{33}.$$
It is not easy to compare these expressions analytically. For a specific range of parameters we find ARE(PMLE/MLE) < 1 and ARE(MME/PMLE) < 1. This example leads us to conjecture that the pseudo MLE of $\theta_0$ is uniformly more efficient asymptotically than the moment estimator of $\theta_0$.
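For one hypothetical parameter setting ($\theta_0 = 2$, $p_0 = 0.4$, $N = 5$; my choice, purely illustrative), the $\sigma^2_{MME}$ formula above evaluates as follows, with $\Gamma_{23}$ taken as the third cumulant $\theta_0 + Np_0(1-p_0)(1-2p_0)$.

```python
# Hypothetical values theta0 = 2, p0 = 0.4, N = 5 plugged into the slide's
# expressions for t, the Gammas, and sigma^2_MME.
theta0, p0, N = 2.0, 0.4, 5
t = 1.0 / (2 * N * p0)                                  # t = 1/(2 N p0) = 0.25
G22 = theta0 + N * p0 * (1 - p0)                        # Var of X:       3.2
G23 = theta0 + N * p0 * (1 - p0) * (1 - 2 * p0)         # third cumulant: 2.24
G33 = (theta0 + 2 * (theta0 + N * p0 * (1 - p0)) ** 2
       + N * p0 * (1 - p0) * (1 - 6 * p0 * (1 - p0)))   # mu4 - sigma^4:  21.952
Nt = N * t                                              # 1.25
sigma2_mme = ((1 - Nt) ** 2 * G22
              + 2 * Nt * (1 - Nt) * G23
              + Nt ** 2 * G33)                          # about 33.1
```

Note that $Nt = 1/(2p_0) > 1$ here, so the cross term is negative and the $(Nt)^2\Gamma_{33}$ term dominates; the moment estimator inherits most of its variance from the sample variance $S^2$.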

Discussions

Pseudo MLE can be viewed as the process of solving a reduced system of likelihood equations, which promises to improve the asymptotic behavior of estimates of specific parameters.

The requirements on the model are slightly less stringent for pseudo MLE than for other methods, so the method of pseudo MLE may be applicable in situations where the standard theory for MLE breaks down.

The main virtue of pseudo MLEs is their tractability in problems in which optimal approaches are computationally infeasible.

An iterative procedure based on the method of pseudo MLE may provide more efficient estimates than the single iteration we have discussed. Under regularity conditions, the algorithm should converge to the MLE, but the speed of this convergence may preclude its use.

UW-Madison (Statistics) Math-Stat Lecture 12 Jan 2018 9 / 9