6. MAXIMUM LIKELIHOOD ESTIMATION


[1] Maximum Likelihood Estimator

(1) Cases in which θ (the unknown parameter) is a scalar

Notational clarification: From now on, we denote the true value of θ as θ_o. Then, we view θ as a variable.

Definition: (Likelihood function) Let {x_1, ..., x_T} be a sample from a population. It does not have to be a random sample, and each x_t is a scalar. Let f(x_1, x_2, ..., x_T, θ_o) be the joint density function of x_1, ..., x_T. The functional form of f is known, but not θ_o. Then,

    L_T(θ) ≡ f(x_1, ..., x_T, θ)

is called the likelihood function. L_T(θ) is a function of θ given the data points x_1, ..., x_T.

Definition: (Log-likelihood function)

    l_T(θ) = ln[f(x_1, ..., x_T, θ)]

Example: Let {x_1, ..., x_T} be a random sample from a population distributed with f(x, θ_o). Then,

    f(x_1, ..., x_T, θ_o) = ∏_{t=1}^T f(x_t, θ_o);
    L_T(θ) = f(x_1, ..., x_T, θ) = ∏_{t=1}^T f(x_t, θ);
    l_T(θ) = ln(∏_{t=1}^T f(x_t, θ)) = Σ_t ln f(x_t, θ).

Definition: (Maximum Likelihood Estimator (MLE)) The MLE θ̂_MLE maximizes l_T(θ) given the data points x_1, ..., x_T.

Example: {x_1, ..., x_T} is a random sample from a population following a Poisson distribution [i.e., f(x, θ) = e^{-θ}θ^x/x!, suppressing the subscript o from θ]. Note that E(x) = var(x) = θ_o for the Poisson distribution.

    l_T(θ) = Σ_t ln[f(x_t, θ)] = -Tθ + ln(θ)Σ_t x_t - Σ_t ln(x_t!)

FOC for the maximum:

    ∂l_T(θ)/∂θ = -T + (1/θ)Σ_t x_t = 0.

Solving this, θ̂_MLE = (1/T)Σ_t x_t = x̄.
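As a quick numerical check (not part of the original notes), here is a minimal Python/SciPy sketch that maximizes the Poisson log-likelihood numerically and confirms that it matches the closed form x̄; the simulated sample, the seed, and the value θ_o = 3 are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
x = rng.poisson(lam=3.0, size=500)   # simulated sample; theta_o = 3 is assumed

# Negative log-likelihood, dropping the theta-free term sum_t ln(x_t!)
# since it does not affect the maximizer.
def neg_loglik(theta):
    return len(x) * theta - np.log(theta) * x.sum()

res = minimize_scalar(neg_loglik, bounds=(1e-6, 20.0), method="bounded")
print(res.x, x.mean())   # the numerical MLE should match x-bar
```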

(2) Extension to the Cases with Multiple Parameters

Definition: θ = [θ_1, θ_2, ..., θ_p]'.

    L_T(θ) = f(x_1, ..., x_T, θ) = f(x_1, ..., x_T, θ_1, ..., θ_p);
    l_T(θ) = ln[f(x_1, ..., x_T, θ)] = ln[f(x_1, ..., x_T, θ_1, ..., θ_p)].

Here x_t could be a vector. If {x_1, ..., x_T} is a random sample from a population with f(x, θ_o),

    l_T(θ) = ln(∏_{t=1}^T f(x_t, θ)) = Σ_t ln f(x_t, θ).

Definition: (MLE) The MLE θ̂_MLE maximizes l_T(θ) given the data (vector) points x_1, ..., x_T. That is, θ̂_MLE solves

    ∂l_T(θ)/∂θ = [∂l_T(θ)/∂θ_1, ∂l_T(θ)/∂θ_2, ..., ∂l_T(θ)/∂θ_p]' = 0_{p×1}.

Example: Let {x_1, ..., x_T} be a random sample from N(µ_o, σ_o²) [suppressing the subscript o]. Since {x_1, ..., x_T} is a random sample, E(x_t) = µ_o and var(x_t) = σ_o². Let θ = (µ, v)', where v = σ².

    f(x_t, θ) = (1/√(2πv)) exp(-(x_t - µ)²/(2v))
              = (2π)^{-1/2} v^{-1/2} exp(-(x_t - µ)²/(2v));

    ln[f(x_t, θ)] = -(1/2)ln(2π) - (1/2)ln(v) - (x_t - µ)²/(2v);

    l_T(θ) = -(T/2)ln(2π) - (T/2)ln(v) - (1/(2v))Σ_t(x_t - µ)².

The MLE solves the FOC:

    (1) ∂l_T(θ)/∂µ = -(1/(2v))Σ_t 2(x_t - µ)(-1) = Σ_t(x_t - µ)/v = 0;
    (2) ∂l_T(θ)/∂v = -T/(2v) + Σ_t(x_t - µ)²/(2v²) = 0.

From (1):

    (3) Σ_t(x_t - µ) = 0  ⇒  Σ_t x_t - Tµ = 0  ⇒  µ̂_MLE = (1/T)Σ_t x_t = x̄.

Substituting (3) into (2):

    (4) -Tv + Σ_t(x_t - µ̂_MLE)² = 0.  Thus, v̂_MLE = (1/T)Σ_t(x_t - x̄)².

Therefore,

    θ̂_MLE = [µ̂_MLE, v̂_MLE]' = [x̄, (1/T)Σ_t(x_t - x̄)²]'.
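A minimal numerical sketch of this two-parameter example (not in the original notes): it minimizes the joint negative log-likelihood over (µ, v) and compares against the closed forms just derived. The simulated data and parameter values are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=1.5, size=1000)   # mu_o = 2, v_o = 2.25 assumed

# Joint negative log-likelihood in theta = (mu, v).
def neg_loglik(theta):
    mu, v = theta
    T = len(x)
    return 0.5 * T * np.log(2 * np.pi * v) + np.sum((x - mu) ** 2) / (2 * v)

res = minimize(neg_loglik, x0=[0.0, 1.0], bounds=[(None, None), (1e-8, None)])
print(res.x)                      # numerical (mu_MLE, v_MLE)
print(x.mean(), x.var(ddof=0))    # closed forms: x-bar and (1/T)*sum(x_t - x-bar)^2
```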

[2] Large Sample Properties of the ML Estimator

Definition:

1) Let g(θ) = g(θ_1, ..., θ_p) be a scalar function of θ, and let g_j = ∂g/∂θ_j. Then,

    ∂g/∂θ = [g_1, g_2, ..., g_p]'   (p×1).

2) Let w(θ) = (w_1(θ), ..., w_m(θ))' be an m×1 vector of functions of θ, and let w_ij = ∂w_i(θ)/∂θ_j. Then,

    ∂w(θ)/∂θ' = [ w_11  w_12  ...  w_1p ]
                [ w_21  w_22  ...  w_2p ]
                [  :     :          :   ]
                [ w_m1  w_m2  ...  w_mp ]   (m×p).

3) Let g(θ) be a scalar function of θ, and let g_ij = ∂²g(θ)/∂θ_i∂θ_j. Then,

    ∂²g(θ)/∂θ∂θ' = [ g_11  g_12  ...  g_1p ]
                   [  :     :          :   ]
                   [ g_p1  g_p2  ...  g_pp ]   (p×p),

which is called the Hessian matrix of g(θ).

Example 1: Let g(θ) = θ_1² + θ_2² + θ_1θ_2. Find ∂g(θ)/∂θ:

    ∂g(θ)/∂θ = [ 2θ_1 + θ_2 ]
               [ 2θ_2 + θ_1 ].

Example 2: Let

    w(θ) = [ θ_1² + θ_2 ]        Then   ∂w(θ)/∂θ' = [ 2θ_1    1   ]
           [ θ_1 + θ_2² ].                          [  1    2θ_2  ].

Example 3: Let g(θ) = θ_1² + θ_2² + θ_1θ_2. The Hessian matrix of g(θ) is

    ∂²g(θ)/∂θ∂θ' = [ 2  1 ]
                   [ 1  2 ].

Some useful results:

1) c: 1×p, θ: p×1 (cθ is a scalar):  ∂(cθ)/∂θ = c';  ∂(cθ)/∂θ' = c.
2) R: m×p, θ: p×1 (Rθ is m×1):  ∂(Rθ)/∂θ' = R.
3) A: p×p symmetric, θ: p×1 (θ'Aθ is a scalar):
   ∂(θ'Aθ)/∂θ = 2Aθ;  ∂(θ'Aθ)/∂θ' = 2θ'A;  ∂²(θ'Aθ)/∂θ∂θ' = 2A.
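As a sanity check on result 3) (an illustration added here, not in the original notes), a finite-difference sketch verifying ∂(θ'Aθ)/∂θ = 2Aθ for an arbitrary symmetric A; the matrix and the evaluation point are assumed values.

```python
import numpy as np

# Finite-difference check of d(theta'A theta)/d(theta) = 2*A@theta
# for a symmetric A, using central differences.
A = np.array([[2.0, 1.0], [1.0, 2.0]])
theta = np.array([0.5, -1.0])

g = lambda t: t @ A @ t
eps = 1e-6
num_grad = np.array([(g(theta + eps * e) - g(theta - eps * e)) / (2 * eps)
                     for e in np.eye(2)])
print(num_grad)        # numerical gradient
print(2 * A @ theta)   # analytic gradient; the two agree to ~1e-9
```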

Definition: (Hessian matrix of the log-likelihood function)

    H_T(θ) = ∂²l_T(θ)/∂θ∂θ' = [∂²l_T/∂θ_i∂θ_j]_{p×p}.

Theorem: Let θ̂ be the MLE. Then, under suitable regularity conditions, θ̂ is consistent, and

    √T(θ̂ - θ_o) →_d N(0_{p×1}, [-plim (1/T)H_T(θ_o)]^{-1}).

Further, θ̂ is asymptotically efficient.

Implication: θ̂ ≈ N(θ_o, [-H_T(θ_o)]^{-1}) ≈ N(θ_o, [-H_T(θ̂)]^{-1}).

Example: {x_1, ..., x_T} is a random sample from N(µ_o, σ_o²). Let θ = [µ, v]' with v = σ².

    l_T(θ) = -(T/2)ln(2π) - (T/2)ln(v) - (1/(2v))Σ_t(x_t - µ)².

The first derivatives:

    ∂l_T(θ)/∂µ = Σ_t(x_t - µ)/v;   ∂l_T(θ)/∂v = -T/(2v) + Σ_t(x_t - µ)²/(2v²).

The second derivatives:

    ∂²l_T(θ)/∂µ∂µ = (1/v)Σ_t(-1) = -T/v;
    ∂²l_T(θ)/∂µ∂v = -Σ_t(x_t - µ)/v²;
    ∂²l_T(θ)/∂v∂v = T/(2v²) - Σ_t(x_t - µ)²/v³.

Therefore,

    H_T(θ) = [ -T/v                -Σ_t(x_t - µ)/v²             ]
             [ -Σ_t(x_t - µ)/v²    T/(2v²) - Σ_t(x_t - µ)²/v³   ].

Hence, evaluating at θ̂_ML (where Σ_t(x_t - x̄) = 0 and Σ_t(x_t - x̄)² = Tv̂_ML),

    -H_T(θ̂_ML) = [ T/v̂_ML    0            ]
                 [ 0          T/(2v̂_ML²)  ],

so

    θ̂ = [ µ̂_ML ]  ≈  N( [ µ_o ],  [ v̂_ML/T    0          ] )
        [ v̂_ML ]        [ v_o ]   [ 0          2v̂_ML²/T  ].
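The implication above can be used directly for inference. A minimal sketch (added here as an illustration, with simulated data and assumed parameter values): compute the observed information -H_T(θ̂) for this example and take the square roots of the diagonal of its inverse as asymptotic standard errors.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=2.0, scale=1.5, size=1000)   # simulated; parameters assumed
T = len(x)

mu_hat = x.mean()
v_hat = np.mean((x - mu_hat) ** 2)

# Observed information -H_T(theta_hat) = diag(T/v_hat, T/(2 v_hat^2));
# its inverse estimates the asymptotic covariance of (mu_hat, v_hat).
info = np.diag([T / v_hat, T / (2 * v_hat ** 2)])
cov = np.linalg.inv(info)
print(mu_hat, v_hat)
print(np.sqrt(np.diag(cov)))   # asymptotic standard errors
```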

[3] Testing Hypotheses Based on MLE

General form of hypotheses: Let w(θ) = [w_1(θ), w_2(θ), ..., w_m(θ)]', where each w_j(θ) = w_j(θ_1, θ_2, ..., θ_p) is a function of θ_1, ..., θ_p.

    H_o: The true θ (θ_o) satisfies the m restrictions w(θ) = 0_{m×1} (m ≤ p).

Definition: (Restricted MLE) Let θ̄ be the restricted ML estimator, which maximizes l_T(θ) s.t. w(θ) = 0. Let W(θ) = ∂w(θ)/∂θ' (m×p).

Wald Test:

    W_T = w(θ̂)'[W(θ̂) Cov(θ̂) W(θ̂)']^{-1} w(θ̂).

If θ̂ is the (unrestricted) ML estimator,

    W_T = w(θ̂)'[W(θ̂){-H_T(θ̂)}^{-1}W(θ̂)']^{-1} w(θ̂).

Note: W_T can be computed with any consistent estimator θ̂ and its estimated Cov(θ̂).

Likelihood Ratio (LR) Test:

    LR_T = 2[l_T(θ̂) - l_T(θ̄)].

Lagrange Multiplier (LM) Test: Define the score s_T(θ) = ∂l_T(θ)/∂θ. Then,

    LM_T = s_T(θ̄)'[-H_T(θ̄)]^{-1} s_T(θ̄).

Theorem: Under H_o: w(θ) = 0,

    W_T, LR_T, LM_T →_d χ²(m).

Implication: Given a significance level α, find a critical value c from the χ²(m) table; usually α = 0.05 or α = 0.01. If W_T > c, reject H_o; otherwise, do not reject H_o.

Comments:
1) The Wald test needs only θ̂; LR needs both θ̂ and θ̄; LM needs θ̄ only.
2) In general, W_T ≥ LR_T ≥ LM_T.
3) W_T is not invariant to how the restrictions are written. That is, W_T for H_o: θ_1 = θ_2 may not be equal to W_T for H_o: θ_1/θ_2 = 1.

Example: (1) {x_1, ..., x_T}: a random sample from N(µ_o, v_o) with v_o known, so θ = µ.

    H_o: µ = 0, i.e., w(µ) = µ;

    l_T(µ) = -(T/2)ln(2π) - (T/2)ln(v_o) - {1/(2v_o)}Σ_t(x_t - µ)²;
    s_T(µ) = (1/v_o)Σ_t(x_t - µ);
    H_T(µ) = -T/v_o.

[Wald Test]
Unrestricted MLE: FOC ∂l_T(µ)/∂µ = (1/v_o)Σ_t(x_t - µ) = 0, so µ̂ = x̄.
W(µ) = 1, so W(µ̂) = 1 and -H_T(µ̂) = T/v_o.

[LR Test]
Restricted MLE: µ̄ = 0.

    l_T(µ̂) = -(T/2)ln(2π) - (T/2)ln(v_o) - {1/(2v_o)}Σ_t(x_t - x̄)²;
    l_T(µ̄) = -(T/2)ln(2π) - (T/2)ln(v_o) - {1/(2v_o)}Σ_t x_t².

[LM Test]

    s_T(µ̄) = (1/v_o)Σ_t x_t = (T/v_o)x̄;   -H_T(µ̄) = T/v_o.

With this information, one can show that

    W_T = LR_T = LM_T = T x̄²/v_o.
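A minimal sketch verifying this equality numerically (an illustration added here; the simulated data, seed, and the assumed known v_o = 2.25 are not from the notes), including the χ²(1) critical value used in the decision rule above.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(3)
v_o = 2.25                                   # known variance (assumed value)
x = rng.normal(loc=0.3, scale=np.sqrt(v_o), size=200)
T, xbar = len(x), x.mean()

loglik = lambda mu: -0.5 * T * np.log(2 * np.pi * v_o) - np.sum((x - mu) ** 2) / (2 * v_o)

W  = T * xbar ** 2 / v_o                     # Wald: xbar * [1*(v_o/T)*1]^{-1} * xbar
LR = 2 * (loglik(xbar) - loglik(0.0))        # LR: 2[l_T(mu_hat) - l_T(mu_bar)]
LM = (x.sum() / v_o) ** 2 * (v_o / T)        # LM: s_T(0) * [-H_T]^{-1} * s_T(0)
c  = chi2.ppf(0.95, df=1)                    # chi^2(1) critical value at alpha = 0.05
print(W, LR, LM, c)                          # W = LR = LM here; reject H_o if > c
```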

(2) Both µ and v unknown: θ = (µ, v)'.

    H_o: µ = 0, i.e., w(θ) = µ;
    W(θ) = ∂w(θ)/∂θ' = [∂µ/∂µ, ∂µ/∂v] = [1, 0];
    l_T(θ) = -(T/2)ln(2π) - (T/2)ln(v) - {1/(2v)}Σ_t(x_t - µ)²;

    s_T(θ) = [ Σ_t(x_t - µ)/v                  ]
             [ -T/(2v) + Σ_t(x_t - µ)²/(2v²)   ];

    H_T(θ) = [ -T/v                -Σ_t(x_t - µ)/v²             ]
             [ -Σ_t(x_t - µ)/v²    T/(2v²) - Σ_t(x_t - µ)²/v³   ].

Unrestricted MLE: µ̂ = x̄ and v̂ = (1/T)Σ_t(x_t - x̄)².

Restricted MLE: µ̄ = 0, but we still need to compute v̄:

    l_T(µ̄, v) = -(T/2)ln(2π) - (T/2)ln(v) - {1/(2v)}Σ_t(x_t - µ̄)²;
    l_T(0, v) = -(T/2)ln(2π) - (T/2)ln(v) - {1/(2v)}Σ_t x_t².

FOC: ∂l_T(0, v)/∂v = -T/(2v) + (1/(2v²))Σ_t x_t² = 0  ⇒  v̄ = (1/T)Σ_t x_t².

[Wald Test]

    w(θ̂) = µ̂ = x̄;   W(θ̂) = (1  0);   -H_T(θ̂) = [ T/v̂    0         ]
                                                  [ 0       T/(2v̂²)  ];

    W_T = w(θ̂)'[W(θ̂){-H_T(θ̂)}^{-1}W(θ̂)']^{-1} w(θ̂) = T x̄²/v̂.

[LR Test]

    l_T(θ̂) = -(T/2)ln(2π) - (T/2)ln(v̂) - {1/(2v̂)}Σ_t(x_t - x̄)²;
    l_T(θ̄) = -(T/2)ln(2π) - (T/2)ln(v̄) - {1/(2v̄)}Σ_t x_t²;
    LR_T = 2[l_T(θ̂) - l_T(θ̄)] = T ln(v̄/v̂).

[LM Test]

    s_T(θ̄) = [ Σ_t x_t/v̄                    ]   = [ T x̄/v̄ ]
             [ -T/(2v̄) + Σ_t x_t²/(2v̄²)     ]     [ 0      ];

    -H_T(θ̄) = [ T/v̄         T x̄/v̄²    ]
               [ T x̄/v̄²     T/(2v̄²)   ].

Replacing -H_T(θ̄) by its expectation under H_o, I_T(θ̄) = diag(T/v̄, T/(2v̄²)) (the off-diagonal terms vanish since E(Σ_t x_t) = 0 when µ = 0),

    LM_T = s_T(θ̄)'[I_T(θ̄)]^{-1} s_T(θ̄) = (T x̄/v̄)²(v̄/T) = T x̄²/v̄.
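A companion sketch for this unknown-variance case (an added illustration with simulated data; parameter values are assumptions): it computes the three statistics from their closed forms and exhibits the ordering noted in Comment 2).

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(loc=0.2, scale=1.5, size=200)   # simulated; parameters assumed
T, xbar = len(x), x.mean()

v_hat = np.mean((x - xbar) ** 2)   # unrestricted MLE of v
v_bar = np.mean(x ** 2)            # restricted MLE of v under mu = 0

W  = T * xbar ** 2 / v_hat         # Wald
LR = T * np.log(v_bar / v_hat)     # LR = 2[l_T(theta_hat) - l_T(theta_bar)]
LM = T * xbar ** 2 / v_bar         # LM with I_T(theta_bar)
print(W, LR, LM)                   # note the ordering W >= LR >= LM
```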

[4] Efficiency of the OLS Estimator under Ideal Conditions

Assume that y_t is iid N(x_t'β, v) conditional on x_t:

    f(y_t | x_t, β, v) = (1/√(2πv)) exp(-(y_t - x_t'β)²/(2v)).

Therefore, we have the following log-likelihood function of y:

    l_T(β, v) = Σ_t ln f(y_t | x_t, β, v)
              = -(T/2)ln(2π) - (T/2)ln(v) - (1/(2v))Σ_t(y_t - x_t'β)²
              = -(T/2)ln(2π) - (T/2)ln(v) - (1/(2v))(y - Xβ)'(y - Xβ).

FOC:

    (i)  ∂l_T(β, v)/∂β = -(1/(2v))[-2X'y + 2X'Xβ] = 0_{k×1};
    (ii) ∂l_T(β, v)/∂v = -T/(2v) + (1/(2v²))(y - Xβ)'(y - Xβ) = 0.

From (i), X'y - X'Xβ = 0_{k×1}, so β̂_MLE = (X'X)^{-1}X'y = β̂_OLS.
From (ii), v̂_MLE = SSE/T.

Since the MLE is asymptotically efficient, we can conclude that β̂_OLS and s² = SSE/(T - k) are asymptotically efficient.
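A closing numerical sketch (an added illustration, not from the notes): it verifies that maximizing the Gaussian likelihood in β reproduces the closed-form OLS estimator. The design matrix, coefficients, noise scale, and seed are simulated assumptions.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
T, k = 300, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])
beta_o = np.array([1.0, -0.5, 2.0])
y = X @ beta_o + rng.normal(scale=1.2, size=T)

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)   # closed-form OLS

# Gaussian negative log-likelihood in (beta, ln v); ln v keeps v > 0.
def neg_loglik(params):
    beta, logv = params[:k], params[k]
    e = y - X @ beta
    return 0.5 * T * (np.log(2 * np.pi) + logv) + e @ e / (2 * np.exp(logv))

res = minimize(neg_loglik, x0=np.zeros(k + 1), method="BFGS")
print(res.x[:k])   # beta_MLE
print(beta_ols)    # matches OLS up to optimizer tolerance
```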