Asymptotic distribution of MLE


Asymptotic distribution of MLE

Theorem. Let $\{X_t\}$ be a causal and invertible ARMA(p,q) process satisfying $\Phi(B)X = \Theta(B)Z$, $\{Z_t\} \sim \mathrm{IID}(0, \sigma^2)$. Let $(\hat\phi, \hat\vartheta)$ be the values that minimize the negative log-likelihood $-\ln L_n(\phi, \vartheta)$ among those yielding a causal and invertible ARMA process, and let $\hat\sigma^2 = S(\hat\phi, \hat\vartheta)/n$. Then

$$n^{1/2}\big((\hat\phi, \hat\vartheta) - (\phi, \vartheta)\big) \Rightarrow N(0, W), \qquad \hat\sigma^2 \xrightarrow{\ a.s.\ } \sigma^2,$$

where

$$W = \sigma^2 \begin{pmatrix} E(\mathbf{U}_t \mathbf{U}_t^T) & E(\mathbf{U}_t \mathbf{V}_t^T) \\ E(\mathbf{V}_t \mathbf{U}_t^T) & E(\mathbf{V}_t \mathbf{V}_t^T) \end{pmatrix}^{-1}, \qquad \mathbf{U}_t = \begin{pmatrix} U_t \\ \vdots \\ U_{t-p+1} \end{pmatrix}, \quad \mathbf{V}_t = \begin{pmatrix} V_t \\ \vdots \\ V_{t-q+1} \end{pmatrix},$$

and the auxiliary autoregressions are $(\Phi(B)U)_t = Z_t$ and $(\Theta(B)V)_t = Z_t$.
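In practice $W$ is rarely formed by hand: in R, arima() stores a numerical estimate of the asymptotic covariance of the coefficient estimates in its var.coef component, which plays the role of $W/n$. A minimal sketch on simulated data (the orders and parameter values are arbitrary illustration choices):

    # arima()'s var.coef approximates W/n from the theorem; square roots of
    # its diagonal entries give the usual asymptotic standard errors.
    set.seed(1)
    x   <- arima.sim(model = list(ar = 0.7, ma = 0.3), n = 300)
    fit <- arima(x, order = c(1, 0, 1), include.mean = FALSE)
    sqrt(diag(fit$var.coef))  # standard errors of (phi-hat, theta-hat)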

Asymptotic distribution of MLE: examples

$\{X_t\}$ AR(p): then $W = \sigma^2 \big(E(\mathbf{U}_t \mathbf{U}_t^T)\big)^{-1} = \sigma^2 \Gamma_p^{-1}$. Hence $\hat\phi \approx N\big(\phi, \frac{\sigma^2}{n}\Gamma_p^{-1}\big)$ for $n$ large. For $p = 1$, $\hat\varphi \approx N\big(\varphi, \frac{1}{n}(1 - \varphi^2)\big)$.

$\{X_t\}$ MA(q): then $W = \sigma^2 \big(E(\mathbf{V}_t \mathbf{V}_t^T)\big)^{-1} = \sigma^2 (\Gamma_q^*)^{-1}$, where $\Gamma_q^*$ is the covariance matrix of the AR(q) process $(\Theta(B)V)_t = Z_t$. For $q = 1$, $\hat\vartheta \approx N\big(\vartheta, \frac{1}{n}(1 - \vartheta^2)\big)$.

$\{X_t\}$ ARMA(1,1):

$$W = \sigma^2 \begin{pmatrix} E(U_t^2) & E(U_t V_t) \\ E(U_t V_t) & E(V_t^2) \end{pmatrix}^{-1} = \begin{pmatrix} (1-\varphi^2)^{-1} & (1+\varphi\vartheta)^{-1} \\ (1+\varphi\vartheta)^{-1} & (1-\vartheta^2)^{-1} \end{pmatrix}^{-1}.$$

One easily obtains the asymptotic variance of $(\hat\phi, \hat\vartheta)$ by inverting this $2 \times 2$ matrix.
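A quick simulation check of the AR(1) case, comparing the empirical variance of the MLE across replications with the asymptotic value (1 − ϕ²)/n; the sample size, coefficient and replication count are arbitrary illustration choices:

    # Empirical vs. asymptotic variance of the AR(1) MLE (sketch).
    set.seed(1)
    n    <- 200
    phi  <- 0.7
    nrep <- 500
    est  <- replicate(nrep, {
      x <- arima.sim(model = list(ar = phi), n = n)
      coef(arima(x, order = c(1, 0, 0), include.mean = FALSE))["ar1"]
    })
    var(est)         # empirical variance over the replications
    (1 - phi^2) / n  # asymptotic variance from the theorem: 0.00255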

Comparison of estimators

Preliminary estimators compared with the MLE (which has no closed form and is computed numerically):

- AR(1), parameter ϕ: method of moments $\hat\rho(1)$; fitted innovations $\hat\vartheta_{m,1}$.
- MA(1), parameter ϑ: method of moments $\big(1 - \sqrt{1 - 4\hat\rho(1)^2}\big)\big/\big(2\hat\rho(1)\big)$ (real only if $|\hat\rho(1)| \le 1/2$); fitted innovations $\hat\vartheta_{m,1}$.
- ARMA(1,1), parameter ϕ: method of moments $\hat\rho(2)/\hat\rho(1)$; fitted innovations $\hat\vartheta_{m,2}/\hat\vartheta_{m,1}$.
- ARMA(1,1), parameter ϑ: method of moments, an ugly expression; fitted innovations $\hat\vartheta_{m,1} - \hat\vartheta_{m,2}/\hat\vartheta_{m,1}$.
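For concreteness, a small R sketch of the MA(1) method-of-moments estimator from the list above (the function name is illustrative, and the series x is simulated purely for illustration):

    # MA(1) method-of-moments estimate from the lag-1 sample autocorrelation.
    # Sketch only: assumes |rho(1)| <= 1/2 so that the square root is real.
    ma1_moments <- function(x) {
      rho1 <- acf(x, lag.max = 1, plot = FALSE)$acf[2]
      if (abs(rho1) > 0.5) stop("no real solution: |rho(1)| > 1/2")
      (1 - sqrt(1 - 4 * rho1^2)) / (2 * rho1)
    }
    set.seed(2)
    x <- arima.sim(model = list(ma = 0.6), n = 200)
    ma1_moments(x)  # should be near 0.6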

Comparison of estimators

Asymptotic variances of the three methods:

    Model      Par.  (1) Method of moments              (2) Innovations  (3) MLE
    AR(1)      ϕ     (1 − ϕ²)/n                         1/n              (1 − ϕ²)/n
    MA(1)      ϑ     (1 + ϑ² + 4ϑ⁴ + ϑ⁶ + ϑ⁸)           1/n              (1 − ϑ²)/n
                       / (n(1 − ϑ²)²)
    ARMA(1,1)  ϕ     -                                  -                (1 + ϕϑ)²(1 − ϕ²) / (n(ϕ + ϑ)²)
    ARMA(1,1)  ϑ     -                                  -                (1 + ϕϑ)²(1 − ϑ²) / (n(ϕ + ϑ)²)

Relative asymptotic efficiency e(ϑ; i, j) of (asymptotically unbiased) estimators of the parameter ϑ: the ratio of asymptotic variances of method j over method i, i.e. the efficiency of method i relative to method j. For the MA(1) parameter:

    e(ϑ; 1, 2) = 0.82 for ϑ = 0.25,  0.37 for ϑ = 0.5,  0.06 for ϑ = 0.75
    e(ϑ; 2, 3) = 0.94 for ϑ = 0.25,  0.75 for ϑ = 0.5,  0.44 for ϑ = 0.75
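The efficiency values above can be reproduced directly from the variance formulas in the table; the 1/n factors cancel in the ratios:

    # Reproduce the MA(1) relative efficiencies from the table's formulas.
    v_mom <- function(th) (1 + th^2 + 4*th^4 + th^6 + th^8) / (1 - th^2)^2
    v_inn <- function(th) 1
    v_mle <- function(th) 1 - th^2
    th <- c(0.25, 0.5, 0.75)
    round(v_inn(th) / v_mom(th), 2)  # e(th; 1, 2): 0.82 0.37 0.06
    round(v_mle(th) / v_inn(th), 2)  # e(th; 2, 3): 0.94 0.75 0.44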

A tool to compute asymptotic variances

Theorem (delta method). Let $\sigma_n \to 0$ and $(X_n - \mu)/\sigma_n \Rightarrow N(0, 1)$. Then

$$\frac{g(X_n) - g(\mu)}{\sigma_n} \Rightarrow N\big(0, (g'(\mu))^2\big), \quad\text{i.e.}\quad g(X_n) \approx N\big(g(\mu), (g'(\mu))^2 \sigma_n^2\big).$$

More generally, let $X_n$ be $k$-dimensional, $g : \mathbb{R}^k \to \mathbb{R}^m$, and $(X_n - \mu)/\sigma_n \Rightarrow N(0, V)$. Let $(DVD^T)_{ii} > 0$, where $D_{ij} = \frac{\partial g_i}{\partial x_j}(\mu)$. Then

$$\frac{g(X_n) - g(\mu)}{\sigma_n} \Rightarrow N(0, DVD^T), \quad\text{i.e.}\quad g(X_n) \approx N\big(g(\mu), DVD^T \sigma_n^2\big).$$
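A numerical version of the multivariate statement can be handy when g is messy. This is a sketch only: delta_var and the finite-difference step h are illustrative choices, not a library function:

    # Numerical delta method: approximate D (the Jacobian of g at mu) by
    # central finite differences and return D V D^T.
    delta_var <- function(g, mu, V, h = 1e-6) {
      k <- length(mu)
      m <- length(g(mu))
      D <- matrix(0, m, k)
      for (j in 1:k) {
        e <- replace(numeric(k), j, h)
        D[, j] <- (g(mu + e) - g(mu - e)) / (2 * h)
      }
      D %*% V %*% t(D)
    }
    # Example: variance factor for g(x) = x1/x2 at mu = (1, 2), V = identity.
    delta_var(function(x) x[1] / x[2], c(1, 2), diag(2))  # approx. 0.3125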

Model choice: introduction

MLE provides estimates for any given model, e.g. ARMA(p,q). How do we choose the model? The residuals should resemble white noise. Residuals can be defined as

$$\hat W_t = \big(X_t - \hat X_t(\hat\phi, \hat\vartheta)\big)\big(r_{t-1}(\hat\phi, \hat\vartheta)\big)^{-1/2}.$$

The sequence $(X_t - \hat X_t(\phi, \vartheta))(r_{t-1}(\phi, \vartheta))^{-1/2}$ is white noise, and $\hat W_t$ should be close to it. This can be tested, e.g. by computing the ACF of $\{\hat W_t\}$; an R sketch follows below. To avoid overfitting, the order can be selected through a criterion.
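A minimal residual-diagnostics sketch in R. The series is simulated for illustration, and the standardized residuals below are a common approximation to the $\hat W_t$ above (they use the constant variance estimate $\hat\sigma^2$ instead of $r_{t-1}$):

    # Residual diagnostics for a fitted ARMA model (sketch).
    set.seed(3)
    x   <- arima.sim(model = list(ar = 0.7, ma = 0.3), n = 200)
    fit <- arima(x, order = c(1, 0, 1), include.mean = FALSE)
    w   <- residuals(fit) / sqrt(fit$sigma2)   # standardized residuals
    acf(w)                                     # should resemble white noise
    Box.test(w, lag = 20, type = "Ljung-Box")  # portmanteau check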

Model choice: FPE criterion

FPE (Final Prediction Error) is an estimate of the one-step prediction error (in $L^2$ norm) for an independent realization of the observed process. Assume $(X_1, \ldots, X_n)$ is a realization of a causal AR(p) process with coefficients $\varphi_1, \ldots, \varphi_p$, and $(Y_1, \ldots, Y_n)$ an independent realization of the same process. The mean-square prediction error is

$$\mathrm{FPE} = E\big(Y_{n+1} - \hat\varphi_1 Y_n - \cdots - \hat\varphi_p Y_{n+1-p}\big)^2 = E\Big(Y_{n+1} - \sum_{j=1}^p \varphi_j Y_{n+1-j} - \sum_{j=1}^p (\hat\varphi_j - \varphi_j) Y_{n+1-j}\Big)^2.$$

$Y_{n+1} - \sum_{j=1}^p \varphi_j Y_{n+1-j} = Z_{n+1}$ is independent of the other terms and has variance $\sigma^2$. Furthermore, $\hat\phi - \phi$ is independent of $\{Y_t\}$. Hence

$$\mathrm{FPE} = \sigma^2 + \sum_{i,j=1}^p E\big((\hat\varphi_j - \varphi_j)(\hat\varphi_i - \varphi_i)\big)\, E(Y_{n+1-j} Y_{n+1-i}) = \sigma^2 + E\big\langle \Gamma_p(\hat\phi - \phi), \hat\phi - \phi \big\rangle.$$

FPE criterion: estimation

$$\mathrm{FPE} = \sigma^2 + E\big\langle \Gamma_p(\hat\phi - \phi), \hat\phi - \phi \big\rangle.$$

Fact: if $X$ is an $n$-dimensional zero-mean random vector with $V(X) = S$ and $A$ is an $n \times n$ matrix, then $E\langle AX, X \rangle = \mathrm{tr}(AS)$. Furthermore, it was stated that $V(\hat\phi - \phi) \approx \frac{\sigma^2}{n}\Gamma_p^{-1}$ for $n$ large. As $\mathrm{tr}(\Gamma_p \Gamma_p^{-1}) = p$,

$$\mathrm{FPE} \approx \sigma^2\Big(1 + \frac{p}{n}\Big).$$

Replacing $\sigma^2$ by the estimator $\frac{n}{n-p}\hat\sigma^2$, one finally obtains the quantity that should be minimized:

$$\mathrm{FPE} = \hat\sigma^2\, \frac{n+p}{n-p}.$$

Increasing $p$ will generally decrease $\hat\sigma^2$, but this is penalized by the factor $(n+p)/(n-p)$.

Use of FPE on lake data

    # Fit AR models of orders 1..4 by exact ML and print each fit's FPE.
    # huron2 is the lake-level series used in the course.
    n <- length(huron2)
    for (ord in 1:4) {
      armle <- ar.mle(huron2, order.max = ord, aic = FALSE)
      print(armle)                                   # coefficients and sigma
      print(armle$var.pred * (n + ord) / (n - ord))  # FPE
    }

Results:

    Model   σ̂²       FPE
    AR(1)   0.4972   0.5075
    AR(2)   0.4571   0.4762
    AR(3)   0.4557   0.4845
    AR(4)   0.4573   0.4962

    Model   ϕ1       ϕ2        ϕ3       ϕ4
    AR(1)   0.7829   -         -        -
    AR(2)   1.0047   -0.2920   -        -
    AR(3)   1.0201   -0.3479   0.0578   -
    AR(4)   1.0596   -0.4450   0.0960   0.0037

FPE is minimized by the AR(2) model.

Diagnostics of selected model

[Figure: two panels for the residuals of the selected model: residuals vs. time (1880-1960) and their sample ACF up to lag 15.]

It seems ok.

Diagnostics of AR(1)

For comparison, the residuals of the AR(1) fit:

[Figure: residuals of the AR(1) model vs. time (1880-1960) and their sample ACF up to lag 15.]

Akaike criterion: Kullback-Leibler discrepancy

Given a family of probability densities $\{f(\cdot\,; \psi),\ \psi \in \Psi\}$, the Kullback-Leibler index of $f(\cdot\,; \psi)$ relative to $f(\cdot\,; \vartheta)$ is

$$\Delta(\psi \mid \vartheta) = E_\vartheta\big(-2 \ln f(X; \psi)\big) = -2 \int_{\mathbb{R}^n} \ln\big(f(x; \psi)\big)\, f(x; \vartheta)\, dx.$$

The Kullback-Leibler discrepancy between $f(\cdot\,; \psi)$ and $f(\cdot\,; \vartheta)$ is

$$d(\psi \mid \vartheta) = \Delta(\psi \mid \vartheta) - \Delta(\vartheta \mid \vartheta) = -2 \int_{\mathbb{R}^n} \ln\Big(\frac{f(x; \psi)}{f(x; \vartheta)}\Big)\, f(x; \vartheta)\, dx.$$

Jensen's inequality implies $E(\ln Y) \le \ln(E(Y))$ for any positive random variable $Y$. Hence

$$d(\psi \mid \vartheta) \ge -2 \ln \int_{\mathbb{R}^n} \frac{f(x; \psi)}{f(x; \vartheta)}\, f(x; \vartheta)\, dx = -2\ln(1) = 0,$$

with equality only if $f(x; \psi) = f(x; \vartheta)$ a.e. $[f(\cdot\,; \vartheta)]$.
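As a toy illustration outside the lecture's derivation, the discrepancy between two univariate normal densities can be computed by numerical integration; it vanishes exactly when the densities coincide (kl_d and its arguments are illustrative names):

    # KL discrepancy d(psi | theta) for two N(mu, sd^2) densities, by
    # numerical integration over the real line.
    kl_d <- function(mu1, sd1, mu0, sd0) {
      integrand <- function(x)
        -2 * (dnorm(x, mu1, sd1, log = TRUE) -
              dnorm(x, mu0, sd0, log = TRUE)) * dnorm(x, mu0, sd0)
      integrate(integrand, -Inf, Inf)$value
    }
    kl_d(0, 1, 0, 1)  # 0: identical densities
    kl_d(1, 2, 0, 1)  # > 0 otherwise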

Approximating the Kullback-Leibler discrepancy

Given observations $X_1, \ldots, X_n$, we would like to minimize $d(\psi \mid \vartheta)$ among all candidate models $\psi$, given the true model $\vartheta$. As the true model is unknown, we estimate $d(\psi \mid \vartheta)$.

Let $\psi = (\phi, \vartheta, \sigma^2)$ be the parameters of an ARMA(p,q) model and $\hat\psi$ the MLE based on $X_1, \ldots, X_n$. Let $Y$ be an independent realization of the same process. Then

$$-2 \ln L_Y(\hat\phi, \hat\vartheta, \hat\sigma^2) = n \ln(2\pi) + n \ln(\hat\sigma^2) + \ln(r_0 \cdots r_{n-1}) + \frac{S_Y(\hat\phi, \hat\vartheta)}{\hat\sigma^2}$$
$$= -2 \ln L_X(\hat\phi, \hat\vartheta, \hat\sigma^2) + \frac{S_Y(\hat\phi, \hat\vartheta)}{\hat\sigma^2} - \frac{S_X(\hat\phi, \hat\vartheta)}{\hat\sigma^2} = -2 \ln L_X(\hat\phi, \hat\vartheta, \hat\sigma^2) + \frac{S_Y(\hat\phi, \hat\vartheta)}{\hat\sigma^2} - n,$$

since $S_X(\hat\phi, \hat\vartheta) = n \hat\sigma^2$. Taking expectations,

$$E_\vartheta\big(\Delta(\hat\psi \mid \vartheta)\big) = E_{(\phi,\vartheta,\sigma^2)}\big(-2 \ln L_X(\hat\phi, \hat\vartheta, \hat\sigma^2)\big) + E_{(\phi,\vartheta,\sigma^2)}\Big(\frac{S_Y(\hat\phi, \hat\vartheta)}{\hat\sigma^2}\Big) - n.$$

Kullback-Leibler discrepancy and AICC

Using linear approximations and the asymptotic distributions of the estimators, one arrives at

$$E_{(\phi,\vartheta,\sigma^2)}\big(S_Y(\hat\phi, \hat\vartheta)\big) \approx \sigma^2 (n + p + q).$$

Similarly, $n\hat\sigma^2 = S_X(\hat\phi, \hat\vartheta)$ is, for large $n$, distributed approximately as $\sigma^2 \chi^2(n - p - q - 2)$ and is asymptotically independent of $(\hat\phi, \hat\vartheta)$. Hence

$$E_{(\phi,\vartheta,\sigma^2)}\Big(\frac{S_Y(\hat\phi, \hat\vartheta)}{\hat\sigma^2}\Big) \approx \frac{\sigma^2 (n + p + q)}{\sigma^2 (n - p - q - 2)/n} = \frac{n(n + p + q)}{n - p - q - 2}.$$

From $E_\vartheta(\Delta(\hat\psi \mid \vartheta)) = E(-2 \ln L_X(\hat\phi, \hat\vartheta, \hat\sigma^2)) + E\big(S_Y(\hat\phi, \hat\vartheta)/\hat\sigma^2\big) - n$, it follows that

$$\mathrm{AICC} = -2 \ln L_X(\hat\phi, \hat\vartheta, \hat\sigma^2) + \frac{2(p + q + 1)n}{n - p - q - 2}$$

is an approximately unbiased estimate of $E_\vartheta\big(\Delta(\hat\psi \mid \vartheta)\big)$.
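In R, arima() reports AIC but not AICC, so the corrected criterion can be computed by hand from the fit's log-likelihood. A minimal sketch (the aicc helper and the simulated series are illustrative):

    # AICC from a fitted arima model, using the penalty derived above.
    aicc <- function(fit, p, q) {
      n <- fit$nobs
      -2 * fit$loglik + 2 * (p + q + 1) * n / (n - p - q - 2)
    }
    set.seed(4)
    x   <- arima.sim(model = list(ar = 0.7), n = 100)
    fit <- arima(x, order = c(1, 0, 0), include.mean = FALSE)
    aicc(fit, p = 1, q = 0)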

Criteria for model choice

The order is chosen by minimizing the value of AICC (corrected Akaike Information Criterion):

$$-2 \ln L_X(\hat\phi, \hat\vartheta, \hat\sigma^2) + \frac{2(p + q + 1)n}{n - p - q - 2}.$$

The second term can be considered a penalty for models with a large number of parameters. For large $n$ it is approximately the same as Akaike's Information Criterion (AIC), $-2 \ln L_X(\hat\phi, \hat\vartheta, \hat\sigma^2) + 2(p + q + 1)$, but it carries a higher penalty for finite $n$ and is thus somewhat less likely to overfit.

A rule of thumb is that the fits of model 1 and model 2 are not significantly different if $|\mathrm{AICC}_1 - \mathrm{AICC}_2| < 2$ (only the difference matters, not the absolute value of AICC). Hence we may decide to choose model 1 if it is simpler than model 2 (or its residuals are closer to white noise) even if $\mathrm{AICC}_1 > \mathrm{AICC}_2$, as long as $\mathrm{AICC}_1 < \mathrm{AICC}_2 + 2$. An order-selection loop based on this criterion is sketched below.
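A minimal order-selection sketch, scanning a small grid of (p, q) values and computing AICC inline; all names, the simulated series and the grid bounds are arbitrary illustration choices:

    # Choose (p, q) for the series x by minimizing AICC over a small grid.
    set.seed(5)
    x <- arima.sim(model = list(ar = 0.7, ma = 0.3), n = 200)
    grid <- expand.grid(p = 0:2, q = 0:2)
    grid$AICC <- apply(grid, 1, function(o) {
      fit <- arima(x, order = c(o["p"], 0, o["q"]), include.mean = FALSE)
      n <- fit$nobs
      -2 * fit$loglik + 2 * (o["p"] + o["q"] + 1) * n / (n - o["p"] - o["q"] - 2)
    })
    grid[which.min(grid$AICC), ]  # order with the smallest AICC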