Asymptotic distribution of MLE

Theorem. Let $\{X_t\}$ be a causal and invertible ARMA(p,q) process satisfying $\Phi(B)X_t = \Theta(B)Z_t$, $\{Z_t\} \sim \mathrm{IID}(0, \sigma^2)$. Let $(\hat\phi, \hat\vartheta)$ be the values that minimize $LL_n(\phi, \vartheta)$ among those yielding a causal and invertible ARMA process, and let $\hat\sigma^2 = S(\hat\phi, \hat\vartheta)/n$. Then

$$n^{1/2}\big((\hat\phi, \hat\vartheta) - (\phi, \vartheta)\big) \Rightarrow N(0, W) \qquad \text{and} \qquad \hat\sigma^2 \to \sigma^2 \ \text{a.s.},$$

where

$$W = \sigma^2 \begin{pmatrix} E(\mathbf{U}_t \mathbf{U}_t^t) & E(\mathbf{U}_t \mathbf{V}_t^t) \\ E(\mathbf{V}_t \mathbf{U}_t^t) & E(\mathbf{V}_t \mathbf{V}_t^t) \end{pmatrix}^{-1}$$

with $\mathbf{U}_t = (U_t, \dots, U_{t-p+1})^t$ and $\mathbf{V}_t = (V_t, \dots, V_{t-q+1})^t$ built from the autoregressions $(\Phi(B)U)_t = Z_t$ and $(\Theta(B)V)_t = Z_t$.
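In practice this theorem is what justifies the standard errors reported by numerical MLE routines. A minimal sketch in R (assuming x holds an observed zero-mean series; the ARMA(1,1) order is illustrative):

    # arima() maximizes the Gaussian likelihood; var.coef estimates W/n,
    # so the square roots of its diagonal are approximate standard errors.
    fit <- arima(x, order = c(1, 0, 1), include.mean = FALSE)
    fit$coef                  # (phi hat, theta hat)
    sqrt(diag(fit$var.coef))  # approximate standard errors of the MLE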
Asymptotic distribution of MLE: examples

$\{X_t\}$ AR(p): then $W = \sigma^2 \big(E(\mathbf{U}_t \mathbf{U}_t^t)\big)^{-1} = \sigma^2 \Gamma_p^{-1}$. Hence $\hat\phi \approx N(\phi, \frac{\sigma^2}{n}\Gamma_p^{-1})$ for $n$ large. For $p = 1$, $\hat\varphi \approx N(\varphi, \frac{1}{n}(1 - \varphi^2))$.

$\{X_t\}$ MA(q): then $W = \sigma^2 \big(E(\mathbf{V}_t \mathbf{V}_t^t)\big)^{-1} = \sigma^2 (\Gamma_q^*)^{-1}$, where $\Gamma_q^*$ is the covariance matrix of the AR(q) process $(\Theta(B)V)_t = Z_t$. For $q = 1$, $\hat\vartheta \approx N(\vartheta, \frac{1}{n}(1 - \vartheta^2))$.

$\{X_t\}$ ARMA(1,1):

$$W = \sigma^2 \begin{pmatrix} E(U_t^2) & E(U_t V_t) \\ E(U_t V_t) & E(V_t^2) \end{pmatrix}^{-1} = \begin{pmatrix} (1 - \varphi^2)^{-1} & (1 + \varphi\vartheta)^{-1} \\ (1 + \varphi\vartheta)^{-1} & (1 - \vartheta^2)^{-1} \end{pmatrix}^{-1}.$$

Inverting this $2 \times 2$ matrix, one easily obtains the asymptotic covariance matrix of $(\hat\varphi, \hat\vartheta)$.
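The AR(1) case is easy to verify by simulation; a small sketch (the coefficient and sample size are arbitrary):

    # Monte Carlo check that phi hat is approximately N(phi, (1 - phi^2)/n).
    set.seed(42)
    n <- 200; phi <- 0.6
    phihat <- replicate(500, {
      x <- arima.sim(list(ar = phi), n = n)
      ar.mle(x, order.max = 1, aic = FALSE)$ar
    })
    var(phihat)      # empirical variance of the MLE
    (1 - phi^2) / n  # theoretical value: 0.0032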
Comparison of estimators

Estimators of the parameters:

- AR(1), $\varphi$: method of moments $\hat\rho(1)$; fitted innovations $\hat\vartheta_{m,1}$.
- MA(1), $\vartheta$: method of moments $\dfrac{1 - \sqrt{1 - 4\hat\rho(1)^2}}{2\hat\rho(1)}$ (real only when $|\hat\rho(1)| \le 1/2$); fitted innovations $\hat\vartheta_{m,1}$.
- ARMA(1,1), $\varphi$: method of moments $\hat\rho(2)/\hat\rho(1)$; fitted innovations $\hat\vartheta_{m,2}/\hat\vartheta_{m,1}$.
- ARMA(1,1), $\vartheta$: the method of moments gives an unwieldy expression; fitted innovations $\hat\vartheta_{m,1} - \hat\vartheta_{m,2}/\hat\vartheta_{m,1}$.

The MLE has no closed form in these cases and is computed numerically.
Comparison of estimators

Asymptotic variances, with (1) = method of moments, (2) = fitted innovations, (3) = MLE:

- AR(1), $\varphi$: (1) $\frac{1}{n}(1 - \varphi^2)$; (2) $\frac{1}{n}$; (3) $\frac{1}{n}(1 - \varphi^2)$.
- MA(1), $\vartheta$: (1) $\dfrac{1 + \vartheta^2 + 4\vartheta^4 + \vartheta^6 + \vartheta^8}{n(1 - \vartheta^2)^2}$; (2) $\frac{1}{n}$; (3) $\frac{1}{n}(1 - \vartheta^2)$.
- ARMA(1,1), MLE: $\dfrac{(1 + \varphi\vartheta)^2 (1 - \varphi^2)}{n(\varphi + \vartheta)^2}$ for $\hat\varphi$, and $\dfrac{(1 + \varphi\vartheta)^2 (1 - \vartheta^2)}{n(\varphi + \vartheta)^2}$ for $\hat\vartheta$.

Relative asymptotic efficiency of (asymptotically unbiased) estimators of the parameter $\vartheta$: $e(\vartheta; i, j)$ is the ratio of asymptotic variances, that of estimator $j$ over that of estimator $i$. For the MA(1) parameter:

$$e(\vartheta; 1, 2) = \begin{cases} 0.82 & \vartheta = 0.25 \\ 0.37 & \vartheta = 0.5 \\ 0.06 & \vartheta = 0.75 \end{cases} \qquad e(\vartheta; 2, 3) = \begin{cases} 0.94 & \vartheta = 0.25 \\ 0.75 & \vartheta = 0.5 \\ 0.44 & \vartheta = 0.75 \end{cases}$$
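The efficiency values can be reproduced directly from the variance table (the common factor $1/n$ cancels in each ratio):

    # Relative efficiencies e(theta; 1, 2) and e(theta; 2, 3) for MA(1).
    theta <- c(0.25, 0.50, 0.75)
    v1 <- (1 + theta^2 + 4 * theta^4 + theta^6 + theta^8) / (1 - theta^2)^2  # moments
    v2 <- rep(1, length(theta))                                              # innovations
    v3 <- 1 - theta^2                                                        # MLE
    round(v2 / v1, 2)  # 0.82 0.37 0.06
    round(v3 / v2, 2)  # 0.94 0.75 0.44

The method of moments deteriorates quickly as $\vartheta$ grows, while the fitted-innovations estimator stays reasonably close to the MLE.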
A tool to compute asymptotic variances

Theorem. Let $\sigma_n \to 0$ and $(X_n - \mu)/\sigma_n \Rightarrow N(0, 1)$. Then

$$\frac{g(X_n) - g(\mu)}{\sigma_n} \Rightarrow N\big(0, (g'(\mu))^2\big), \quad \text{i.e.} \quad g(X_n) \approx N\big(g(\mu), (g'(\mu))^2 \sigma_n^2\big).$$

More generally, let $X_n$ be $k$-dimensional, $g : \mathbb{R}^k \to \mathbb{R}^m$, and $(X_n - \mu)/\sigma_n \Rightarrow N(0, V)$. Let $(DVD^t)_{ii} > 0$, where $D_{ij} = \frac{\partial g_i}{\partial x_j}(\mu)$. Then

$$\frac{g(X_n) - g(\mu)}{\sigma_n} \Rightarrow N(0, DVD^t), \quad \text{i.e.} \quad g(X_n) \approx N\big(g(\mu), DVD^t \sigma_n^2\big).$$
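A quick numerical illustration of the univariate statement (the choices $g = \exp$, $\mu = 2$, $\sigma_n = 0.01$ are arbitrary):

    # Delta method: g(X_n) is approximately N(g(mu), g'(mu)^2 sigma_n^2).
    set.seed(1)
    mu <- 2; sn <- 0.01
    x <- rnorm(1e5, mean = mu, sd = sn)
    sd(exp(x))    # empirical sd of g(X_n) with g = exp
    exp(mu) * sn  # delta-method sd: |g'(mu)| sigma_n = 0.0739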
Model choice: introduction

MLE provides estimates for any given model, e.g. ARMA(p,q). How do we choose among models? The residuals should resemble a white noise. Residuals can be defined as

$$\hat W_t = \frac{X_t - \hat X_t(\hat\phi, \hat\vartheta)}{\big(r_{t-1}(\hat\phi, \hat\vartheta)\big)^{1/2}}.$$

The sequence $(X_t - \hat X_t(\phi, \vartheta))/(r_{t-1}(\phi, \vartheta))^{1/2}$, computed with the true parameters, is white noise, and $\hat W_t$ should be close to it. This can be tested, e.g. by computing the ACF of $\{\hat W_t\}$ (see the sketch below). To avoid overfitting, the order can be selected through a criterion, as discussed in the following slides.
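A typical residual check in R, sketched here on the built-in LakeHuron series (a close relative of the huron2 series used later; the AR(2) order is illustrative):

    # Standardized residuals of a fitted model should resemble white noise.
    fit <- arima(LakeHuron, order = c(2, 0, 0))
    w <- residuals(fit) / sqrt(fit$sigma2)  # approximates W hat, as r_{t-1} ~ 1 for large t
    acf(w)                                  # sample ACF of the residuals
    Box.test(w, lag = 20, type = "Ljung-Box", fitdf = 2)  # portmanteau test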
Model choice: FPE criterion

FPE (Final Prediction Error) is an estimate of the one-step prediction error (in $L^2$ norm) for an independent realization of the observed process. Assume $(X_1, \dots, X_n)$ is a realization of a causal AR(p) process with coefficients $\varphi_1, \dots, \varphi_p$, and $(Y_1, \dots, Y_n)$ an independent realization of the same process. The mean-square prediction error is

$$\mathrm{FPE} = E\big(Y_{n+1} - \hat\varphi_1 Y_n - \dots - \hat\varphi_p Y_{n+1-p}\big)^2 = E\Big(Y_{n+1} - \sum_{j=1}^p \varphi_j Y_{n+1-j} - \sum_{j=1}^p (\hat\varphi_j - \varphi_j) Y_{n+1-j}\Big)^2.$$

$Y_{n+1} - \sum_{j=1}^p \varphi_j Y_{n+1-j} = Z_{n+1}$ is independent of the other terms and has variance $\sigma^2$. Furthermore, $\hat\phi - \phi$ is independent of $\{Y_t\}$. Hence

$$\mathrm{FPE} = \sigma^2 + \sum_{i,j=1}^p E\big((\hat\varphi_j - \varphi_j)(\hat\varphi_i - \varphi_i)\big)\, E\big(Y_{n+1-j} Y_{n+1-i}\big) = \sigma^2 + E\big(\langle \Gamma_p(\hat\phi - \phi), \hat\phi - \phi \rangle\big).$$
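This quantity, and the approximation FPE $\approx \sigma^2(1 + p/n)$ derived on the next slide, can be checked by simulation; a sketch with arbitrary parameters:

    # Monte Carlo FPE for AR(1): one-step error on an independent realization.
    set.seed(7)
    n <- 200; phi <- 0.7
    err <- replicate(500, {
      x <- arima.sim(list(ar = phi), n = n)      # observed realization
      y <- arima.sim(list(ar = phi), n = n + 1)  # independent realization
      phihat <- ar.mle(x, order.max = 1, aic = FALSE)$ar
      (y[n + 1] - phihat * y[n])^2
    })
    mean(err)  # Monte Carlo estimate of FPE
    1 + 1 / n  # sigma^2 * (1 + p/n) with sigma^2 = 1, p = 1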
FPE criterion: estimation

$$\mathrm{FPE} = \sigma^2 + E\big(\langle \Gamma_p(\hat\phi - \phi), \hat\phi - \phi \rangle\big).$$

Fact: if $X$ is an $n$-dimensional random vector with mean zero and $\mathbb{V}(X) = S$, and $A$ is an $n \times n$ matrix, then $E(\langle AX, X \rangle) = \mathrm{tr}(AS)$.

Furthermore, it was stated that $\mathbb{V}(\hat\phi - \phi) \approx \frac{\sigma^2}{n}\Gamma_p^{-1}$ for $n$ large. As $\mathrm{tr}(\Gamma_p \Gamma_p^{-1}) = p$,

$$\mathrm{FPE} \approx \sigma^2 \Big(1 + \frac{p}{n}\Big).$$

Replacing $\sigma^2$ by the estimator $n\hat\sigma^2/(n - p)$, one finally obtains the quantity

$$\hat\sigma^2\, \frac{n + p}{n - p}$$

that should be minimized. Increasing $p$ will generally decrease $\hat\sigma^2$, but this is penalized by the factor $(n + p)/(n - p)$.
Use of FPE on lake data

    n = length(huron2)  # number of observations (98)
    for (ord in 1:4) {
      armle = ar.mle(huron2, order = ord, aic = FALSE)
      print(armle)  # coefficients and sigma
      print(armle$var.pred * (n + ord)/(n - ord))  # FPE
    }

Model   σ̂²      FPE
AR(1)   0.4972   0.5075
AR(2)   0.4571   0.4762
AR(3)   0.4557   0.4845
AR(4)   0.4573   0.4962

Model   ϕ1       ϕ2        ϕ3       ϕ4
AR(1)   0.7829   -         -        -
AR(2)   1.0047   -0.2920   -        -
AR(3)   1.0201   -0.3479   0.0578   -
AR(4)   1.0596   -0.4450   0.0960   0.0037
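As a check on the formula (the series has $n = 98$ observations, which reproduces the FPE column): for AR(2), $0.4571 \cdot (98 + 2)/(98 - 2) \approx 0.4762$. This is the smallest value in the FPE column, so the criterion selects the AR(2) model, even though AR(3) has a slightly smaller $\hat\sigma^2$.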
Diagnostics of selected model

[Figure: residuals of the selected model plotted against time, and their sample ACF (lags 0 to 15).]

It seems ok: the residuals show no evident structure, and the autocorrelations at positive lags are negligible.
Diagnostics of AR(1)

For comparison, the residuals of the AR(1) fit:

[Figure: residuals of the AR(1) model plotted against time, and their sample ACF (lags 0 to 15).]
Akaike criterion: Kullback-Leibler discrepancy

Given a family of probability densities $\{f(\cdot\,; \psi),\ \psi \in \Psi\}$, the Kullback-Leibler index of $f(\cdot\,; \psi)$ relative to $f(\cdot\,; \vartheta)$ is

$$\Delta(\psi \mid \vartheta) = E_\vartheta\big(-2 \log f(X; \psi)\big) = -2 \int_{\mathbb{R}^n} \log(f(x; \psi))\, f(x; \vartheta)\, dx.$$

The Kullback-Leibler discrepancy between $f(\cdot\,; \psi)$ and $f(\cdot\,; \vartheta)$ is

$$d(\psi \mid \vartheta) = \Delta(\psi \mid \vartheta) - \Delta(\vartheta \mid \vartheta) = -2 \int_{\mathbb{R}^n} \log\Big(\frac{f(x; \psi)}{f(x; \vartheta)}\Big) f(x; \vartheta)\, dx.$$

Jensen's inequality implies $E(\log Y) \le \log(E(Y))$ for any positive random variable $Y$. Hence

$$d(\psi \mid \vartheta) \ge -2 \log \int_{\mathbb{R}^n} \frac{f(x; \psi)}{f(x; \vartheta)}\, f(x; \vartheta)\, dx = -2 \log \int_{\mathbb{R}^n} f(x; \psi)\, dx = 0,$$

with equality only if $f(x; \psi) = f(x; \vartheta)$ a.e. $[f(\cdot\,; \vartheta)]$.
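A one-dimensional illustration (a standard computation, not part of the original slides): take $f(\cdot\,; \psi)$ and $f(\cdot\,; \vartheta)$ to be the $N(\psi, 1)$ and $N(\vartheta, 1)$ densities. Then

$$-2 \log \frac{f(x; \psi)}{f(x; \vartheta)} = (x - \psi)^2 - (x - \vartheta)^2,$$

and taking $E_\vartheta$ (using $E_\vartheta(X - \psi)^2 = 1 + (\vartheta - \psi)^2$),

$$d(\psi \mid \vartheta) = 1 + (\vartheta - \psi)^2 - 1 = (\psi - \vartheta)^2,$$

which is nonnegative and vanishes exactly when $\psi = \vartheta$, as the general argument predicts.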
Approximating the Kullback-Leibler discrepancy

Given observations $X_1, \dots, X_n$, we would like to minimize $d(\psi \mid \vartheta)$ among all candidate models $\psi$, given the true model $\vartheta$. As the true model is unknown, we estimate $d(\psi \mid \vartheta)$.

Let $\psi = (\phi, \vartheta, \sigma^2)$ be the parameters of an ARMA(p,q) model and $\hat\psi$ the MLE based on $X_1, \dots, X_n$. Let $Y$ be an independent realization of the same process. Then

$$\begin{aligned}
-2 \log L_Y(\hat\phi, \hat\vartheta, \hat\sigma^2) &= n \log(2\pi) + n \log \hat\sigma^2 + \log(r_0 \cdots r_{n-1}) + \frac{S_Y(\hat\phi, \hat\vartheta)}{\hat\sigma^2} \\
&= -2 \log L_X(\hat\phi, \hat\vartheta, \hat\sigma^2) + \frac{S_Y(\hat\phi, \hat\vartheta)}{\hat\sigma^2} - \frac{S_X(\hat\phi, \hat\vartheta)}{\hat\sigma^2} \\
&= -2 \log L_X(\hat\phi, \hat\vartheta, \hat\sigma^2) + \frac{S_Y(\hat\phi, \hat\vartheta)}{\hat\sigma^2} - n,
\end{aligned}$$

since $n\hat\sigma^2 = S_X(\hat\phi, \hat\vartheta)$. Taking expectations,

$$E_\vartheta\big(\Delta(\hat\psi \mid \vartheta)\big) = E_{(\phi, \vartheta, \sigma^2)}\big(-2 \log L_X(\hat\phi, \hat\vartheta, \hat\sigma^2)\big) + E_{(\phi, \vartheta, \sigma^2)}\Big(\frac{S_Y(\hat\phi, \hat\vartheta)}{\hat\sigma^2} - n\Big).$$
Kullback-Leibler discrepancy and AICC

Using linear approximations and the asymptotic distributions of the estimators, one arrives at

$$E_{(\phi, \vartheta, \sigma^2)}\big(S_Y(\hat\phi, \hat\vartheta)\big) \approx \sigma^2 (n + p + q).$$

Similarly, $n\hat\sigma^2 = S_X(\hat\phi, \hat\vartheta)$ for large $n$ is distributed approximately as $\sigma^2 \chi^2(n - p - q - 2)$ and is asymptotically independent of $(\hat\phi, \hat\vartheta)$. Hence

$$E_{(\phi, \vartheta, \sigma^2)}\Big(\frac{S_Y(\hat\phi, \hat\vartheta)}{\hat\sigma^2}\Big) \approx \frac{\sigma^2 (n + p + q)}{\sigma^2 (n - p - q - 2)/n} = \frac{n(n + p + q)}{n - p - q - 2}.$$

From $E_\vartheta(\Delta(\hat\psi \mid \vartheta)) = E_{(\phi, \vartheta, \sigma^2)}(-2 \log L_X(\hat\phi, \hat\vartheta, \hat\sigma^2)) + E_{(\phi, \vartheta, \sigma^2)}\big(S_Y(\hat\phi, \hat\vartheta)/\hat\sigma^2 - n\big), it follows that

$$\mathrm{AICC} = -2 \log L_X(\hat\phi, \hat\vartheta, \hat\sigma^2) + \frac{2(p + q + 1)n}{n - p - q - 2}$$

is an approximately unbiased estimate of $\Delta(\hat\psi \mid \vartheta)$.
Criteria for model choice

The order is chosen by minimizing the value of AICC (Corrected Akaike Information Criterion):

$$\mathrm{AICC} = -2 \log L_X(\hat\phi, \hat\vartheta, \hat\sigma^2) + \frac{2(p + q + 1)n}{n - p - q - 2}.$$

The second term can be considered a penalty for models with a large number of parameters. For $n$ large, AICC is approximately the same as Akaike's Information Criterion (AIC), $-2 \log L_X(\hat\phi, \hat\vartheta, \hat\sigma^2) + 2(p + q + 1)$, but it carries a higher penalty for finite $n$ and is therefore somewhat less likely to overfit.

A rule of thumb is that the fits of model 1 and model 2 are not significantly different if $|\mathrm{AICC}_1 - \mathrm{AICC}_2| < 2$ (only the difference matters, not the absolute value of AICC). Hence we may decide to choose model 1 if it is simpler than model 2 (or if its residuals are closer to white noise), even when $\mathrm{AICC}_1 > \mathrm{AICC}_2$, as long as $\mathrm{AICC}_1 < \mathrm{AICC}_2 + 2$.
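In R, AICC is easily obtained from the AIC value that arima reports; a sketch (x, p and q are placeholders, and include.mean = FALSE keeps the parameter count at p + q + 1):

    # AICC = AIC + 2k(k+1)/(n - k - 1) with k = p + q + 1 parameters
    # (sigma^2 included), which equals -2 log L + 2kn/(n - p - q - 2).
    fit <- arima(x, order = c(p, 0, q), include.mean = FALSE)
    n <- length(x); k <- p + q + 1
    fit$aic + 2 * k * (k + 1) / (n - k - 1)  # AICC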