Inference in the Skew-t and Related Distributions

Anna Clara Monti
Pe.Me.Is Department, Statistical Division, University of Sannio

Einaudi Institute for Economics and Finance, Rome
8 May 2008
Outline
- Brief overview of flexible models
- The skew-t model
- Problems connected to inference
- Reparameterizations
- Applications in a robustness context
The basic lemma

Let g be a continuous density function on R^d, symmetric around 0, and let π : R^d → [0, 1] be a skewing function satisfying π(−y) = 1 − π(y). Then

  f(y) = 2 g(y) π(y)

is a proper density function. Azzalini (1985) and further literature.

Often π(y) = G(w(y)), where G is a scalar distribution function such that G(−y) = 1 − G(y), and w : R^d → R is an odd function, w(−y) = −w(y).
The Skew Normal distribution

Y ~ SN_d(ξ, Ω, α) has density

  f(y; ξ, Ω, α) = 2 φ_d(y − ξ; Ω) Φ( α^T ω^{-1} (y − ξ) )

Azzalini (1985), Azzalini & Dalla Valle (1996), where ω = diag(ω_11, …, ω_dd)^{1/2}.

When Y ~ SN_d(ξ, Ω, α):
  E(Y) = ξ + ω μ_z,   Var(Y) = Ω − ω μ_z μ_z^T ω
  μ_z = b δ,   δ = Ω̄α / (1 + α^T Ω̄ α)^{1/2},   b = (2/π)^{1/2},   Ω̄ = ω^{-1} Ω ω^{-1}

In the scalar case, with δ = α (1 + α²)^{-1/2}:
  γ_1 = ((4 − π)/2) (bδ)³ / (1 − (bδ)²)^{3/2},   −0.995 < γ_1 < 0.995
  γ_2 = 2(π − 3) (bδ)⁴ / (1 − (bδ)²)²
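As a numerical sanity check of the scalar case, the sketch below (Python with scipy — an illustration added here, not code from the talk) verifies that 2 φ(z) Φ(αz)/ω integrates to one and that the mean matches ξ + ω b δ:

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

def sn_pdf(y, xi=0.0, omega=1.0, alpha=3.0):
    """Scalar skew-normal density: (2/omega) * phi(z) * Phi(alpha*z), z = (y - xi)/omega."""
    z = (y - xi) / omega
    return 2.0 / omega * norm.pdf(z) * norm.cdf(alpha * z)

xi, omega, alpha = 0.5, 2.0, 3.0

# The density integrates to one
total, _ = quad(lambda y: sn_pdf(y, xi, omega, alpha), -np.inf, np.inf)

# E(Y) = xi + omega * b * delta, with b = sqrt(2/pi), delta = alpha / sqrt(1 + alpha^2)
b = np.sqrt(2.0 / np.pi)
delta = alpha / np.sqrt(1.0 + alpha ** 2)
mean_formula = xi + omega * b * delta
mean_numeric, _ = quad(lambda y: y * sn_pdf(y, xi, omega, alpha), -np.inf, np.inf)
print(round(total, 6), round(mean_formula - mean_numeric, 8))
```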
Stochastic representations

Conditioning method:
  (U_0, U) ~ N_{1+d}(0, Ω*),   Ω* = [ [1, δ^T], [δ, Ω̄] ]
  (U | U_0 > 0) ~ SN(0, Ω̄, α),   α = Ω̄^{-1}δ / (1 − δ^T Ω̄^{-1} δ)^{1/2}

Transformation method:
  (U_0, U_1, …, U_d) ~ N_{1+d}(0, diag(1, Ψ))
  Z_j = δ_j |U_0| + (1 − δ_j²)^{1/2} U_j,   −1 < δ_j < 1
  (Z_1, …, Z_d) ~ SN(0, Ω̄, α)

Maximum (d = 2):
  (U_0, U_1) ~ N_2( 0, [ [1, ρ], [ρ, 1] ] )
  max(U_0, U_1) ~ SN(0, 1, α),   α = ( (1 − ρ)/(1 + ρ) )^{1/2}
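In the scalar case the conditioning and transformation methods can be compared directly. The sketch below (an added illustration, assuming the scalar forms above with Ω̄ = 1) draws samples both ways and checks the sample means against E(Z) = b δ:

```python
import numpy as np

rng = np.random.default_rng(0)
n, alpha = 200_000, 3.0
delta = alpha / np.sqrt(1.0 + alpha ** 2)

# Conditioning method: (U0, U) bivariate normal with correlation delta; keep U when U0 > 0
cov = np.array([[1.0, delta], [delta, 1.0]])
u = rng.multivariate_normal([0.0, 0.0], cov, size=n)
z_cond = u[u[:, 0] > 0, 1]

# Transformation method: Z = delta*|U0| + sqrt(1 - delta^2)*U1, U0 and U1 independent N(0,1)
u0, u1 = rng.standard_normal(n), rng.standard_normal(n)
z_trans = delta * np.abs(u0) + np.sqrt(1.0 - delta ** 2) * u1

target_mean = np.sqrt(2.0 / np.pi) * delta  # E(Z) = b * delta
print(round(z_cond.mean(), 3), round(z_trans.mean(), 3), round(target_mean, 3))
```

Both sample means agree with b δ up to Monte Carlo error, as the representations imply.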
Main properties

Marginal distributions: if Y = (Y_1, Y_2) ~ SN(0, Ω̄, α), then Y_1 ~ SN(0, Ω̄_11, ᾱ_1) with
  ᾱ_1 = ( α_1 + Ω̄_11^{-1} Ω̄_12 α_2 ) / ( 1 + α_2^T Ω̄_22·1 α_2 )^{1/2},   Ω̄_22·1 = Ω̄_22 − Ω̄_21 Ω̄_11^{-1} Ω̄_12

Linear transformations: if Y ~ SN(ξ, Ω, α) and A is non-singular, then
  A^T Y ~ SN( A^T ξ, A^T Ω A, α_A )
for a suitably transformed shape vector α_A.

Quadratic forms: if Y ~ SN(ξ, Ω, α) and B is symmetric positive semidefinite with B Ω B = B and rank(B) = p, then
  (Y − ξ)^T B (Y − ξ) ~ χ²_p

Even functions: if Z has density g(·) and Y has density f(y) = 2 g(y) G(w(y)), then for any even function t(·),
  t(Y) =_d t(Z)
Skew Normal: inferential issues

- The profile likelihood has an inflection point at α = 0
- The information matrix is singular at α = 0
- The shape of the likelihood function can be far from quadratic
(Azzalini & Capitanio, 1999)

These problems can be remedied by the centred reparameterization CP = (μ, Var(Y), γ_1):
Azzalini (1985), Azzalini & Capitanio (1999), Arellano-Valle & Azzalini (2007).

The estimate of the skewness parameter is often infinite:
Liseo & Loperfido (2004), Sartori (2005), Greco (2008), Azzalini & Genton (2008).
Skew-elliptical distributions

Purpose: to deal with both skewness and heavy tails.

Elliptical density, constant on ellipsoids:
  g(y; ξ, Ω) = c_d |Ω|^{-1/2} g̃{ (y − ξ)^T Ω^{-1} (y − ξ) }

Skewed version:
  f(y) = 2 g(y; ξ, Ω) G(w(y))

Stochastic representations: conditioning, transformation, maximum (d = 2).

Azzalini & Capitanio (2003), Branco & Dey (2001)
Skew-t

Let Z ~ SN(0, Ω̄, α) and V ~ χ²_ν / ν, with Z and V independent. Then

  Y = ξ + V^{-1/2} Z ~ St(ξ, Ω, α, ν)

with density

  f(y; ξ, Ω, α, ν) = 2 t_d(y; ξ, Ω, ν) T_1( α^T ω^{-1}(y − ξ) ((ν + d)/(Q_y + ν))^{1/2} ; ν + d )

where
  Q_y = (Y − ξ)^T Ω^{-1} (Y − ξ)
  t_d(y; ξ, Ω, ν) = Γ((ν + d)/2) / ( |Ω|^{1/2} (πν)^{d/2} Γ(ν/2) ) · (1 + Q_y/ν)^{-(ν+d)/2}
  T_1(y; ν) = ∫_{−∞}^{y} t_1(u; 0, 1, ν) du
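The scale-mixture construction gives an immediate sampler. The sketch below (an added illustration in the scalar case, ξ = 0, ω = 1) draws Y = V^{-1/2} Z and checks the sample mean against the closed-form E(Y) = δ b_ν used later in the talk:

```python
import numpy as np
from scipy.special import gamma

rng = np.random.default_rng(1)
n, alpha, nu = 400_000, 3.0, 5.0
delta = alpha / np.sqrt(1.0 + alpha ** 2)

# Skew-normal variate via the transformation method
z = delta * np.abs(rng.standard_normal(n)) + np.sqrt(1 - delta ** 2) * rng.standard_normal(n)
# Independent mixing variable V ~ chi2_nu / nu
v = rng.chisquare(nu, size=n) / nu
y = z / np.sqrt(v)  # Y = xi + V^{-1/2} Z with xi = 0, omega = 1

# Closed-form mean: E(Y) = delta * b_nu, b_nu = sqrt(nu/pi) * Gamma((nu-1)/2) / Gamma(nu/2)
b_nu = np.sqrt(nu / np.pi) * gamma((nu - 1) / 2) / gamma(nu / 2)
print(round(y.mean(), 3), round(delta * b_nu, 3))
```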
Main properties of the skew-t

Linear transformations: Y ~ St(ξ, Ω, α, ν)  ⇒  AY + b ~ St(ξ′, AΩA^T, α′, ν)
Marginal distributions: Y ~ St(ξ, Ω, α, ν)  ⇒  Y_j ~ St(ξ_j, ω_jj, ᾱ_j, ν)
Quadratic forms: Y ~ St(ξ, Ω, α, ν)  ⇒  (Y − ξ)^T Ω^{-1} (Y − ξ) / d ~ F_{d,ν}

Azzalini & Capitanio (2003)
Univariate skew-t distribution

Y ~ St(ξ, ω, α, ν) has density

  f(y; ξ, ω, α, ν) = 2 t(y; ξ, ω, ν) T(ζ; ν + 1)

where
  z = (y − ξ)/ω,   τ = ( (ν + 1)/(ν + z²) )^{1/2},   ζ = ατz
  t(y; ξ, ω, ν) = Γ((ν + 1)/2) / ( ω (πν)^{1/2} Γ(ν/2) ) · (1 + z²/ν)^{-(ν+1)/2}
  T(y; ν) = ∫_{−∞}^{y} t(u; 0, 1, ν) du

(Figure: univariate skew-t density with α = 3, ν = 5.)
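The univariate density is easy to code directly from the formula above. The sketch below (an added illustration using scipy's Student-t pdf and cdf) verifies that it integrates to one:

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

def skew_t_pdf(y, xi=0.0, omega=1.0, alpha=3.0, nu=5.0):
    """f(y) = (2/omega) * t(z; nu) * T(alpha*z*sqrt((nu+1)/(nu+z^2)); nu+1)."""
    z = (y - xi) / omega
    zeta = alpha * z * np.sqrt((nu + 1.0) / (nu + z ** 2))
    return 2.0 / omega * stats.t.pdf(z, df=nu) * stats.t.cdf(zeta, df=nu + 1)

total, _ = quad(lambda y: skew_t_pdf(y), -np.inf, np.inf)
print(round(total, 6))
```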
Moments of the skew-t distribution

With δ = α (1 + α²)^{-1/2} and b_ν = (ν/π)^{1/2} Γ((ν − 1)/2) / Γ(ν/2):

  μ = ξ + ω δ b_ν   (ν > 1)
  σ² = ω² ( ν/(ν − 2) − δ² b_ν² )   (ν > 2)
  γ_1 = δ b_ν [ ν(3 − δ²)/(ν − 3) − 3ν/(ν − 2) + 2δ²b_ν² ] / ( ν/(ν − 2) − δ²b_ν² )^{3/2}   (ν > 3)
  γ_2 = [ 3ν²/((ν − 2)(ν − 4)) − 4δ²b_ν² ν(3 − δ²)/(ν − 3) + 6δ²b_ν² ν/(ν − 2) − 3δ⁴b_ν⁴ ] / ( ν/(ν − 2) − δ²b_ν² )² − 3   (ν > 4)
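The mean and variance formulas can be checked against the density by quadrature. This sketch (an added illustration; parameter values are arbitrary) compares both:

```python
import numpy as np
from scipy import stats
from scipy.special import gamma
from scipy.integrate import quad

def skew_t_moments(xi, omega, alpha, nu):
    """Mean and variance from the formulas above (requires nu > 2)."""
    delta = alpha / np.sqrt(1 + alpha ** 2)
    b = np.sqrt(nu / np.pi) * gamma((nu - 1) / 2) / gamma(nu / 2)
    mu = xi + omega * delta * b
    sigma2 = omega ** 2 * (nu / (nu - 2) - (delta * b) ** 2)
    return mu, sigma2

def pdf(y, xi, omega, alpha, nu):
    z = (y - xi) / omega
    zeta = alpha * z * np.sqrt((nu + 1) / (nu + z ** 2))
    return 2 / omega * stats.t.pdf(z, df=nu) * stats.t.cdf(zeta, df=nu + 1)

xi, omega, alpha, nu = 1.0, 2.0, 3.0, 6.0
mu, sigma2 = skew_t_moments(xi, omega, alpha, nu)
m1, _ = quad(lambda y: y * pdf(y, xi, omega, alpha, nu), -np.inf, np.inf)
m2, _ = quad(lambda y: (y - m1) ** 2 * pdf(y, xi, omega, alpha, nu), -np.inf, np.inf)
print(round(mu - m1, 6), round(sigma2 - m2, 5))
```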
Skew-t: main features

A model able to deal with both:
- skewness
- heavy tails

Special cases:
- α = 0  ⇒  Y ~ t(ξ, ω, ν)
- ν → ∞  ⇒  Y ~ SN(ξ, ω, α)
- α = 0 and ν → ∞  ⇒  Y ~ N(ξ, ω²)

Topics of the talk:
- inferential issues under the special cases
- applications in robustness contexts
Score functions and information matrix

ϑ = (ξ, ω, α, ν). With z = (y − ξ)/ω, τ = ((ν + 1)/(ν + z²))^{1/2}, ζ = ατz,
w = t(ζ; ν + 1)/T(ζ; ν + 1) and Ψ(u) = d ln Γ(u)/du:

  S_ξ(y) = ∂ ln f(y; ϑ)/∂ξ = zτ²/ω − ( ατν / (ω(ν + z²)) ) w
  S_ω(y) = ∂ ln f(y; ϑ)/∂ω = −1/ω + z²τ²/ω − ( ατzν / (ω(ν + z²)) ) w
  S_α(y) = ∂ ln f(y; ϑ)/∂α = zτ w
  S_ν(y) = ∂ ln f(y; ϑ)/∂ν = (1/2){ Ψ((ν+1)/2) − Ψ(ν/2) − 1/ν − ln(1 + z²/ν) + z²τ²/ν }
            + ( ατz(z² − 1) / (2(ν + 1)(ν + z²)) ) w + γ_ζ

where γ_ζ = ∂ ln T(ζ; ν + 1)/∂ν with ζ held fixed, an integral over (−∞, ζ] of the ν-derivative of the t density.

Maximum likelihood estimators: given Y_1, …, Y_n, ϑ̂ = (ξ̂, ω̂, α̂, ν̂) solves
  Σ_{i=1}^{n} S_{ϑ_j}(y_i) = 0,   j = 1, …, 4.

Information matrix:
  I(ϑ) = E{ S_ϑ(Y) S_ϑ(Y)^T } = −E{ ∂² ln f(Y; ϑ) / ∂ϑ ∂ϑ^T }
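The reconstructed scores can be validated numerically against finite differences of the log-density. The sketch below (an added check, not from the talk) does this for the two simplest scores, S_ξ and S_α:

```python
import numpy as np
from scipy import stats

def loglik(y, xi, omega, alpha, nu):
    """Univariate skew-t log-density."""
    z = (y - xi) / omega
    zeta = alpha * z * np.sqrt((nu + 1) / (nu + z ** 2))
    return (np.log(2) - np.log(omega) + stats.t.logpdf(z, df=nu)
            + stats.t.logcdf(zeta, df=nu + 1))

def scores(y, xi, omega, alpha, nu):
    """S_xi and S_alpha as written above."""
    z = (y - xi) / omega
    tau = np.sqrt((nu + 1) / (nu + z ** 2))
    zeta = alpha * tau * z
    w = stats.t.pdf(zeta, df=nu + 1) / stats.t.cdf(zeta, df=nu + 1)
    s_xi = z * tau ** 2 / omega - alpha * tau * nu / (omega * (nu + z ** 2)) * w
    s_alpha = z * tau * w
    return s_xi, s_alpha

y, xi, omega, alpha, nu = 1.3, 0.2, 1.5, 2.0, 4.0
h = 1e-6
fd_xi = (loglik(y, xi + h, omega, alpha, nu) - loglik(y, xi - h, omega, alpha, nu)) / (2 * h)
fd_alpha = (loglik(y, xi, omega, alpha + h, nu) - loglik(y, xi, omega, alpha - h, nu)) / (2 * h)
s_xi, s_alpha = scores(y, xi, omega, alpha, nu)
print(round(s_xi - fd_xi, 8), round(s_alpha - fd_alpha, 8))
```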
(Figure: score functions S_ξ, S_ω, S_α, S_ν plotted against y, for ξ = 0, ω = 1, α = 3 and ν = 3 (blue), ν = 5 (green), ν = 10 (red), ν = 25 (black).)
Issues related to likelihood inference

Nice likelihood behaviour:
- no stationary point at α = 0
- approximately quadratic shape
(Azzalini & Capitanio, 2003; Azzalini & Genton, 2008)

But:
- the information matrix can be singular, under normality or skew normality, with a parameter on the boundary
- the estimate of α can be infinite: Sartori (2005), Azzalini & Genton (2008), Greco (2008)
Score functions and information matrix under the t distribution (α = 0)

  S_ξ^t(y) = zτ²/ω
  S_ω^t(y) = −1/ω + z²τ²/ω
  S_α^t(y) = 2 z τ t(0; ν + 1)
  S_ν^t(y) = (1/2){ Ψ((ν+1)/2) − Ψ(ν/2) − 1/ν − ln(1 + z²/ν) + z²τ²/ν }

  I_ξω^t = I_ξν^t = I_ωα^t = I_αν^t = 0   ⇒   (ξ̂, α̂) and (ω̂, ν̂) are asymptotically uncorrelated

  I_ξξ^t = (ν + 1) / ( ω²(ν + 3) )
  I_ωω^t = 2ν / ( ω²(ν + 3) )
  I_ων^t = −2 / ( ω(ν + 1)(ν + 3) )
  I_νν^t = (1/4){ Ψ_1(ν/2) − Ψ_1(ν/2 + 1/2) } − (ν + 5) / ( 2ν(ν + 1)(ν + 3) )
  I_ξα^t = 4 Γ(ν/2 + 3/2) / ( ω (ν + 2) (πν)^{1/2} Γ(ν/2) )
  I_αα^t = 4 Γ(ν/2 + 1)² / ( π (ν + 1) Γ(ν/2 + 1/2)² )
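Since these closed forms were reconstructed from a damaged slide, it is worth checking them against direct quadrature of E{S S^T} under the t density. The sketch below (an added verification, ν = 5 is arbitrary) checks I_ξξ, I_αα and I_νν:

```python
import numpy as np
from scipy import stats
from scipy.special import psi, polygamma, gammaln
from scipy.integrate import quad

nu, omega = 5.0, 1.0

def dens(z):
    return stats.t.pdf(z, df=nu)

def tau2(z):
    return (nu + 1.0) / (nu + z ** 2)

# I_xixi = E[(z*tau^2/omega)^2] vs (nu+1)/(omega^2*(nu+3))
i_xixi, _ = quad(lambda z: (z * tau2(z) / omega) ** 2 * dens(z), -np.inf, np.inf)

# I_alal = E[(2*z*tau*t(0; nu+1))^2] vs 4*Gamma(nu/2+1)^2 / (pi*(nu+1)*Gamma(nu/2+1/2)^2)
t0 = stats.t.pdf(0.0, df=nu + 1)
i_alal, _ = quad(lambda z: (2 * z * np.sqrt(tau2(z)) * t0) ** 2 * dens(z), -np.inf, np.inf)
i_alal_formula = 4 * np.exp(2 * (gammaln(nu / 2 + 1) - gammaln(nu / 2 + 0.5))) / (np.pi * (nu + 1))

# I_nunu = E[S_nu^2] vs the closed form with trigamma functions
def s_nu(z):
    return 0.5 * (psi((nu + 1) / 2) - psi(nu / 2) - 1 / nu
                  - np.log1p(z ** 2 / nu) + z ** 2 * tau2(z) / nu)
i_nunu, _ = quad(lambda z: s_nu(z) ** 2 * dens(z), -np.inf, np.inf)
i_nunu_formula = (0.25 * (polygamma(1, nu / 2) - polygamma(1, nu / 2 + 0.5))
                  - (nu + 5) / (2 * nu * (nu + 1) * (nu + 3)))
print(round(i_xixi, 6), round(i_alal - i_alal_formula, 8), round(i_nunu - i_nunu_formula, 8))
```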
Score functions when ν diverges

Skew normal:
  S_ξ^SN(y) = z/ω − (α/ω) φ(αz)/Φ(αz)
  S_ω^SN(y) = −1/ω + z²/ω − (αz/ω) φ(αz)/Φ(αz)
  S_α^SN(y) = z φ(αz)/Φ(αz)
  S_ν^SN(y) = 0,   since S_ν(y) = O(ν^{-2})

Normal (α = 0):
  S_ξ^N(y) = z/ω
  S_ω^N(y) = −1/ω + z²/ω
  S_α^N(y) = (2/π)^{1/2} z
  S_ν^N(y) = 0

rank(I^N) = 2: the information matrix is singular!
Inferential problems when ν diverges
- singularity of the information matrix
- parameter ν on the boundary
- efficiency of the estimators
- asymptotic distribution of the estimators
- likelihood ratio test
Root mean square errors

Root mean square errors of the estimators of the direct parameters ξ, ω, α, and κ = 1/ν when the underlying distribution is N(ξ, ω²).

  n        ξ        ω        α          κ
  50       0.836    0.3467   350.7858   0.0667
  100      0.7159   0.651    1.505      0.0459
  200      0.610    0.054    0.9790     0.038
  500      0.540    0.1489   0.758      0.011
  1000     0.4584   0.1170   0.6394     0.0157
  5000     0.350    0.0705   0.467      0.0073
  10000    0.3115   0.0558   0.4083     0.005

Simulations = 10,000
Simulated distribution of the estimators under the Normal model

n = 10,000; simulations = 10,000.

(Figure: histograms of the simulated distributions of ξ̂, ω̂ and α̂.)
Distribution of the likelihood ratio statistic to test normality or skew normality

  H_0: Y ~ N(ξ, ω²)    or    H_0: Y ~ SN(ξ, ω, α)

The likelihood ratio statistic does not converge in distribution to a chi-square!
Reparameterization of the degrees of freedom

The singularity at the SN model can be removed by the reparameterization

  κ = 1/ν,   ϑ_κ = (ξ, ω, α, κ)

The score for κ is S_κ(y) = −S_ν(y)/κ². As κ → 0 it converges to the skew-normal limit

  S_κ^SN(y) = (1/4)(z⁴ − 2z² − 1) − (1/4) αz ( (2 + α²)z² − 1 ) φ(αz)/Φ(αz)
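The limiting score S_κ^SN can be checked by a finite difference in κ of the skew-t log-density against the skew-normal log-density. The sketch below (an added verification of the reconstructed formula, ξ = 0, ω = 1) does exactly that:

```python
import numpy as np
from scipy import stats

def skew_t_logpdf(y, alpha, kappa):
    """Univariate skew-t log-density with kappa = 1/nu (xi = 0, omega = 1)."""
    nu = 1.0 / kappa
    zeta = alpha * y * np.sqrt((nu + 1) / (nu + y ** 2))
    return np.log(2) + stats.t.logpdf(y, df=nu) + stats.t.logcdf(zeta, df=nu + 1)

def sn_logpdf(y, alpha):
    """Skew-normal limit (kappa = 0)."""
    return np.log(2) + stats.norm.logpdf(y) + stats.norm.logcdf(alpha * y)

def s_kappa_sn(z, alpha):
    """Limiting score for kappa at the skew-normal model."""
    ratio = stats.norm.pdf(alpha * z) / stats.norm.cdf(alpha * z)
    return ((z ** 4 - 2 * z ** 2 - 1) / 4
            - alpha * z * ((2 + alpha ** 2) * z ** 2 - 1) / 4 * ratio)

z, alpha, kappa = 1.2, 2.0, 1e-5
fd = (skew_t_logpdf(z, alpha, kappa) - sn_logpdf(z, alpha)) / kappa
print(round(fd, 3), round(s_kappa_sn(z, alpha), 3))
```

With α = 0 the expression reduces to (z⁴ − 2z² − 1)/4, the familiar score for κ under the t/normal pair.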
Asymptotic properties of the skew-t MLE under the SN model (κ = 0)

ϑ̂ = (ξ̂, ω̂, α̂, κ̂) is consistent for ϑ_κ = (ξ, ω, α, 0).

  n^{1/2}(ϑ̂ − ϑ_κ) →_d (Z_1, Z_2, Z_3, Z_4) I(Z_4 > 0)
      + ( Z_1 − I_ξκ I_κκ^{-1} Z_4, Z_2 − I_ωκ I_κκ^{-1} Z_4, Z_3 − I_ακ I_κκ^{-1} Z_4, 0 ) I(Z_4 < 0)

where (Z_1, Z_2, Z_3, Z_4) ~ N( 0, I_SN^{-1} ).
Asymptotic distribution of the likelihood ratio statistic under the SN distribution

  H_0: Y ~ SN(ξ, ω, α)
  2 ln λ = 2{ ℓ_St(ϑ̂) − ℓ_SN(ϑ̂_SN) }
  2 ln λ →_d Z² I(Z > 0),   Z ~ N(0, 1)

Simulated level of the likelihood ratio test for skew-normality with nominal level 0.05:

  n       α = 0.5   α = 1.5   α = 3    α = 5    α = 10
  100     0.0179    0.065     0.0765   0.1676   0.1569
  200     0.08      0.090     0.0306   0.0473   0.065
  500     0.073     0.039     0.0378   0.0369   0.0331
  1000    0.030     0.0338    0.037    0.0416   0.0393

Replications = 10,000
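Z² I(Z > 0) is the half-half mixture (1/2)χ²₀ + (1/2)χ²₁, so asymptotic p-values and critical values are simple to compute. A minimal sketch (added here as an illustration):

```python
from scipy.stats import chi2

def skew_normality_lrt_pvalue(lrt):
    """Asymptotic p-value for the LRT of SN vs skew-t: (1/2)chi2_0 + (1/2)chi2_1 mixture."""
    if lrt <= 0:
        return 1.0
    return 0.5 * chi2.sf(lrt, df=1)

# 5% critical value: solve 0.5 * P(chi2_1 > c) = 0.05, i.e. the chi2_1 quantile at 0.90
crit = chi2.ppf(1 - 2 * 0.05, df=1)
print(round(crit, 3), round(skew_normality_lrt_pvalue(crit), 4))
```

Using the naive χ²₁ critical value instead would make the test conservative, consistent with the simulated levels below 0.05 for small α.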
Information matrix at the normal model with κ = 1/ν

  S_ξ^N(y) = z/ω
  S_ω^N(y) = (z² − 1)/ω
  S_α^N(y) = (2/π)^{1/2} z
  S_κ^N(y) = (1/4)(z⁴ − 2z² − 1)

  I_κ^N =
    [ 1/ω²            0       (2/π)^{1/2}/ω   0
      0               2/ω²    0               2/ω
      (2/π)^{1/2}/ω   0       2/π             0
      0               2/ω     0               7/2  ]

The information matrix is still singular, due to the link between S_ξ and S_α.
The parameter κ is on the boundary.
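The entries of I_κ^N reduce to Gaussian moments, so they can be verified by quadrature. The sketch below (an added check) confirms the two κ-related entries, 2/ω and 7/2; the singularity itself follows from S_α^N = (2/π)^{1/2} ω S_ξ^N:

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

omega = 1.5

def s_omega(z):
    return (z ** 2 - 1) / omega

def s_kappa(z):
    return (z ** 4 - 2 * z ** 2 - 1) / 4

def expect(f):
    """E[f(Z)] with Z ~ N(0,1), by quadrature."""
    val, _ = quad(lambda z: f(z) * norm.pdf(z), -np.inf, np.inf)
    return val

i_omega_kappa = expect(lambda z: s_omega(z) * s_kappa(z))  # should be 2/omega
i_kappa_kappa = expect(lambda z: s_kappa(z) ** 2)          # should be 7/2
print(round(i_omega_kappa, 6), round(i_kappa_kappa, 6))
```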
Pseudo-centred parameterization

  ϑ = (ξ, ω, α, ν),   ϑ_κ = (ξ, ω, α, κ)
  CP = (μ, σ, γ_1, γ_2),   ν > 4
  S^CP = (D^T)^{-1} S^DP,   D = [ ∂CP_i / ∂ϑ_{κ,j} ]

Azzalini (2007)

Normal model:
  S_μ(y) = z/σ
  S_σ(y) = (z² − 1)/σ
  S_γ1(y) = (z³ − 3z)/6
  S_γ2(y) = (z⁴ − 6z² + 3)/24

  I_CP^N = E{ S_CP^N(Y) S_CP^N(Y)^T } = diag( 1/σ², 2/σ², 1/6, 1/24 )
Root mean square errors with the centred parameterization

Root mean square errors of the estimators of the direct parameters ξ, ω, α, κ = 1/ν and of the centred parameters μ, σ, γ_1 and γ_2 when the underlying distribution is N(ξ, ω²).

  n        ξ        ω        α          κ        μ        σ        γ_1      γ_2
  50       0.836    0.3467   350.7858   0.0667   0.1463   0.1171   5.7735   1.367   (1) (43) (151)
  100      0.7159   0.651    1.505      0.0459   0.0994   0.071    0.744    2.8703  (9) (1)
  200      0.610    0.054    0.9790     0.038    0.0714   0.0505   0.1915   0.4495
  500      0.540    0.1489   0.758      0.011    0.0436   0.0318   0.1131   0.1874
  1000     0.4584   0.1170   0.6394     0.0157   0.0311   0.0      0.0773   0.153
  5000     0.350    0.0705   0.467      0.0073   0.014    0.0101   0.0338   0.0507
  10000    0.3115   0.0558   0.4083     0.005    0.0099   0.0070   0.034    0.0353

(In parentheses: the number of omitted cases, out of 10,000 simulations, due to estimates ν̂ lower than the value required for the existence of the corresponding moments.)
Skew-t MLEs of the centred parameters at the Normal model (κ = 0)

γ_2 is on the boundary.

Consistency: (μ̂, σ̂, γ̂_1, γ̂_2) is consistent for (μ, σ, γ_1, γ_2).

Asymptotic distribution of the estimators:

  n^{1/2}( μ̂ − μ, σ̂ − σ, γ̂_1, γ̂_2 ) →_d ( Z_1, Z_2, Z_3, Z_4 I(Z_4 > 0) )

where (Z_1, Z_2, Z_3, Z_4) ~ N(0, I^{-1}) and I = diag( 1/σ², 2/σ², 1/6, 1/24 ).
Likelihood ratio normality test

  H_0: Y ~ N(ξ, ω²)
  2 ln λ = 2{ ℓ_St(ϑ̂) − ℓ_N(ϑ̂_N) }
  2 ln λ →_d Z_1² I(Z_1 > 0) + Z_2²,   Z_i ~ N(0, 1) independent, i = 1, 2

Simulated level of the likelihood ratio test for normality with Self and Liang percentiles:

  Nominal    n = 100   200      500      1000     5000     10,000
  0.10       0.0801    0.0764   0.0796   0.086    0.0881   0.0830
  0.05       0.0406    0.0371   0.0371   0.0405   0.0448   0.0436
  0.01       0.0095    0.0071   0.0075   0.0084   0.0083   0.0090

Replications = 10,000
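The limit Z_1² I(Z_1 > 0) + Z_2² is the Self-and-Liang mixture (1/2)χ²₁ + (1/2)χ²₂, so the appropriate critical values differ from the naive χ²₂ ones. A minimal sketch (added illustration):

```python
from scipy.stats import chi2
from scipy.optimize import brentq

def normality_lrt_pvalue(lrt):
    """Asymptotic p-value under the (1/2)chi2_1 + (1/2)chi2_2 mixture (Self & Liang)."""
    return 0.5 * chi2.sf(lrt, df=1) + 0.5 * chi2.sf(lrt, df=2)

# 5% critical value of the mixture, found by root-finding
crit = brentq(lambda c: normality_lrt_pvalue(c) - 0.05, 1e-6, 50.0)
print(round(crit, 3))
```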
Skew-t and robustness

- The model is the skew-t: are the MLEs robust?
- The skew-t is assumed in order to deal with deviations from a simpler model.
Influence function of the skew-t MLE

  IF(y; F) = I(F)^{-1} S_ϑ(y; F)
  γ_s* = sup_y { S_ϑ(y; F)^T I(F)^{-1} S_ϑ(y; F) }^{1/2}

When the degrees of freedom are estimated, the influence functions are unbounded.
When the degrees of freedom are fixed, the influence functions are bounded, and the gross-error sensitivity is limited if α is finite.
Robustness versus flexible models

- Robust methods work in a neighbourhood of the model F; only some parameters are of interest, e.g. β in Y = Xβ + ε.
- Flexible models are enlarged models.

ADAPTIVE ROBUST ESTIMATION: the skew-t score functions act as modified estimating equations designed to deal with deviations from normality. Moreover:
- the parameters have a clear meaning
- inference can be based on the likelihood function
- normality can be tested through hypotheses on the parameters
- prediction intervals are available
Robustness properties

- Efficiency at the model: what is the loss of efficiency if the simpler model is the correct one?
- Reliability in a neighbourhood of the model: how do the skew-t MLEs compare with robust estimators?
Efficiency at the normal model, X ~ N(ξ, ω²)

Efficiency in the estimation of the location parameter under the normal model:

  n        Huber    St DP    St CP
  50       0.95     0.089    0.945
  100      0.9489   0.0193   0.9983
  200      0.9591   0.013    0.9997
  500      0.956    0.0069   1.0000
  1000     0.9499   0.0046   1.0001
  5000     0.9475   0.0016   1.0000
  10000    0.9547   0.0010   1.0000

Efficiency in the estimation of the scale parameter (standard deviation) under the normal model:

  n        Huber    St DP    St CP
  50       0.164    0.0835   0.7317
  100      0.1604   0.0717   0.998
  200      0.1648   0.0604   0.9985
  500      0.1677   0.0455   0.9999
  1000     0.1619   0.0359   1.0000
  5000     0.1680   0.006    1.0001
  10000    0.1649   0.0156   1.0000

Huber Proposal 2, with 95% efficiency at the normal model (k = 1.345).
Efficiency of the skew-t MLEs when the underlying model is skew normal

Simulation results: n = 1000, Monte Carlo replications = 10,000.

DIRECT PARAMETERS

        α = 0.5   α = 1.5   α = 3    α = 5    α = 10
  ξ     1.1310    0.7569    0.8608   0.8996   0.9437
  ω     1.0931    0.6141    0.615    0.5913   0.6395
  α     1.1417    0.8469    0.9630   1.0019   1.060

CENTRED PARAMETERS

        α = 0.5   α = 1.5   α = 3    α = 5    α = 10
  μ     1.0000    1.0004    0.9988   0.9868   0.9846
  σ     0.9999    0.9975    0.984    0.97     0.941
  γ_1   0.6774    0.8846    0.5387   0.1953   0.0337
  γ_2   0.0648    0.145     0.0555   0.0140   0.00
Efficiency of the skew-t MLE when the underlying distribution is t

No loss of efficiency in the estimation of ω and ν.

Efficiency of the skew-t MLE of the location parameter ξ when the underlying distribution is t:

  ν     3       5        7        10      15       20       30       60       ∞
  DP    0.136   0.0778   0.0363   0.010   0.0114   0.0055   0.0033   0.0015   0.0004
  CP    0.980   0.5743   0.8110   0.8958  0.946    0.9751   0.9858   0.9936   0.9984
Robustness in regression models, Y = Xβ + ε

Parameter of interest: the regression coefficients. Estimators compared:
- Skew-t MLE (with and without fixed degrees of freedom)
- t MLE
- Least squares
- M-estimators (Huber's ψ, k = 1.345, scale = MAD):
    ψ(u) = u min(1, k/|u|)
- MM-estimators (BDP = 0.5, efficiency = 95%), with Tukey's biweight:
    ψ(u) = u (1 − (u/k)²)² I(|u| ≤ k)
  Step 1: S-estimates. Step 2: M-estimates with fixed scale.

Monte Carlo replications = 1000, n = 200.
The estimate of the intercept is adjusted so that the error has zero mean.
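The two ψ functions are one-liners; the sketch below (an added illustration; k = 4.685 for the biweight is the conventional tuning constant for 95% efficiency, an assumption since the slide states only the efficiency) shows the bounded Huber ψ and the redescending Tukey ψ:

```python
import numpy as np

def psi_huber(u, k=1.345):
    """Huber's psi: linear in the centre, constant at +/- k beyond it."""
    return np.clip(u, -k, k)

def psi_tukey(u, k=4.685):
    """Tukey's biweight psi: redescends to exactly zero beyond k."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= k, u * (1 - (u / k) ** 2) ** 2, 0.0)

print(psi_huber(np.array([0.5, 3.0])), float(psi_tukey(1.0)), float(psi_tukey(10.0)))
```

The redescending shape is what gives MM-estimators their high breakdown point: gross outliers receive zero weight rather than a capped one.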
Model 1

  Y = 5 + 1.5x + ε,   ε ~ 0.9 N(0,1) + 0.1 N(0,9)

(Figure: error density and a simulated scatter of (x, y) for Model 1.)

Root mean square errors of the estimators of the regression coefficients, Model 1:

        St       t        LS       M        MM       St(ν=3)  St(ν=5)  St(ν=7)  St(ν=10)  St(ν=15)
  β_1   0.1806   0.1786   0.038    0.1771   0.1760   0.1837   0.1791   0.1793   0.1810    0.1841
  β_2   0.094    0.093    0.0337   0.090    0.089    0.097    0.091    0.091    0.093     0.098

Not substantially different from the robust estimators.
Models 2 to 4

  Model 2:  Y = 5 + 1.5x + ε,   ε ~ 0.9 N(0,1) + 0.1 N(0,25)
  Model 3:  Y = 5 + 1.5x + ε,   ε ~ 0.95 N(0,1) + 0.05 N(8,1)
  Model 4:  Y = 5 + 2.5x + 0.7x² + ε,   ε ~ N(0,1), with ε ~ N(8,1) for the last 20 observations

(Figures: error densities and simulated scatters of (x, y) for Models 2 to 4.)
Root mean square errors of the estimators of the regression coefficients

Model 2:
        St       t        LS       M        MM       St(ν=3)  St(ν=5)  St(ν=7)  St(ν=10)  St(ν=15)
  β_1   0.00     0.1841   0.771    0.187    0.1765   0.191    0.1889   0.194    0.1998    0.117
  β_2   0.0303   0.030    0.0460   0.099    0.091    0.097    0.095    0.099    0.0308    0.033

Model 3:
        St       t        LS       M        MM       St(ν=3)  St(ν=5)  St(ν=7)  St(ν=10)  St(ν=15)
  β_1   0.3913   0.186    0.5079   0.199    0.1645   0.3388   0.3337   0.3614   0.4066    0.465
  β_2   0.097    0.0300   0.0517   0.091    0.071    0.09     0.09     0.099    0.0309    0.031

Model 4:
        St       t        LS       M        MM       St(ν=3)  St(ν=5)  St(ν=7)  St(ν=10)  St(ν=15)
  β_1   1.08     0.9714   2.0716   1.464    0.859    0.9088   1.0373   1.1616   1.858     1.3961
  β_2   0.505    0.5931   1.4048   0.9877   0.199    0.689    0.3554   0.4186   0.483     0.54
  β_3   0.091    0.0703   0.177    0.10     0.019    0.031    0.0419   0.0496   0.0574    0.0647

The skew-t MLEs are:
- definitely better than least squares
- not as efficient as the MM-estimators

Should the degrees of freedom be held fixed?