Olie Appedix for Measurig the Dark Matter i Asset Pricig Models Hui Che Wisto Wei Dou Leoid Koga April 15, 017 Cotets 1 Iformatio-Theoretic Iterpretatio for Model Fragility 1.1 Bayesia Aalysis with Limited-Iformatio Likelihoods................ 5 1. The Effective-Sample Size................................. 8 1.3 Geeric Notatios ad Defiitios............................ 13 1.4 Regularity Coditios for Theoretical Results...................... 14 1.5 Lemmas........................................... 18 1.6 Basic Properties of Limited-Iformatio Likelihoods.................. 3 1.7 Iformatio Matrices of Limited-Iformatio Likelihoods................ 9 1.8 Properties of Posteriors Based o Limited-Iformatio Likelihoods.......... 34 1.9 Proof of Theorem 1..................................... 43 1.10 Proof of Theorem..................................... 43 1.11 Proof of Theorem 3..................................... 54 Disaster risk model 70.1 The Euler Equatio.................................... 70. Fisher fragility measure.................................. 71.3 Posteriors.......................................... 7.4 ABC Method ad Implemetatio............................ 73.5 Results............................................ 75 3 Log-ru Risk Model: Solutios ad Momet Coditios 76 3.1 The Model Solutio.................................... 76 Che: MIT Sloa ad NBER (huiche@mit.edu). Dou: Wharto (wdou@wharto.upe.edu). Koga: MIT Sloa ad NBER (lkoga@mit.edu). 1
3. Calibratios: Simulated ad Empirical Momets.................... 80 3.3 Geeralized Methods of Momets............................. 8 4 Iformatio-Theoretic Iterpretatio for Model Fragility Based o Cheroff Rates 84 1 Iformatio-Theoretic Iterpretatio for Model Fragility I Che, Dou, ad Koga (017), we provide a ecoometric justificatio for our Fisher fragility measure: it essetially quatifies the over-fittig tedecy of the fuctioal-form specificatios for certai structural compoets of a model. I this sectio, we formalize the ituitio that excessive iformativeess of cross-equatio restrictios is fudametally associated with model fragility. More precisely, we formally show that structural ecoomic models are fragile whe the cross-equatio restrictios appear excessively iformative about certai combiatios of model parameters that are otherwise difficult to estimate (we refer to such parameter combiatios as dark matter ). To do so, we itroduce a measure of iformativeess of the cross-equatio restrictios, ad express it i terms of the effective sample size. This iformatioal measure of cross-equatio restrictios is iterpretable i fiite samples ad provides ecoomic ituitio for the Fisher fragility measure. We develop our aalysis i a Bayesia framework. Startig with a prior o (θ, ψ), deoted by π(θ, ψ), we obtai posterior distributios through the baselie model ad the structural model, respectively. The, the discrepacy betwee the two posteriors shows how the cross-equatio restrictios affect the iferece about θ. We assume that the stochastic process x t is strictly statioary ad ergodic with a statioary distributio P. The true joit distributio for x (x 1,, x ) is P. Similarly, we assume that the joit stochastic process x t, y t is strictly statioary ad ergodic with a statioary distributio Q. The ecoometricia does ot eed to specify the full fuctioal form of the joit distributio of (x, y ) (x t, y t ) : t = 1,,, which we deote by Q. The ukow joit desity is q(x, y ). We evaluate the performace of a structural model uder the Geeralized Method of Momets (GMM) framework. The semial paper by Hase ad Sigleto (198) pioeers the literature of applyig GMM to evaluate ratioal expectatio asset pricig models. Specifically, we assume that the model builder is cocered with the model s i-sample ad out-of-sample performaces as represeted by a set of momet coditios, 1 based o a D Q 1 vector of fuctios g Q (θ, ψ; x, y) of 1 We ca also adopt the CUE method of Hase, Heato, ad Yaro (1996) or its modificatio Hausma, Lewis, Mezel, ad Newey (011) s RCUE method, or some other extesio of GMM with the same first-order efficiecy ad possibly superior higher-order asymptotic properties. This will lead to alterative but coceptually similar measures of overfittig. To simplify the compariso with the Fisher fragility measure, we chose to use the origial GMM framework.
data observatios (x t, y t ) ad the parameter vectors θ ad ψ satisfyig the followig coditios: E g Q (θ 0, ψ 0 ; x t, y t )] = 0. (1) The baselie momet fuctios g P (θ; x t ) characterize the momet coditios of the baselie model. They costitute the first D P elemets of the whole vector of momet fuctios g Q (θ, ψ; x t, y t ). Thus, the baselie momets ca be represeted by the full set of momets weighted by a special matrix: g P (θ; x t ) = Γ P g Q (θ, ψ; x t, y t ) where Γ P I DP, O DP (D Q D P )]. () The momet fuctios g P (θ; x t ) deped oly o parameters θ, sice all parameters of the baselie model are icluded i θ. Accordigly, the momet coditios for the baselie model is E g P (θ 0 ; x t )] = 0. (3) Deote the empirical momet coditios for the full model ad the baselie model by ĝ Q, (θ, ψ) 1 g Q (θ, ψ; x t, y t ) ad ĝ P, (θ) 1 g P (θ; x t ), respectively. The, the optimal GMM estimator (ˆθ Q, ˆψ Q ) of the full model ad that of the baselie model ˆθ P miimize, respectively, Ĵ,SQ (θ, ψ) ĝ Q, (θ, ψ) T S 1 Q ĝq,(θ, ψ) ad Ĵ,S P (θ) ĝ P, (θ) T S 1 P ĝp,(θ). (4) Here, Ĵ,SQ (θ, ψ) ad Ĵ,S P (θ) are ofte referred to as the J-distaces, ad S Q ad S P have the followig explicit formulae (see Hase, 198), S Q S P + l= + l= E g Q (θ 0, ψ 0 ; x t, y t )g Q (θ 0, ψ 0 ; x t l, y t l ) T ], ad (5) E g P (θ 0 ; x t )g P (θ 0 ; x t l ) T ], respectively. (6) The matrix S Q ad S P are the covariace matrices of the momet coditios at the true parameter values. I practice, whe S Q or S P is ukow, we ca replace it with a cosistet estimator ŜQ, or ŜP,, respectively. The cosistet estimators of the covariace matrices are provided by Newey ad West (1987), Adrews (1991), ad Adrews ad Moaha (199). We use GMM to evaluate model performace because of the cocer of likelihood mis-specificatio. The GMM approach gives the model builder flexibility to choose which aspects of the model to emphasize whe estimatig model parameters ad evaluatig model specificatios. This is i cotrast to the likelihood approach, which relies o the full probability distributio implied by the 3
structural model. Fially, we itroduce some further otatio. We deote the GMM Fisher iformatio matrix for the baselie model as I P (θ) (see Hase, 198; Hah, Newey, ad Smith, 011), ad I P (θ) G P (θ) T S 1 P G P(θ), (7) where G P (θ) E g P (θ; x t )],ad for brevity, we deote G P G P (θ 0 ). We deote the aalog for the structural model as I Q (θ, ψ), I Q (θ, ψ) G Q (θ, ψ) T S 1 Q G Q(θ, ψ), (8) where G Q (θ, ψ) E g Q (θ, ψ; x t, y t )], ad for brevity, we deote G Q G Q (θ 0, ψ 0 ). Computig the expectatio G Q (θ) ad G Q (θ, ψ) requires kowig the distributio Q. I cases whe Q is ukow, G P (θ) i (7) ad G Q (θ, ψ) i (8) ca be replaced by their cosistet estimators g P (θ; x t ) ad g Q (θ, ψ; x t, y t ). For the full model Q, we will focus o its implied Fisher iformatio matrix I Q (θ ψ): I Q (θ ψ) Γ Θ I Q (θ, ψ) 1 Γ T Θ] 1, where ΓΘ I DΘ, O DΘ D Ψ ]. (9) More precisely, the Fisher iformatio matrix I Q (θ, ψ) ca be partitioed ito a two-by-two block matrix accordig to θ ad ψ: I Q (θ, ψ) = I (1,1) Q (θ, ψ), I (1,) Q (θ, ψ) I (,1) Q (θ, ψ), I (,) Q (θ, ψ) ], (10) where I (1,1) Q (θ, ψ) is the D Θ D Θ iformatio matrix correspodig to baselie parameters θ, I (,) Q (θ, ψ) is the D Ψ D Ψ iformatio matrix correspodig to uisace parameters ψ, ad I (1,) Q (θ, ψ) = I (,1) Q (θ, ψ) T is the D Θ D Ψ cross-iformatio matrix correspodig to θ ad ψ. The I Q (θ ψ) ca be writte as I Q (θ ψ) = I (1,1) Q (θ, ψ) I (1,) Q (θ, ψ)i (,) Q (θ, ψ) 1 I (1,) Q (θ, ψ) T, (11) which geerally is ot equal to the Fisher iformatio sub-matrix I (1,1) Q (θ, ψ) for baselie parameters θ, except the special case i which I (1,) Q (θ, ψ) = 0, i.e. the kowledge of θ ad that of ψ are ot iformative about each other. We assume that the iformatio matrices are osigular i this paper (Assumptio A4 i Appedix 1.4). We first itroduce a momet-based Bayesia method with limited-iformatio likelihoods i Subsectio 1.1. The, we itroduce the defiitio of effective sample size to gauge the iformativeess of the cross-equatio restrictios ad state the mai results i Subsectio 1.. Third, we predefie the ecessary special otatios i Subsectio 1.3. Fourth, we itroduce the stadard regularity coditios i Subsectio 1.4. Fifth, we prove the basic lemmas i Subsectio 1.5, which are themselves iterestig ad geeral. Sixth, i Subsectio 1.6, we state ad prove propositios which 4
serve as itermediate steps for the proof of the mai results. Fially, the mai results stated i Subsectio 1. are proved i Subsectio 1.9-1.11. 1.1 Bayesia Aalysis with Limited-Iformatio Likelihoods Whe likelihood-based methods are difficult or ureliable, oe robust tool for the ecoometricia to derive the posteriors of baselie model π P (θ x ) ad full structural model π Q (θ, ψ x, y ) is the limited-iformatio likelihood (LIL). It relies o certai momet coditios used for model evaluatio. Of course, the full likelihood fuctio ca be used whe likelihood methods are possible. Here, restricted to the momet costraits, we embed a likelihood that is closest to the uderlyig true distributio ito the Bayesia paradigm. We use the Kullback-Leibler divergece (also kow as the relative etropy) to gauge the discrepacy betwee probability measures. 3 We first focus o the limited-iformatio likelihood for the full structural model. The largesample results of Kim (00) provide a asymptotic Bayesia iterpretatio of GMM, usig the expoetial quadratic form ad igorig the fiite-sample validity ad a valid parametric family for likelihoods. However, our iformatio-theoretic ratioale to model fragility ideed requires a valid fiite-sample iterpretatio. To achieve this goal, we adopt the framework of Kitamura ad Stutzer (1997), ad we focus o the case that momet fuctios are stable autoregressive processes. 4 More precisely, there exists a set of autoregressive (AR) coefficiets ω 0,, ω mg of the followig stable AR regressio have zero meas ad zero serial correlatios: m g such that the error terms gq ω (θ 0, ψ 0 ; z t ) ω j g Q (θ 0, ψ 0 ; x t j, y t j ) (1) j=0 where z t x t j, y t j : j = 0,, m g. This assumptio does ot offer the most geeral settig; however, it should provide a fair approximatio to the dyamics of momet coditios i may cases studied i fiace ad ecoomics, ad more importat, it allows a factorizatio of the likelihood for fiite-sample iterpretatio while guarateeig the first-order asymptotic efficiecy of the aalogous maximum likelihood estimator. Importatly, as a result of stability, the origial momet coditio E g Q (θ 0, ψ 0 ; x t, y t )] = 0 is equivalet to the momet coditio: E g ω Q (γ 0; z t )] = 0, (13) The idea of ackowledgig that it is ofte very difficult to come up with precise distributios to be used as likelihood fuctios ad thus choosig a likelihood (or a prior) amog a set of samplig models (or priors) i Bayesia aalysis is referred to as robust Bayesia aalysis (see, e.g. Wasserma, 199; Berger, 1994; Pericchi ad Pérez, 1994). 3 Alteratively, the empirical likelihood (EL) of Owe (1988, 1990, 1991) ad the expoetial tilted empirical likelihood (ETEL) of Kitamura ad Stutzer (1997) ad Scheach (007) ca also be used as the likelihood part of the Bayesia iferece. I fact, they are the Bayesia empirical likelihood (BEL) of Lazar (003) ad the Bayesia expoetial tilted empirical likelihood (BETEL) of Scheach (005). Oe applicatio of BEL ad BETEL is the statistical aalysis of disaster risk models i Julliard ad Ghosh (01). 4 The same assumptio is also adopted by Kim (00, Remarks 1 ad 3). More details ca be foud i Assumptio A9 i Appedix 1.4. The assumptio ca be weakeed to the case that the lag size m g icreases with sample size. 5
where we defie γ (θ, ψ) ad γ 0 (θ 0, ψ 0 ) for otatioal simplicity. Give each γ, we defie the set of probability measures, deoted by Q(γ; ), such that Q(γ; ) Q : E Q g ω Q (γ; z t)] = 0, t = 1,,. (14) The true distributio of z = (z 1,, z ), deoted by Q, belogs to the distributio set Q(γ 0 ; ). The limited-iformatio likelihood Q γ, for Bayesia aalysis of full model is chose accordig to the priciple of miimum Kullback-Leibler divergece: Q γ, = argmi D KL (Q Q ) = argmi Q Q(γ;) Q Q(γ;) l (dq /dq ) dq (15) where dq /dq is the Rado-Nikodym derivative (or desity) of Q with respect to the true probability measure Q. It is well-kow that the limited-iformatio likelihood Q γ, has the followig Gibbs caoical desity (see, e.g., Csiszár, 1975; Cover ad Thomas, 1991, Chapter 11): dq γ, /dq = exp η Q (γ) T gq ω (γ; z t) A Q (γ), γ (16) ] where A Q (γ) l E e η Q(γ) T gq ω(γ;z t), ad the Lagragia multipliers η Q (γ) are chose to make the momet coditios satisfied: 0 = E gq ω (γ; z)eη Q(γ) T gq ω(γ;z)], γ. (17) We deote π Q (z t ; γ) exp η Q (γ) T g ω Q (γ; z t) A Q (γ). The posterior desity is π Q (γ x, y ) π(γ) exp η Q (γ) T gq ω (γ; z t) A Q (γ). (18) More discussios o the validity of the limited-iformatio likelihood ad Bayesia aalysis ca be foud i Appedix 1.1. Aalogously, the limited-iformatio likelihood P θ, for the baselie model is costructed based o the baselie momet coditios E g P (θ, x t )] = 0 ad the priciple of miimum Kullback-Leibler divergece (15). It has the desity fuctio: dp θ, /dp = exp η P (θ) T gp ω (θ; w t) A P (θ) (19) where w t = x t j : j = 0,, m g with uderlyig true distributio P ; gp ω(θ; w t) is the correspodig sub-vector of smoothed momets gq ω(θ; z t) for the baselie model; the fuctios η P (θ) ad A P (θ) are defied aalogous to η Q (γ) ad A Q (γ). We deote π P (w t ; θ) exp η P (θ) T gp ω(θ; w t) A P (θ). The MLE for P θ,, deoted by ˆθ ML P, also satisfies asymptotic ormality with variace I P(θ 0 ) 1. 6
Similar to full model, the posterior of baselie model is π P (θ x ) π(θ) exp η P (θ) T gp ω (θ; w t) A P (θ). (0) Without loss of geerality, we assume that g P (θ; x t ) = gp ω(θ; w t) ad g Q (γ; x t, y t ) = gq ω(γ; z t). It should also be oted that the fuctios i the limited-iformatio likelihoods η P (θ), A P (θ), η Q (γ), ad A Q (γ) are ot kow. It is iocuous for our theoretical exercises ad results. However, i practice, they eed to be replaced by their approximatig couterparts. For example, η P (θ) ca be estimated by solvig the followig equatio, for each θ, 0 = 1 g P (θ; x )e η P(θ) T g P (θ;x t ) t (1) ad the we estimate A P (θ) by ÂP(θ) as follows: Â P (θ) = l 1 e η P(θ) T g P (θ;x t ) ]. () The fuctioal estimators η Q (γ) ad ÂQ(γ) ca be costructed i the similar way. Uder the regularity coditios, the fuctioal estimators coverge to the true fuctios uiformly i probability. Discussio: The Validity of Limited-Iformatio Likelihood Q γ, The GMM Bayesia methods of Kim (00) ad Cherozhukov ad Hog (003) are primarily for large sample aalysis ad potetially ivalid for fiite-sample iterpretatio due to the fact that the GMM framework based o a quadratic form empirical momets uder geeral assumptios are justified asymptotically (see, Hase, 198). Specifically, Cherozhukov ad Hog (003) motivate the expoetial of momet coditios quadratic form as the limited-iformatio likelihood maily from a computatioal perspective, without providig a compellig theoretical ratioalizatio. Kim (00) justifies the particular expoetial quadratic form by appealig to the priciple of miimum Kullback-Leibler divergece yet based o momet coditios i a asymptotic sese. To guaratee a valid fiite-sample iterpretatio ad valid parametric family for likelihoods, we adopt the framework of Kitamura ad Stutzer (1997). Here are several reasos for the limited-iformatio likelihood i (16) to be used as a valid parametric family for likelihood withi the Bayesia paradigm. First, the set of distributios Q γ, characterize a proper parametric family of likelihoods for statistical iferece, sice Q γ, is the true distributio if ad oly if γ = γ 0, that is η Q (γ 0 ) = A Q (γ 0 ) = 0. Further, the MLE of the parametric family Q γ,, deoted by ˆγ ML Q, is asymptotically first-order equivalet to the GMM estimator Hase (198) ad the ET estimator Kitamura ad Stutzer (1997) with a ormal asymptotic distributio 7
(see Sectio 1.6): ) wlim (ˆγ Q ML γ 0 = N(0, I Q (γ 0 ) 1 ), with I Q (γ 0 ) = E l (dq γ, /dq ) ]. It should be oted that Kitamura ad Stutzer (1997) propose the expoetial tilted estimator based o Fechel duality: ] η Q (γ) = argmi E e ηt g Q (γ;z), γ. (3) η Their motivatio is computatioal. But we are ot proposig ew estimatio method here. Thus, we directly cosider the maximum likelihood estimator for the limited-iformatio likelihood family Q γ,. Secod, the limited-iformatio likelihood (16) ca be factorized properly so as to be cosistet with the statioary Markovia properties of uderlyig time series, sice the sequetial depedece of Q γ, are fully captured by the true distributio Q. Third, the Bayesia aalysis based o (16) is equivalet to Bayesia expoetially tilted empirical likelihood (BETEL) asymptotically (see, e.g., Julliard ad Ghosh, 01). Scheach (005) provides a probabilistic iterpretatio of the expoetial tilted empirical likelihood that justifies its use i Bayesia iferece. 1. The Effective-Sample Size We quatify the discrepacy betwee probability distributios usig a stadard statistical measure, the relative etropy (also kow as the Kullback-Leibler divergece). The relative etropy betwee π P (θ x ) ad the margial posterior π Q (θ x, y ) is D KL π Q (θ x, y ) π P (θ x )] = l ( πq (θ x, y ) ) π P (θ x π Q (θ x, y )dθ. (4) ) Ituitively, we ca thik of the log posterior ratio l(π Q (θ x, y )/π P (θ x )) as a measure of the discrepacy betwee the two posteriors at a give θ. The the relative etropy is the average discrepacy betwee the two posteriors over all possible θ, where the average is computed uder the costraied posterior. D KL (π Q (θ x, y ) π P (θ x )) is fiite if ad oly if the support of the posterior π Q (θ x, y ) is a subset of the support of the posterior π P (θ x ), that is, Assumptio A6 i Subsectio 1.4 holds. The magitude of relative etropy is difficult to iterpret directly, ad we propose a ituitive effective sample size iterpretatio. Istead of imposig the cross-equatio restrictios from the structural model, oe ca gai extra iformatio about θ withi the baselie model with additioal data. We evaluate the amout of additioal data uder the baselie model eeded to match the iformativeess of cross-equatio restrictios. 8
Suppose we draw additioal data x m of sample size m from the Bayesia predictive distributio π P ( x m x ) π P ( x m θ)π P (θ x )dθ, (5) where the effective sample x m is idepedet of observed sample x give baselie parameters θ, that is, π P ( x m, θ x ) = π P ( x m θ)π P (θ x ). Agai, we measure the gai i iformatio from this additioal sample x m usig relative etropy, D KL (π P (θ x m, x ) π P (θ x )) = ( πp (θ x m, x ) ) l π P (θ x π P (θ x m, x )dθ. (6) ) D KL (π P (θ x m, x ) π P (θ x )) depeds o the realizatio of the additioal sample of data x m. The average relative etropy (iformatio gai) over possible future samples x m accordig to the Bayesia predictive distributio π P ( x m x ) equals the mutual iformatio betwee x m ad θ give x : I( x m ; θ x ) E xm x DKL (π P (θ x m, x ) π P (θ x )) ] = D KL (π P (θ x m, x ) π P (θ x )) π P ( x m θ) π P (θ x ) d x m dθ. (7) Like the relative etropy, the mutual iformatio is always positive. It is easy to check that I( x m ; θ x ) = 0 whe m = 0. Uder the assumptio that the prior distributio is osigular ad the parameters i the likelihood fuctio are well idetified, ad additioal geeral regularity coditios, I( x m ; θ x ) is mootoically icreasig i m ad coverges to ifiity as m icreases. These properties esure that we ca fid a extra sample size m that equates (approximately, due to the fact that m is a iteger) D KL (π Q (θ x, y ) π P (θ x )) with I( x m ; θ x ). It is oly meaigful to match 1-dimesioal distributios eve i asymptotic seses. Thus, the results i this sectio are valid oly for scalar feature fuctios. Defiitio 1 (Effective-Sample Size Iformatio Measure). For a feature fuctio vector f : R D Θ R, we defie the effective-sample measure of the iformativeess of the cross-equatio restrictios as ϱ f KL (x, y ) = + m f, (8) where m f eables the matchig of two iformatio quatities I( x m f ; f(θ) x ) D KL (π Q (f(θ) x, y ) π P (f(θ) x )) < I( x m f +1 ; f(θ) x ), (9) with D KL (π Q (f(θ) x, y ) π P (f(θ) x )) beig the relative etropy betwee the costraied ad ucostraied posteriors of f(θ) ad I( x m ; f(θ) x ) beig the coditioal mutual iformatio betwee the additioal sample of data x m ad the trasformed parameter f(θ) give the existig sample of data x. 9
For scalar-valued feature fuctios (D f = 1), there exists a direct coectio betwee our Fisher fragility measure ad the relative-etropy based iformativeess measure (i.e. effective-sample size iformatio measure i Defiitio 1). The effective-sample size ratio ϱ f KL (x, y ) is defied with fiite-sample validity. The followig theorem establishes asymptotic equivalece betwee the Fisher fragility measure ϱ v (θ 0 ψ 0 ) ad the effective-sample size iformatio measure ϱ f KL (x, y ). Ituitively, the extra effective-sample size m f icreases proportioally i the observed sample size where the limitig proportioal growth rate wlim ϱ f KL (x, y ) is stochastic. Theorem 1. Cosider a feature fuctio f : R D Θ regularity coditios A1 - A8 stated i Subsectio 1.4, it must hold that R with v = f(θ 0 ). Uder the stadard wlim l ϱf KL (x, y ) = l ϱ v (θ 0 ψ 0 )] + 1 ϱ v (θ 0 ψ 0 ) 1] (χ 1 1), (30) where χ 1 is a chi-square radom variable with degrees of freedom 1. It immediately implies that ] E wlim l ϱf KL (x, y ) = l ϱ v (θ 0 ψ 0 )]. The result of Theorem 1 follows immediately from the followig approximatio results summarized i Theorem ad Theorem 3. Without loss of geerality, we assume that g ω Q (θ, ψ; z t) g Q (θ, ψ; x t, y t ) or equivaletly m g = 0 i Equatio (1). Also, because of Assumptio A7 (the regular feature fuctio coditio), as well as the fact that the defiitio of our dark matter measure i Che, Dou, ad Koga (017) ad regularity assumptios A1 - A6 i Appedix 1.4 are ivariat uder ivertible ad secod-order smooth trasformatios of parameters, we ca assume that D Θ = 1 i the ituitive proofs. The full proofs are i the olie appedix. Theorem (Kullback-Leibler Divergece). Cosider a feature fuctio f : R D Θ R with v = f(θ 0 ). Let the MLE for the limited-iformatio likelihood of the baselie model P be ˆθ P ML, ad the aalogy of the structural model Q be ˆγ ML Q = (ˆθ ML Q, ˆψ ML Q ). Uder the regularity coditios stated i Appedix 1.4, D KL (π Q (f(θ) x, y ) π P (f(θ) x )) 1 l where covergece is i probability uder Q. vi Q (θ 0 ψ 0 ) 1 v T (f( θ P ML) f( θ Q ML)) vi P(θ 0 ) 1 v T vi Q (θ 0 ψ 0 ) 1 v T + 1 vi Q (θ 0 ψ 0 ) 1 v T vi P (θ 0 ) 1 v T 1/, (31) This theorem geeralizes the results i Li, Pittma, ad Clarke (007, Theorem 3) i two importat aspects. It focuses o the results of the momet-based LIL framework, istead of the stadard likelihood-based framework. Ad also, it allows for geeral weak depedece amog the observatios which makes our results applicable to time series models i fiace ad ecoomics. 10
Here we provide a ituitive proof. The detailed ad rigorous techical proof ca be foud i Subsectio 1.10. Whe is large, it followig approximatios hold uder the stadard coditios: π P (f(θ) x ) N π Q (f(θ) x, y ) N Therefore, whe is large, the asymptotic approximatio (31) holds. ( f(ˆθ P ML), 1 vi P (θ 0 ) 1 v T ), ad (3) ( f(ˆθ Q ML), 1 vi Q (θ 0 ψ 0 ) 1 v T ). (33) Corollary 1. Cosider the feature fuctio f : R D Θ R with v = f(θ 0 )/ θ. Uder the assumptios i Subsectio 1.4, if we defie the it follows that Moreover, wlim + λ vt I P (θ 0 ) 1 v v T I Q (θ 0 ) 1 v, v T I Q (θ 0 ) 1 v (f( θ P ) f( θ Q )) = (1 λ 1 )χ 1. (34) wlim D KL(π Q (f(θ) x, y ) π P (f(θ) x )) = 1 λ 1 (χ + 1 1) + 1 l(λ), (35) The asymptotic distributio result i (34) is directly from Propositio 11. The asymptotic result i (35) is based o Theorem, limit result (34), ad the Slutsky Theorem. Theorem 3 (Mutual Iformatio). Cosider a feature fuctio f : R D Θ R with v = f(θ 0 ). Uder the assumptios i Subsectio 1.4, if m/ ς (0, ) as both m ad approach ifiity, where covergece is i probability uder Q. I( x m ; f(θ) x ) 1 ( ) m + l 0, (36) There exist related approximatio results for mutual iformatio I( x m ; θ x ), which cosider large m while holdig the observed sample size fixed. For more details, see Clarke ad Barro (1990, 1994) ad refereces therei. See also the case of o-idetically distributed observatios by Polso (199), amog others. Our results differ i that we allow both m ad to grow. Ours is a techically otrivial extesio of the existig results. Here we provide a ituitive proof. The detailed ad rigorous techical proof ca be foud i 11
Subsectio 1.10. By defiitio i (7), the mutual iformatio give x ca be rewritte as I( x m ; θ x ) = π P ( x m θ)π P (θ x ) l π P( x m, x θ)π P (x ) π P ( x m, x )π P (x θ) d xm dθ = π P ( x m θ)π P (θ x ) l π P( x m, x θ) π P ( x m, x ) d xm dθ π P (θ x ) l π P(x θ) π P (x ) dθ. The followig approximatio is stadard i the iformatio-theoretic literature (see, e.g., Clarke ad Barro, 1990, 1994): whe m ad are large, it holds that l π P( x m, x θ) π P ( x m, x ) 1 S m+ (θ) T I P (θ) 1 Sm+ (θ) + 1 l m + π + l 1 π P (θ) + 1 l I P(θ), where S m+ (θ) Similarly, it also holds that 1 m + l π P (x t ; θ) + ] m l π P ( x t ; θ). (37) l π P(x θ) π P (x ) 1 S (θ) T I P (θ) 1 S (θ) + 1 l π + l 1 π P (θ) + 1 l I P(θ), (38) where It follows from (37) ad (39) that where S (θ) 1 l π P (x t ; θ). (39) m S m+ (θ) = m + S (θ) + m + S m (θ), (40) S m (θ) 1 m l π P ( x t ; θ). (41) m Now recall that the family of limited-iformatio likelihoods π P (x; θ) satisfy the stadard properties: θ, E π P (x; θ)] = 1 ad E π P (x; θ) l π P (x; θ)] = 0 ad E π P (x; θ) l π P (x; θ)] l π P (x; θ)] T = E π P (x; θ) l π P (x; θ) = I P (θ). Thus, the followig importat idetity ca be derived: S m+ (θ) T I P (θ) 1 Sm+ (θ)π P ( x m θ)π P (θ x )d x m dθ = S (θ) T I P (θ) 1 S (θ)π P (θ x )dθ + m m + m +. 1
Usig the Taylor expasio of S (θ) aroud ˆθ ML P ad the ormal approximatio of posterior (3), S (θ) T I P (θ) 1 S (θ)π P (θ x )dθ I P (θ 0 ) 1 1 1 ] ( l π P (x t ; ˆθ ML) P θ ˆθ P ML) ϕp, (θ)dθ where ϕ P, (θ) = det 1 I P (θ 0 )] /(π) D Θ exp 1 (θ ˆθ P ML )T I P (θ 0 )] (θ ˆθ P ML ). Therefore, whe m ad are large, the mutual iformatio ca be approximated as follows I( x m ; θ x ) 1 l m + + m m + 1 l m +. S (θ) T I P (θ) 1 S (θ)π P (θ x )dθ m ] m + Now we have completed the ituitive proof. The legthy full proof i Subsectio 1.9 follows the same mai idea, yet with rigorous establishmets of the approximatio sigs above. 1.3 Geeric Notatios ad Defiitios First, we itroduce some otatios for the matrices. For ay real symmetric o-egative defiite matrix A, we defie λ M (A) to be the largest eigevalue of A ad defie λ m (A) to be the smallest eigevalue of A. For a matrix A, we defie the spectral orm of A to be A S. By defiitio of spectral orm, we kow that for ay real matrix A, A S λ M (A T A). Deote λ(θ) ad λ(θ) to be the largest eigevalue ad the smallest eigevalue of I P (θ), respectively. That is, λ(θ) = λ m (I P (θ)) ad λ(θ) = λ M (I P (θ)). If the matrix I P (θ) is cotiuous i θ, λ(θ) ad λ(θ) are cotiuous i θ. We defie upper boud ad lower boud to be λ sup λ(θ), ad λ if λ(θ). (4) θ Θ θ Θ Secod, we itroduce some otatios related to subsets i Euclidea spaces. We defie the Euclidea distace betwee two sets S 1, S R D Θ as follows d L (S 1, S ) if s 1 s : s i S i, i = 1,. (43) For θ R D Θ, we deote θ (1) to be the first elemet of θ ad deote θ ( 1) to be the D Θ 1 dimesioal vector cotaiig all elemets of θ other tha θ (1). Defie the ope ball cetered at θ with radius r 13
to be Deote Ω(θ, r) ϑ : ϑ θ < r ad Ω(θ, r) ϑ : ϑ θ r Ω (1) (θ, r) = Ω(θ (1), r) R 1, ad Ω ( 1) (θ, r) = Ω(θ ( 1), r) R D Θ 1. I additio, we defie Θ 1 (θ (1) ) θ ( 1) R D Θ 1 ( θ (1) θ ( 1) ) Θ, (44) ad we deote V Θ Vol(Θ) < +, V Θ (θ (1) ) Vol(Θ 1 (θ (1) )) < +, ad V Θ,1 sup θ (1) Θ (1) V Θ (θ (1) ) < +. Third, we itroduce some otatios o metrics of probability measures. Cosider two probability measures P ad Q with desities p ad q with respect to Lebesgue measure, respectively. The Helliger affiity betwee P ad Q is deoted as α H (P, Q) p(x)q(x)dx. The total variatio distace betwee P ad Q is deoted as P Q T V p(x) q(x) dx. Fourth, we itroduce otatios for time series. The maximal correlatio coefficiet ad the uiform mixig coefficiet are defied as ρ max (F 1, F ) sup Corr(f 1, f ), f 1 L real (F 1),f L real (F ) ad φ(f 1, F ) sup P (A A 1 ) P (A ), A 1 F 1,A F where L real(f i ) deote the space of square-itegrable. F i -measurable, real-valued radom variables, for ay sub σ fields F i F. 1.4 Regularity Coditios for Theoretical Results The regularity coditios we choose to impose o the behavior of the data are iflueced by three major cosideratios. First, our assumptios are chose to allow processes of sequetial depedece. I particular, the processes allowed should be relevat to itertemporal asset pricig models. Secod, our assumptios are required to meet the aalytical tractability. Third, our assumptios are 14
sufficiet coditios i the sese that we are ot tryig to provide the weakest coditios or high level coditios to guaratee the results; but istead, we chose those regularity coditios which are relatively straightforward to check i practice. Assumptio A1 (Statioarity Coditio) The uderlyig time series (x t, y t ) with t = 1,, follow a m S -order strictly statioary Markov process. Remark. This assumptio implies that the margial coditioal desity for x t ca be specified as π P (x t θ, x t 1,, x t ms ). Defie the stacked vectors x t = (x t,, x t ms +1) T ad y t = (y t,, y t ms +1) T. The the margial coditioal desity from the parametric family specified for the baselie model ca be rewritte as π P (x t ; θ), ad the stacked vectors (x t, y t ) follow a first-order Markov process. Assumptio A (Mixig Coditio) There exists costat λ D d D /(d D 1), where d D is the costat i Assumptio A3 (domiace coditio), such that (x t, y t ) for t = 1,,, is uiform mixig ad there exists a costat φ such that the uiform mixig coefficiets satisfy φ(m) φm λ D for all possible probabilistic models, where φ(m) is the uiform mixig coefficiet. Its defiitio is stadard ad ca be foud, for example, i White ad Domowitz (1984) or Bradley (005). Remark. Followig the literature (see, e.g. White ad Domowitz, 1984; Newey, 1985b; Newey ad West, 1987), we adopt the mixig coditios as a coveiet way of describig ecoomic ad fiacial data which allows time depedece ad heteroskedasticity. The mixig coditios basically restrict the memory of a process to be weak, while allowig heteroskedasticity, so that large sample properties of the process are preserved. I particular, we employ the uiform mixig which is discussed i White ad Domowitz (1984). Assumptio A3 (Domiace Coditio) The fuctio g Q (θ, ψ; x, y) is twice cotiuously differetiable i (θ, ψ) almost surely. There exist domiatig measurable fuctios a 1 (x, y) ad a (x, y), ad costat d D > 1, such that almost everywhere g Q (θ, ψ; x, y) a 1 (x, y), g Q (θ, ψ; x, y) S a 1(x, y), g Q,(i) (θ, ψ; x, y) S a 1(x, y), for i = 1,, D g, q(x, y) a (x, y), q(x 1, y 1, x t, y t ) a (x 1, y 1 )a (x t, y t ), for t, a 1 (x, y)] d D a (x, y)dxdy < +, a (x, y)dxdy < +, where S is the spectral orm of matrices. 15
Remark. The domiatig fuctio assumptio is widely adopted i the literature of geeralized method of momets (Newey, 1985a,b; Newey ad West, 1987). The domiatig assumptio, together with the uiform mixig assumptio ad statioarity assumptio, imply the stochastic equicotiuity coditio (iv) i Propositio 1 of Cherozhukov ad Hog (003). I the semial GMM paper by Hase (198), the momet cotiuity coditio ca also be derived from the domiace coditios. Remark. Followig the discussio of Sectio 1, for each pair of j ad k, it holds that for some costat ζ > 0 ad large costat C > 0, E sup l π P (w; θ) θ j θ k E θ Θ sup γ Θ Ψ +ζ l π Q (z; γ) γ j γ k < C, ad E sup l π P (w; θ) θ j +ζ θ Θ < C, ad E sup γ Θ Ψ +ζ l π Q (z; γ) γ j < C, ad (45) +ζ < C, (46) where γ = (θ, ψ), ad π P ad π Q are defied i Sectio 1. The domiace coditio, together with the uiform mixig assumptio ad statioarity assumptio, implies the stochastic equicotiuity coditio (i) i Propositio 3 of Cherozhukov ad Hog (003). Assumptio A4 (Nosigular Coditio) The Fisher iformatio matrices I P (θ) ad I Q (θ, ψ) are positive defiite for all θ, ψ. Remark. It implies that the covariace matrices S P ad S Q are positive defiite, ad the expected momet fuctio gradiets G P (θ) ad G Q (θ, ψ) have full rak for all θ ad ψ. Assumptio A5 (Idetificatio Coditio) The true baselie parameter vector θ 0 is idetified by the baselie momet coditios i the sese that E g P (θ; x)] = 0 oly if θ = θ 0. Ad, the true parameters (θ 0, ψ 0 ) of the full model is idetified by the momet coditios i the sese that E g Q (θ, ψ; x, y)] = 0 oly if θ = θ 0 ad ψ = ψ 0. Remark. Cosider the discussio of limited-iformatio likelihood i Sectio 1. The cotiuous differetiability of momet fuctios, together with the idetificatio coditio, imply that the parametric family of limited-iformatio distributios P θ ad Q γ, as well as the momet coditios, are soud: the covergece of a sequece of parameter values is equivalet to the weak covergece of the distributios: θ θ 0 P θ P θ0 E l (dp θ /dp)] E l (dp θ0 /dp)] = 0, ad (47) γ γ 0 Q γ Q γ0 E l (dq γ /dq)] E l (dq γ0 /dq)] = 0, (48) where γ = (θ, ψ) ad γ 0 = (θ 0, ψ 0 ). Ad, the covergece of a sequece of parameter values is equivalet to the covergece of the momet coditios: γ γ 0 E g Q (γ; x, y)] E g Q (γ 0 ; x, y)] = 0. (49) 16
Assumptio A6 (Regular Bayesia Coditio) Suppose the parameter set is Θ Ψ R D Θ+D Ψ with Θ ad Ψ beig compact. Ad, the prior is absolutely cotiuous with respect to the Lebesgue measure with Rado-Nykodim desity π(θ, ψ), which is twice cotiuously differetiable ad positive. Deote π max θ Θ,ψ Ψ π(θ, ψ) ad π mi θ Θ,ψ Ψ π(θ, ψ). The probability measure defied by the limited-iformatio posterior desity π Q (θ x, y ) is domiated by the probability measure defied by the baselie limited-iformatio posterior desity π P (θ x ), for almost every x, y uder Q 0. Remark. Compactess implies total boudess. I our diaster risk model, the parameter set for the prior is ot compact due to the adoptio of uiformative prior. However, i that umerical example, we ca trucate the parameter set at very large values which will ot affect the mai umerical results. Remark. The cocept of domiatig measure here is the oe i measure theory. More precisely, this regularity coditio requires that for ay measurable set which has zero measure uder π Q (θ x, y ), it must also have zero measure uder π P (θ x ). This assumptio is just to guaratee that D KL (π Q (θ x, y ) π P (θ x )) to be well defied. Assumptio A7 (Regular Feature Fuctio Coditio) The feature fuctio f = (f 1,, f Df ) : Θ R D f is a twice cotiuously differetiable vectorvalued fuctio. We assume that there exist D Θ D f twice cotiuously differetiable fuctios f Df +1,, f DΘ o Θ such that F = (f 1, f,, f DΘ ) : Θ R D Θ is a oe-to-oe mappig (i.e. ijectio) ad F (Θ) is a coected ad compact D Θ -dimesioal subset of R D Θ. Remark. A simple sufficiet coditio for the regular feature fuctio coditio to hold is that each fuctio f i (i = 1,, D f ) is a proper ad twice cotiuously differetiable fuctio o R D Θ f(θ) ad (θ (1),, θ (Df )) > 0 at each θ RD Θ. I this case, we ca simply choose f k (θ) θ (k) for k = D f + 1,, D Θ. The, the Jacobia determiat of F is ozero at each θ R D Θ ad F is proper ad twice differetiable mappig R D Θ R D Θ. Accordig to the Hadamard s Global Iverse Fuctio Theorem (e.g. Kratz ad Parks, 013), F is a oe-to-oe mappig ad F (Θ) is a coected ad compact D Θ -dimesioal subset of R D Θ. Assumptio A8 (Expoetial Coditio) For sufficietly small δ > 0, E sup γ Ω(γ,δ) exp v T g Q (γ; x, y) ] <, for all vectors v i a eighborhood of the origi. Remark. This assumptio is eeded to guaratee that the limited-iformatio likelihood Q γ i (16) is well-defied ad the related Fechel duality holds. While this assumptio is stroger tha the momet existece assumptio i Hase (198), it is commoly adopted i the literature ivolvig Kullback-Leibler divergece (see, e.g. Csiszár, 1975; Kitamura ad Stutzer, 1997) ad geeral expoetial-family models (see, e.g. Berk, 197; Dou, Pollard, ad Zhou, 01). 17
1.5 Lemmas As emphasized by White ad Domowitz (1984), the mixig coditios serve as a operatig assumptio for ecoomic ad fiacial processes, because mixig assumptios are difficult to verify or test. However, White ad Domowitz (1984) argues that the restrictio is ot so restrictive i the sese that a wide class of trasformatios of mixig processes are themselves mixig. Lemma 1. Let z t = Z ((x t+τ0, y t+τ0 ),, (x t+τ, y t+τ )) where Z is a measurable fuctio oto R Dz ad two itegers τ 0 < τ. If x t, y t is uiform mixig with uiform mixig coefficiets φ(m) Cm λ for some λ > 0 ad C > 0, the z t is uiform mixig with uiform mixig coefficiets φ z (m) C z m λ for some C z > 0. Proof. It ca be derived directly from the defiitio of uiform mixig. The followig classical result is put here for easy referece. Ad, it easily leads to a corollary which will be used repeatedly. Lemma. For ay two σ fields F 1 ad F, it holds that ρ max (F 1, F ) φ(f 1, F )] 1/. Proof. The proof ca be foud i Ibragimov (196) or Doob (1950, Lemma 7.1). Corollary. Let z t be strictly statioary process satisfyig uiform mixig such that φ(m) Cm λ for some λ > 0 ad C > 0, the the autocorrelatio fuctio where z (i),t is the i-th elemet of z t. max Corr(z (i),t, z (j),t+m ) Cm λ/ i,j Proof. It directly follows from Lemma 1 ad Lemma. Lemma 3. Let z t be a sequece of strictly statioary radom vectors such that E P0 z t < + ad it satisfies the uiform mixig coditio with φ(m) Cm λ for some λ > ad C > 0. The, lim var 0 ( 1 z t) = V0 < +. Proof. Let z (i),t be the i-th elemet of vector z t. Deote σ i var 0(z (i),t ) for each i. Ad, we deote the cross correlatio to be ρ i,j (τ) Corr(z (i),t, z (j),t+τ ) for all t, τ, i ad j. The, we have, for each pair of i ad j, Cov 0 ( 1 z (i),t, 1 ) z (j),t = σ i σ j ρ i,j (0) + 1 ρ i,j(1) + + 1 ] ρ i,j( 1). Accordig to Corollary, we kow that ρ i,j (m) = o(m 1 ). Thus, by verifyig the Cauchy coditio, we kow that ρ i,j (0) + 1 ρ i,j(1) + + 1 ρ i,j( 1) coverges to a fiite costat. 18
The followig two lemmas are extesios of Propositios 6.1-6. i Clarke ad Barro (1990, Page 468-470). Lemma 4 shows that aalogs of the soudess coditio for certai metrics o probability measures imply the existece of strogly uiformly expoetially cosistet (SUEC) hypothesis tests. A composite hypothesis test is called uiformly expoetially cosistecy (UEC) if its type-i ad type-ii errors are uiformly upper bouded by e ξ for some positive ξ, over all alteratively (see e.g. Barro, 1989). A strogly uiformly expoetially cosistet (SUEC) test is a hypothesis test whose type-i ad type-ii errors are upper bouded by e ξ for some positive costat ξ, uiformly over all alteratives ad all ull parametric models over two subsets i the probability measure space. Lemma 5 shows that metrics with the desirable cosistecy property exists, which exteds Propositio 6. i Clarke ad Barro (1990). Lemma 4. Suppose d G is a metric o the space of probability measures o X with the property that for ay ɛ > 0, there exists ξ > 0 ad C > 0 such that P d G ( P, P) > ɛ Ce ξ, (50) uiformly over all probability measures P, where P is the empirical distributio. Ad, the metric d G also satisfies d G (P θ, P θ ) 0 θ θ. (51) The, for ay δ > δ 1 0 ad for each θ Ω(θ 0, δ 1 ), there exists a SUEC hypothesis test of θ Ω(θ 0, δ 1 ) versus alterative Ω(θ 0, δ). Proof. The proof is a extesio based o that of Lemma 6.1 i Clarke ad Barro (1990). From (51), for ay give δ > δ 1 0, there exists ɛ 1 > 0 such that d L (θ, Ω(θ 0, δ) ) > δ δ 1 > 0 implies that d G (P θ, P θ ) > ɛ 1 for all θ Ω(θ 0, δ). Thus, for ay δ > δ 1 0, there exists ɛ 1 > 0 such that, d G (P θ, P θ ) > ɛ 1 for all θ Ω(θ 0, δ 1 ) ad θ Ω(θ 0, δ). Therefore, for each θ Ω(θ 0, δ 1 ), if we have a SUEC test of H 0 : P = P θ versus H A : P P : d G ( P, P θ ) > ɛ 1, the we have a SUEC test of H 0 : θ = θ versus H A : P P θ : θ θ > δ δ 1. Let P be the empirical distributio. We choose ɛ = ɛ 1 / ad let A θ, x : d G ( P, P θ ) ɛ be the acceptace regio. By the coditio (50), we have that the probability of type-i error satisfies, for some ξ > 0 ad C > 0, P θ, A θ, Ce ξ, uiformly over θ Ω(θ 0, δ 1 ). 19
We wat to show that the probability of type-ii error P θ A θ, is expoetially small uiformly over θ Ω(θ 0, δ) ad θ Ω(θ 0, δ 1 ). More precisely, usig triagular iequality, we kow that for ay θ Ω(θ 0, δ 1 ) ad θ Ω(θ 0, δ) ɛ d G (P θ, P θ ) d G (P θ, P ) + d G ( P, P θ ). (5) By defiitio ad iequality (5), for each θ Ω(θ 0, δ 1 ), o the acceptace regio A θ,, we have ɛ < ɛ + d G ( P, P θ ) for ay θ Ω(θ 0, δ). (53) Thus, for each θ Ω(θ 0, δ 1 ), o the acceptace regio A θ,, it holds that d G ( P, P θ ) > ɛ, for ay θ Ω(θ 0, δ). Therefore, for each θ Ω(θ 0, δ 1 ) ad each θ Ω(θ 0, δ), we have P θ,a θ, P θ, d G ( P, P θ ) > ɛ Ce ξ. Lemma 5. For a space of probability measures deoted by P(Ψ, λ), o a separable metric space X such that the mixig coefficiet φ(m) Ψm λ with Ψ > 0 ad λ beig uiversal costats, there exists a metric d G (P, Q) for probability measure P ad Q i P(Ψ, λ) that satisfies the property (50) ad such that covergece i d G implies the weak covergece of the measures. Thus, i particular, for probability measures i a parametric family d G (P θ, P θ ) 0 P θ P θ. (54) Proof. We exted the proof for the existece result i Propositio 6. i Clarke ad Barro (1990) to allow for weak depedece. Let F i : i = 1,, be the coutable field of sets geerated by balls of the form x : d X (x, s j1 ) 1/j for j 1, j = 1,,, where d X deotes the metric for the space X ad s 1, s, is a coutable dese sequece of poits i X. Defie a metric o the space of probability measures as follows d G (P, Q) = i P F i QF i. i=1 Accordig to Gray (1988, Page 51 53), if d G ( P, P) 0, the P coverges weakly to P. Now, for ay ɛ > 0, we choose k 1 l (ɛ) / l(). Thus, d G ( P, P) k i P F i PF i + i max P F i PF i + ɛ/. i=1 i=k+1 1 i k 0
The, we have P d G ( P, P) > ɛ P max 1 i k P F i PF i > ɛ/ k i=1 P P F i PF i > ɛ/ The Hoeffdig-type iequality for uiform mixig process (see, e.g. Roussas, 1996, Theorem 4.1) guaratees that there exists C 1 > 0 ad ξ > 0 such that sup P P F i PF i > ɛ/ C 1 e ξ. P P(Ψ,λ) Thus, with C = kc 1. P d G ( P, P) > ɛ Ce ξ, (55) Lemma 6 exteds the large deviatio result of Schwartz (1965, Lemma 6.1) to allow time depedece i the data process. Let z = (z 1,, z ) be strictly statioary ad uiform mixig with φ(m) = O ( m λ) for some positive λ. The process z have the joit desity P θ,. Deote the mixture distributio of the parametric family P θ, with respect to the coditioal prior distributio π P ( N ) to be P N, with desity π P(z N ). More precisely, we defie π P (z N ) π P (z θ)π P (θ N )dθ. (56) N Lemma 6. Assume that the mixig coefficiet power λ >. Suppose there exist strogly uiformly expoetially cosistet (SUEC) tests of hypothesis θ N 0 agaist the alterative θ N such that N 0 N with d L (N 0, N ) δ for some δ > 0. The, there exists ξ > 0 ad a positive iteger k such that, for all θ N 0, P θ, P N, (1 e ξm ), where m + k. T V Proof. We assume that there exists a sequece of SUEC tests, deoted by A, for ay sample with size. The, there exists a positive iteger k such that for all k P θ, A < 1 8 for all θ N 0 ad (57) For each j = 1,,, we defie P θ,a > 1 1 8 for all θ N. (58) A k,j = A k (z j+1,, z j+k ), (59) 1
the, accordig to Lemma 1, Y m = 1 m A k,j (60) m j=1 is a average of strictly statioary ad uiform mixig such that φ(m) φ m λ. The expectatio of Y m, uder distributio P θ,, is µ(θ) with µ(θ) = < 1 8 if θ N 0 > 7 8 if θ N. By Corollary ad the uiform mixig coditios with λ >, together with the fact that A k,j 0, 1], we kow that the assumptios of Theorem.4 i White ad Domowitz (1984) holds ad hece the CLT for the time series holds, i.e. (61) m 1/ Y m d N(µ(θ), V (θ)), (6) where V (θ) = lim m + P θ, m 1/ m j=1 (A k,j µ(θ))]. From Lemma 3, we kow that V (θ) V <. Therefore, the momet geeratig fuctios coverge m 1 l P θ, e tmym 1 t V (θ). (63) O the oe had, whe θ N, we have µ(θ) > 1 4, accordig to Theorem 8.1.1 of Taiguchi ad Kakizawa (000), we ca achieve the followig large deviatio result Therefore, there exists ξ 1 > 0 such that lim m + m 1 l P θ, Y m 1 = 1 4 3V (θ) 1 3V. (64) P θ, Y m 1 e ξ1m for all θ N. (65) 4 Thus, P N, Y m 1 = P θ, Y m 1 π P (θ N )dθ e ξ1m, for m + k. (66) 4 N 4 O the other had, whe θ N 0, we have µ(θ) < 1 4, accordig to Theorem 8.1.1 of Taiguchi ad Kakizawa (000), we obtai the large deviatio result Therefore, there exists ξ > 0 such that lim m + m 1 l P θ, Y m 1 = 1 4 3V (θ) 1 3V. (67) P θ, Y m 1 e ξm. (68) 4
For ξ = mi(ξ 1, ξ ), it follows that P θ, P N, = sup P θ, A P N, A (1 e ξm ) (69) T V A F for m + k by cosiderig A = Y m 1 4. We itroduce Le Cam s theory o hypothesis testig (see, e.g. Le Cam ad Yag, 000, Chapter 8). Lemma 7. Uder the regularity coditios i Subsectio 1.4, there are test fuctios A ad positive coefficiets C, ξ, ɛ ad K such that P 0, (1 A ) 0 ad P θ, A Ce ξ θ θ 0 / for all θ such that K/ θ θ 0 ɛ. Proof. For all z R K(Dx+Dy), defie the rectagular F z (, z 1 ] (, z ] (, z K(Dx+Dy)]. The empirical process is defied as Π (z) P F z = 1 1 z t F z. We defie Π θ (z) P θ F z. Accordig to Le Cam ad Yag (000, Page 50), there exists positive costats c ad ɛ such that sup x Π θ0 (x) Π θ (x) > c θ θ 0 for θ θ 0 ɛ. Deote the expectatio to be µ (θ) E Pθ sup x Π θ0 (x) Π θ (x). By the classical result of weak covergece for the Kolmogorov- Smirov statistic sup x Π θ0 (x) Π θ (x), we kow that there exists a large costat M such that µ (θ) M c for all θ θ 0 ɛ. We choose K 4M c. Cosider the test fuctios A = sup z Π θ0 (z) Π θ (z) < K/. Usig triagular iequality, we obtai P θ, A P θ, c θ θ 0 sup z Π (z) Π θ (z) µ (θ) Usig Dvoretzky-Kiefer-Wolfowitz type iequality for uiform mixig variables i Samso (000, Theorem 3), we ca show there exists ξ > 0 such that P θ, c θ θ 0 sup z for all K/ θ θ 0 ɛ. Thus, P θ, A e ξ θ θ 0 / Π (z) Π θ (z) µ (θ) e ξ θ θ 0 / same iequality, it is straightforward to get P 0, (1 A ) 0. for all K/ θ θ 0 ɛ. By the 1.6 Basic Properties of Limited-Iformatio Likelihoods The MLE for Limited-Iformatio Likelihood Q γ, is defied as L (γ) 1 The empirical log-likelihood fuctio l π Q (x t, y t ; γ), γ. (70) The maximum likelihood estimator ˆθ Q is defied as follows: ] ˆγ ML Q argmax L (γ) = argmax η Q (γ) T 1 g Q (γ; x t, y t ) A Q (γ). (71) γ γ 3
We shall show that the MLE ˆγ ML Q is cosistet ad asymptotic ormal with asymptotic efficiet variace-covariace matrix. This is importat to justify the use of our limited-iformatio likelihood Q γ of (16) i the Bayesia paradigm as a valid likelihood. The first-order coditio is 0 = L (ˆγ ML Q ) which is ] T 0 = η Q (ˆγ ML) Q 1 ] g Q (ˆγ ML; Q x t, y t ) + 1 ] T g Q (ˆγ ML; Q x t, y t ) η Q (ˆγ ML) Q A Q (ˆγ ML). Q The defiitio of ˆγ ML Q i (71) ad the properties of expected log-likelihood L(γ) plays a vital role i establishig the cosistecy of ˆγ ML Q. The first-order coditio (7) is crucial for obtaiig the asymptotic ormality of ˆγ ML Q. Now, let s cosider the expected log-likelihood of Q γ : (7) L(γ) η Q (γ) T g Q (γ) A Q (γ). (73) where g Q (γ) E g Q (γ; x, y)]. It holds that L(γ 0 ) = 0 sice η Q (γ 0 ) = 0 ad A Q (γ 0 ) = 0. Further, it holds that L(γ 0 ) = 0 sice L(γ 0 ) = η Q (γ 0 )] T g Q (γ 0 ) + g Q (γ 0 )] T η Q (γ 0 ) A Q (γ 0 ), ad g Q (γ 0 ) = 0 ad A Q (γ 0 ) = 0. The Jacobia matrix of L(γ) evaluated at γ 0 is equal to the egative Fisher iformatio matrix I Q (γ 0 ): L(γ 0 ) = A Q (γ 0 ) = η Q (γ 0 ) T g Q (γ 0 ) = G T Q S 1 Q G Q = I Q (γ 0 ). (74) Here the first equality is simply implied by the formula of L(γ) from (73). The secod equality of (74) is due to ] A Q (γ)e AQ(γ) = E η Q (γ) T g Q (γ; x, y)e η Q(γ) T g Q (γ;x,y) γ, (75) which is implied by the defiitio of A Q (γ) ad the coditio (17). The third equality of (74) is due to the relatio g Q (γ 0 ) = G Q ad the followig relatio implied by (17): η Q (γ 0 ) = S 1 Q G Q. (76) Ad, the fourth equality of (74) is simply due to the defiitio of I Q (γ 0 ). Aother importat property of expected log-likelihood of Q γ is the idetificatio of γ 0 i the followig sese: L(γ) L(γ 0 ) = 0, γ, (77) where the iequality is due to the Jese s iequality applied to A Q (γ). The Jese s iequality holds with equality if ad oly if η Q (γ) T g Q (γ; x, y) is costat almost surely uder Q which is true oly whe γ = γ 0. 4
Cosistecy We first prove the cosistecy of the estimator ˆγ ML Q. To our kowledge, the theoretical result for the MLE of limited-iformatio likelihood Q γ based o the priciple of miimum Kullback-Leibler divergece is ew, though we appeal to the stadard approach of Wald (1949) ad Wolfowitz (1949). Propositio 1. Uder Assumptios A1 - A5 ad A8 - A9, ˆγ Q ML coverges to γ 0 i probability. Proof. Our goal is to show that for ay ɛ > 0 there exists N such that Q ˆγ ML Q Ω(γ 0, ɛ) < ɛ for N, (78) where Ω(γ 0, ɛ) γ : γ γ 0 > ɛ is the ope ball cetered at γ 0 with radius ɛ. For all ɛ > 0, it holds that (due to the defiitio of ˆγ ML Q i (71)) Moreover, for all ɛ > 0 ad h > 0, it holds that 1ˆγ ML Q Ω(γ 0, ɛ) 1 sup L (γ) L (γ 0 ). (79) γ Ω(γ 0,ɛ) 1 sup L (γ) L (γ 0 ) = 1 L (γ 0 ) < h1 sup L (γ) L (γ 0 ) γ Ω(γ 0,ɛ) γ Ω(γ 0,ɛ) + 1 L (γ 0 ) h1 sup γ Ω(γ 0,ɛ) L (γ) L (γ 0 ) 1 L (γ 0 ) < h + 1 sup L (γ) h. (80) γ Ω(γ 0,ɛ) Combiig (78) ad (80), it suffices to show that for all ɛ > 0 there exists h > 0 ad N such that for N max Q L (γ 0 ) < h, Q sup L γ Ω(γ0,ɛ) (γ) h < ɛ/. (81) The first probabilistic boud i (81) is straightforward due to the LLN result wlim L (γ 0 ) = L(γ 0 ) = 0. (8) Now, we cosider the secod probabilistic boud i (81). From the domiace coditio (Assumptio A3) ad the remark of idetificatio coditio (Assumptio A5), we kow that lim sup δ 0 γ Ω(γ,δ) Thus, we have the followig Domiace Covergece result: l π Q (x, y; γ ) = l π Q (x, y; γ) a.s. Q for all γ. (83) lim E δ 0 sup l π Q (x, y; γ ) γ Ω(γ,δ) ] = L(γ). (84) 5