SUPPLEMENT TO "ROBUSTNESS, INFINITESIMAL NEIGHBORHOODS, AND MOMENT RESTRICTIONS" (Econometrica, Vol. 81, No. 3, May 2013, 1185–1201)

Econometrica Supplementary Material

BY YUICHI KITAMURA, TAISUKE OTSU, AND KIRILL EVDOKIMOV

© 2013 The Econometric Society. DOI: 10.3982/ECTA8617

A.1. PROOFS OF MAIN RESULTS

THIS SUPPLEMENT PRESENTS the proofs of some of the results presented in the previous sections.

A.1.1. Proof of Lemma 2.1

We first show the claim for $\alpha < \frac{1}{2}$, that is,

$$ (1-\alpha)\, I_\alpha(P,Q) \ge \tfrac{1}{2}\, I_{1/2}(P,Q). \tag{A.1} $$

Let $H_\alpha(x) = \frac{1}{\alpha}(1 - x^{\alpha}) - 2(1 - x^{1/2})$, $x \ge 0$; then the above inequality becomes

$$ \int H_\alpha\!\left(\frac{p}{q}\right) q \, d\nu \ge 0. \tag{A.2} $$

Note that

$$ \frac{d}{dx} H_\alpha(x) = -x^{\alpha-1} + x^{-1/2} \;\begin{cases} > 0 & \text{if } x > 1, \\ = 0 & \text{if } x = 1, \\ < 0 & \text{if } x < 1. \end{cases} $$

The above holds for the case with $\alpha = 0$ as well, since $H_0(x) = -\log x - 2(1 - x^{1/2})$. Moreover, $H_\alpha(1) = 0$. Therefore $H_\alpha(x) \ge 0$ for all $x \ge 0$, and the desired inequality (A.2) follows immediately.

Next, we prove the case with $\alpha > \frac{1}{2}$, that is,

$$ \alpha\, I_\alpha(P,Q) \ge \tfrac{1}{2}\, I_{1/2}(P,Q). $$

Let $\beta = 1 - \alpha < \frac{1}{2}$; then the above inequality becomes

$$ (1-\beta)\, I_{1-\beta}(P,Q) \ge \tfrac{1}{2}\, I_{1/2}(P,Q). \tag{A.3} $$

By (A.1) and the symmetry of the Hellinger distance,

$$ (1-\beta)\, I_\beta(Q,P) \ge \tfrac{1}{2}\, I_{1/2}(Q,P) = \tfrac{1}{2}\, I_{1/2}(P,Q). $$

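But the equality $I_{1-\beta}(P,Q) = I_\beta(Q,P)$ holds for every $\beta \in \mathbb{R}$, and (A.3) follows.

NOTATION: Let $C$ be a generic positive constant, $\|\cdot\|$ be the $L_2$-metric, and

$$ \theta_n = \theta_0 + t/\sqrt{n}, \qquad \bar T_{Q_n} = \bar T(Q_n), \qquad \bar T_{P_n} = \bar T(P_n), $$
$$ \bar P_{\theta,Q} = \arg\min_{P \in \bar{\mathcal P}_\theta} H(P,Q), \qquad R_n(Q,\theta,\gamma) = -\int \frac{1}{1 + \gamma' g_n(x,\theta)}\, dQ, $$
$$ g_n(x,\theta) = g(x,\theta)\, I\{x \in \mathcal X_n\}, \qquad \Lambda_n = G'\Omega^{-1} g_n(x,\theta_0), \qquad \Lambda = G'\Omega^{-1} g(x,\theta_0), $$
$$ \psi_{n,Q_n} = -2 \left( \int \Lambda_n \Lambda_n' \, dQ_n \right)^{-1} \int \Lambda_n \left\{ dQ_n^{1/2} - d\bar P_{\theta_0,Q_n}^{1/2} \right\} dQ_n^{1/2}. $$

A.1.2. Proof of Theorem 3.1

A.1.2.1. Proof of (i)

Pick arbitrary $r > 0$ and $t \in \mathbb{R}^p$. Consider the following parametric submodel having the likelihood ratio

$$ \frac{dP_{\theta_n,\zeta_n}}{dP_0} = \frac{1 + \zeta_n' g_n(x,\theta_n)}{\int (1 + \zeta_n' g_n(x,\theta_n))\, dP_0} = f(x,\theta_n,\zeta_n), \tag{A.4} $$

where $\zeta_n = -E_{P_0}[g(x,\theta_n) g_n(x,\theta_n)']^{-1} E_{P_0}[g(x,\theta_n)]$. Note that $P_{\theta_0,0} = P_0$, $P_{\theta_n,\zeta_n} \in \mathcal P_{\theta_n}$ (by the definition of $\zeta_n$), and $\zeta_n = O(n^{-1/2})$ (by the proof of Lemma A.4(i)). Also, since $\sup_{x \in \mathcal X_n} |\zeta_n' g_n(x,\theta_n)| = O(n^{-1/2} m_n) = o(1)$, the likelihood ratio $dP_{\theta_n,\zeta_n}/dP_0$ is well defined for all $n$ large enough. So, for this submodel the mapping $T_a$ must satisfy (3.1). We now evaluate the Hellinger distance between $P_{\theta_n,\zeta_n}$ and $P_0$. An expansion around $\zeta_n = 0$ yields

$$ H(P_{\theta_n,\zeta_n}, P_0) = \left\| \frac{\partial f(x,\theta_n,\zeta)^{1/2}}{\partial \zeta'}\bigg|_{\zeta=0} \zeta_n \, dP_0^{1/2} + \frac{1}{2}\, \zeta_n' \frac{\partial^2 f(x,\theta_n,\zeta)^{1/2}}{\partial \zeta\, \partial \zeta'}\bigg|_{\zeta=\dot\zeta} \zeta_n \, dP_0^{1/2} \right\|, $$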

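where $\dot\zeta$ is a point on the line joining $\zeta_n$ and $0$, and

$$ \frac{\partial f(x,\theta_n,\zeta)^{1/2}}{\partial \zeta}\bigg|_{\zeta=0} = \frac{1}{2}\left\{ g_n(x,\theta_n) - E_{P_0}[g_n(x,\theta_n)] \right\}, $$

$$ \begin{aligned} \frac{\partial^2 f(x,\theta_n,\zeta)^{1/2}}{\partial \zeta\, \partial \zeta'} ={}& -\frac{1}{4} \left(1 + \zeta' g_n(x,\theta_n)\right)^{-3/2} \left(1 + \zeta' E_{P_0}[g_n(x,\theta_n)]\right)^{-1/2} g_n(x,\theta_n) g_n(x,\theta_n)' \\ & - \frac{1}{2} \left(1 + \zeta' g_n(x,\theta_n)\right)^{-1/2} \left(1 + \zeta' E_{P_0}[g_n(x,\theta_n)]\right)^{-3/2} g_n(x,\theta_n) E_{P_0}[g_n(x,\theta_n)]' \\ & + \frac{3}{4} \left(1 + \zeta' g_n(x,\theta_n)\right)^{1/2} \left(1 + \zeta' E_{P_0}[g_n(x,\theta_n)]\right)^{-5/2} E_{P_0}[g_n(x,\theta_n)] E_{P_0}[g_n(x,\theta_n)]'. \end{aligned} $$

Thus, a lengthy but straightforward calculation combined with Lemma A.4, $\zeta_n = O(n^{-1/2})$, and $\sup_{x \in \mathcal X_n} |\zeta_n' g_n(x,\theta_n)| = o(1)$ implies

$$ n H(P_{\theta_n,\zeta_n}, P_0)^2 = n \int \left( \frac{1}{2}\, \zeta_n' \left\{ g_n(x,\theta_n) - E_{P_0}[g_n(x,\theta_n)] \right\} \right)^2 dP_0 + o(1) \to \frac{1}{4}\, t' \Sigma t. \tag{A.5} $$

Based on this limit, a lower bound of the maximum bias of $T_a$ is obtained as (see Rieder (1994, eq. (56), p. 18))

$$ \begin{aligned} \liminf_{n\to\infty} \sup_{Q \in B_H(P_0, r/\sqrt{n})} n \left( \tau \circ T_a(Q) - \tau(\theta_0) \right)^2 &\ge \liminf_{n\to\infty} \sup_{\{t \in \mathbb{R}^p :\, P_{\theta_n,\zeta_n} \in B_H(P_0, r/\sqrt{n})\}} n \left( \tau \circ T_a(P_{\theta_n,\zeta_n}) - \tau(\theta_0) \right)^2 \\ &\ge \max_{\{t \in \mathbb{R}^p :\, (1/4) t'\Sigma t \le r^2 - \varepsilon\}} \left( \left( \frac{\partial \tau(\theta_0)}{\partial \theta} \right)' t \right)^2 \\ &= 4 (r^2 - \varepsilon) \left( \frac{\partial \tau(\theta_0)}{\partial \theta} \right)' \Sigma^{-1} \left( \frac{\partial \tau(\theta_0)}{\partial \theta} \right) \end{aligned} $$

for each $\varepsilon \in (0, r^2)$, where the first inequality follows from the set inclusion relationship, the second inequality follows from (3.1) and (A.5), and the equality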

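follows from the Kuhn-Tucker theorem. Since $\varepsilon$ can be arbitrarily small, we obtain the conclusion.

A.1.2.2. Proof of (ii)

Pick arbitrary $r > 0$ and sequence $Q_n \in \bar B_H(P_0, r/\sqrt{n})$. We first show the Fisher consistency of $\bar T$. From Lemma A.2 (note: $P_{\theta_n,\zeta_n} \in \bar B_H(P_0, r/\sqrt{n})$ for all $n$ large enough),

$$ \sqrt{n}\left( \bar T(P_{\theta_n,\zeta_n}) - \theta_0 \right) = -\sqrt{n}\, \Sigma^{-1} \int \Lambda_n \, dP_{\theta_n,\zeta_n} + o(1) = \Sigma^{-1} G' \Omega^{-1} \int \frac{\partial g(x,\dot\theta)}{\partial \theta'} \, dP_{\theta_n,\zeta_n}\, t + o(1) \to t $$

for all $n$ large enough, where $\dot\theta$ is a point on the line joining $\theta_0$ and $\theta_n$, the second equality follows from $\int g(x,\theta_0) I\{x \notin \mathcal X_n\}\, dP_{\theta_n,\zeta_n} = o(n^{-1/2})$ (by an argument similar to (A.16)), $\int g(x,\theta_n)\, dP_{\theta_n,\zeta_n} = 0$ (by $P_{\theta_n,\zeta_n} \in \mathcal P_{\theta_n}$), and an expansion around $\theta_n = \theta_0$, and the convergence follows from the last statement of Lemma A.4(i). Therefore, $\bar T$ is Fisher consistent.

We next show (3.1). An expansion of $\tau \circ \bar T_{Q_n}$ around $\bar T_{Q_n} = \theta_0$, Lemmas A.1(ii) and A.2, and Assumption 3.1(viii) imply

$$ \begin{aligned} \sqrt{n}\left( \tau \circ \bar T_{Q_n} - \tau(\theta_0) \right) &= -\sqrt{n} \left( \frac{\partial \tau(\theta_0)}{\partial \theta} \right)' \Sigma^{-1} \int \Lambda_n \, dQ_n + o(1) \\ &= -\sqrt{n}\, \nu' \int \Lambda_n \left\{ dQ_n^{1/2} - dP_0^{1/2} \right\} dQ_n^{1/2} - \sqrt{n}\, \nu' \int \Lambda_n \left\{ dQ_n^{1/2} - dP_0^{1/2} \right\} dP_0^{1/2} + o(1), \end{aligned} $$

where we denote $\nu' = \left( \frac{\partial \tau(\theta_0)}{\partial \theta} \right)' \Sigma^{-1}$. From the triangle inequality,

$$ \begin{aligned} n \left( \tau \circ \bar T_{Q_n} - \tau(\theta_0) \right)^2 \le{}& n \Bigg\{ \left| \nu' \int \Lambda_n \left\{ dQ_n^{1/2} - dP_0^{1/2} \right\} dQ_n^{1/2} \right|^2 + \left| \nu' \int \Lambda_n \left\{ dQ_n^{1/2} - dP_0^{1/2} \right\} dP_0^{1/2} \right|^2 \\ & + 2 \left| \nu' \int \Lambda_n \left\{ dQ_n^{1/2} - dP_0^{1/2} \right\} dQ_n^{1/2} \right| \left| \nu' \int \Lambda_n \left\{ dQ_n^{1/2} - dP_0^{1/2} \right\} dP_0^{1/2} \right| \Bigg\} + o(1) \\ ={}& n \{ A_1 + A_2 + 2 A_3 \} + o(1). \end{aligned} $$

For $A_1$, observe that

$$ A_1 \le \nu' \int \Lambda_n \Lambda_n' \, dQ_n \, \nu \int \left\{ dQ_n^{1/2} - dP_0^{1/2} \right\}^2 \le B^* \frac{r^2}{n} + o(n^{-1}), $$

where the first inequality follows from the Cauchy-Schwarz inequality, and the second inequality follows from Lemma A.5(i) and $Q_n \in \bar B_H(P_0, r/\sqrt{n})$. Similarly, we have $A_2 \le B^* \frac{r^2}{n} + o(n^{-1})$ and $A_3 \le B^* \frac{r^2}{n} + o(n^{-1})$. Combining these terms,

$$ \limsup_{n\to\infty} n \left( \tau \circ \bar T_{Q_n} - \tau(\theta_0) \right)^2 \le 4 r^2 B^* \tag{A.6} $$

for any sequence $Q_n \in \bar B_H(P_0, r/\sqrt{n})$ and $r > 0$. Pick any $r > 0$. Since the supremum $\sup_{Q \in \bar B_H(P_0, r/\sqrt{n})} n (\tau \circ \bar T(Q) - \tau(\theta_0))^2$ is finite for all $n$ large enough (from Lemma A.1(i)), there exists a sequence $Q_n \in \bar B_H(P_0, r/\sqrt{n})$ such that

$$ \limsup_{n\to\infty} n \left( \tau \circ \bar T_{Q_n} - \tau(\theta_0) \right)^2 = \limsup_{n\to\infty} \sup_{Q \in \bar B_H(P_0, r/\sqrt{n})} n \left( \tau \circ \bar T(Q) - \tau(\theta_0) \right)^2. $$

Therefore, the conclusion follows by (A.6).

A.1.3. Proof of Theorem 3.2

A.1.3.1. Proof of (i)

Pick arbitrary $\varepsilon \in (0, r^2)$ and $r > 0$. Consider the parametric submodel $P_{\theta_n,\zeta_n}$ defined in (A.4). The convolution theorem (Theorem 25.20 of van der Vaart (1998)) implies that, for each $t \in \mathbb{R}^p$, there exists a probability measure $M$ that does not depend on $t$ and satisfies

$$ \sqrt{n}\left( \tau \circ T_a(P_n) - \tau \circ T_a(P_{\theta_n,\zeta_n}) \right) \xrightarrow{d} M * N(0, B^*) \quad \text{under } P_{\theta_n,\zeta_n}. \tag{A.7} $$

Let

$$ t^* = \arg\max_{\{t \in \mathbb{R}^p :\, (1/4) t'\Sigma t \le r^2 - \varepsilon\}} \left( \left( \frac{\partial \tau(\theta_0)}{\partial \theta} \right)' t \right)^2 \quad \text{s.t.} \quad \left( \frac{\partial \tau(\theta_0)}{\partial \theta} \right)' t \int \xi \, dM * N(0, B^*) \ge 0. $$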

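Since the integral $\int \xi \, dM * N(0, B^*)$ does not depend on $t$, such $t^*$ always exists. From $\frac{1}{4} t^{*\prime} \Sigma t^* \le r^2 - \varepsilon$ and (A.5), it holds that $P_{\theta_0 + t^*/\sqrt{n},\, \zeta_n} \in B_H(P_0, r/\sqrt{n})$ for all $n$ large enough. Also, note that $E_{P_{\theta_n,\zeta_n}}[\sup_{\theta \in \Theta} |g(x,\theta)|^\eta] < \infty$ for all $n$ large enough (by $\sup_{x \in \mathcal X_n} |\zeta_n' g_n(x,\theta_n)| = o(1)$ and Assumption 3.1(v)). Thus, $P_{\theta_0 + t^*/\sqrt{n},\, \zeta_n} \in \bar B_H(P_0, r/\sqrt{n})$ for all $n$ large enough, and we have

$$ \begin{aligned} & \lim_{b\to\infty} \liminf_{n\to\infty} \sup_{Q \in \bar B_H(P_0, r/\sqrt{n})} \int b \wedge n \left( \tau \circ T_a(P_n) - \tau(\theta_0) \right)^2 dQ^{\otimes n} \\ &\quad \ge \lim_{b\to\infty} \liminf_{n\to\infty} \int b \wedge n \left( \tau \circ T_a(P_n) - \tau(\theta_0) \right)^2 dP^{\otimes n}_{\theta_0 + t^*/\sqrt{n},\, \zeta_n} \\ &\quad = \lim_{b\to\infty} \int b \wedge \left( \xi + \left( \frac{\partial \tau(\theta_0)}{\partial \theta} \right)' t^* \right)^2 dM * N(0, B^*) \\ &\quad = \int \xi^2 \, dM * N(0, B^*) + \left( \left( \frac{\partial \tau(\theta_0)}{\partial \theta} \right)' t^* \right)^2 + 2 \left( \frac{\partial \tau(\theta_0)}{\partial \theta} \right)' t^* \int \xi \, dM * N(0, B^*) \\ &\quad \ge \left\{ 1 + 4 (r^2 - \varepsilon) \right\} B^*, \end{aligned} $$

where the first equality follows from the Fisher consistency of $T_a$, (A.7), and the continuous mapping theorem, the second equality follows from the monotone convergence theorem, and the second inequality follows from the definition of $t^*$. Since $\varepsilon$ can be arbitrarily small, we obtain the conclusion.

A.1.3.2. Proof of (ii)

Pick arbitrary $r > 0$ and $b > 0$. Applying the inequality $b \wedge (c_1 + c_2) \le b \wedge c_1 + b \wedge c_2$ for any $c_1, c_2 \ge 0$,

$$ \begin{aligned} & \limsup_{n\to\infty} \sup_{Q \in \bar B_H(P_0, r/\sqrt{n})} \int b \wedge n \left( \tau \circ T(P_n) - \tau(\theta_0) \right)^2 dQ^{\otimes n} \\ &\quad \le \limsup_{n\to\infty} \sup_{Q \in \bar B_H(P_0, r/\sqrt{n})} \int b \wedge n \left( \tau \circ T(P_n) - \tau \circ \bar T(P_n) \right)^2 dQ^{\otimes n} \\ &\qquad + 2 \limsup_{n\to\infty} \sup_{Q \in \bar B_H(P_0, r/\sqrt{n})} \int b \wedge n \left\{ \left| \tau \circ T(P_n) - \tau \circ \bar T(P_n) \right| \left| \tau \circ \bar T(P_n) - \tau(\theta_0) \right| \right\} dQ^{\otimes n} \\ &\qquad + \limsup_{n\to\infty} \sup_{Q \in \bar B_H(P_0, r/\sqrt{n})} \int b \wedge n \left( \tau \circ \bar T(P_n) - \tau(\theta_0) \right)^2 dQ^{\otimes n} \\ &\quad = A_1 + 2 A_2 + A_3. \end{aligned} \tag{A.8} $$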

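For $A_1$,

$$ \begin{aligned} A_1 &\le b \times \limsup_{n\to\infty} \sup_{Q \in \bar B_H(P_0, r/\sqrt{n})} \int_{(x_1,\ldots,x_n) \notin \mathcal X_n^n} dQ^{\otimes n} \\ &\le b \times \limsup_{n\to\infty} \sup_{Q \in \bar B_H(P_0, r/\sqrt{n})} \sum_{i=1}^n \int_{x_i \notin \mathcal X_n} dQ \\ &\le b \times \limsup_{n\to\infty} \sup_{Q \in \bar B_H(P_0, r/\sqrt{n})} n\, m_n^{-\eta} E_Q\!\left[ \sup_{\theta \in \Theta} |g(x,\theta)|^\eta \right] = 0, \end{aligned} \tag{A.9} $$

where the first inequality follows from $T(P_n) = \bar T(P_n)$ for all $(x_1,\ldots,x_n) \in \mathcal X_n^n$, the second inequality follows from a set inclusion relation, the third inequality follows from the Markov inequality, and the equality follows from Assumption 3.1(vii) and $E_Q[\sup_{\theta \in \Theta} |g(x,\theta)|^\eta] < \infty$ for all $Q \in \bar B_H(P_0, r/\sqrt{n})$. Similarly, we have $A_2 = 0$.

We now consider $A_3$. Note that the mapping $f_{b,n}(Q) = \int b \wedge n (\tau \circ \bar T(P_n) - \tau(\theta_0))^2 \, dQ^{\otimes n}$ is continuous in $Q \in B_H(P_0, r/\sqrt{n})$ under the Hellinger distance for each $n$, and the set $B_H(P_0, r/\sqrt{n})$ (not $\bar B_H(P_0, r/\sqrt{n})$) is compact under the Hellinger distance for each $n$. Thus, there exists $Q_{b,n} \in B_H(P_0, r/\sqrt{n})$ such that $\sup_{Q \in B_H(P_0, r/\sqrt{n})} f_{b,n}(Q) = f_{b,n}(Q_{b,n})$ for each $n$. Then we have

$$ \begin{aligned} A_3 &\le \limsup_{n\to\infty} \sup_{Q \in B_H(P_0, r/\sqrt{n})} \int b \wedge n \left( \tau \circ \bar T(P_n) - \tau(\theta_0) \right)^2 dQ^{\otimes n} \\ &= \limsup_{n\to\infty} \int b \wedge n \left( \tau \circ \bar T(P_n) - \tau(\theta_0) \right)^2 dQ_{b,n}^{\otimes n} \\ &= \int b \wedge (\xi + t_b)^2 \, dN(0, B^*) \\ &\le B^* + t_b^2 \le (1 + 4 r^2) B^*, \end{aligned} $$

where $t_b = \lim_{n\to\infty} \sqrt{n}(\tau \circ \bar T(Q_{b,n}) - \tau(\theta_0))$, the first inequality follows from $\bar B_H(P_0, r/\sqrt{n}) \subset B_H(P_0, r/\sqrt{n})$, the second equality follows from Lemma A.8 (with $Q_n = Q_{b,n}$) and the continuous mapping theorem, the second inequality follows from $b \wedge c \le c$ and a direct calculation, and the last inequality follows from Theorem 3.1(ii). Combining these results, the conclusion is obtained.

A.1.4. Proof of Theorem 3.3

A.1.4.1. Proof of (i)

Consider the parametric submodel $P_{\theta_n,\zeta_n}$ defined in (A.4). Since $\ell$ is uniformly continuous on $\mathbb{R}^p$ (by Assumption 3.2) and $T_a$ is Fisher consistent,

$$ b \wedge \ell\!\left( \sqrt{n}\left\{ S_n - \tau \circ T_a(P_{\theta_n,\zeta_n}) \right\} \right) - b \wedge \ell\!\left( \sqrt{n}\left\{ S_n - \tau(\theta_0) \right\} - \left( \frac{\partial \tau(\theta_0)}{\partial \theta} \right)' t \right) \to 0 $$

uniformly in $t$, $|t| < c$ and $\{S_n\}_{n \in \mathbb{N}}$ for each $c > 0$ and $b > 0$. Thus,

$$ \inf_{S_n \in \mathcal S} \sup_{|t| \le c} \int b \wedge \ell\!\left( \sqrt{n}\left\{ S_n - \tau \circ T_a(P_{\theta_n,\zeta_n}) \right\} \right) dP^{\otimes n}_{\theta_n,\zeta_n} = \inf_{R_n \in \mathcal R} \sup_{|t| \le c} \int b \wedge \ell\!\left( R_n - \left( \frac{\partial \tau(\theta_0)}{\partial \theta} \right)' t \right) dP^{\otimes n}_{\theta_n,\zeta_n} + o(1) \tag{A.10} $$

for each $c > 0$, where $R_n = \sqrt{n}\{S_n - \tau(\theta_0)\}$ is a standardized estimator and $\mathcal R = \{ \sqrt{n}\{S_n - \tau(\theta_0)\} : S_n \in \mathcal S \}$. By expanding the log likelihood ratio $\log \frac{dP^{\otimes n}_{\theta_n,\zeta_n}}{dP_0^{\otimes n}}$ around $\zeta_n = 0$,

$$ \begin{aligned} \log \frac{dP^{\otimes n}_{\theta_n,\zeta_n}}{dP_0^{\otimes n}} ={}& \zeta_n' \sum_{i=1}^n \left\{ g_n(x_i,\theta_n) - E_{P_0}[g_n(x,\theta_n)] \right\} - \sum_{i=1}^n \frac{\zeta_n' g_n(x_i,\theta_n) g_n(x_i,\theta_n)' \zeta_n}{2 \left( 1 + \dot\zeta' g_n(x_i,\theta_n) \right)^2} \\ & + n\, \frac{\zeta_n' E_{P_0}[g_n(x,\theta_n)] E_{P_0}[g_n(x,\theta_n)]' \zeta_n}{2 \left( 1 + \ddot\zeta' E_{P_0}[g_n(x,\theta_n)] \right)^2} \\ ={}& L_1 - L_2 + L_3, \end{aligned} $$

where $\dot\zeta$ and $\ddot\zeta$ are points on the line joining $\zeta_n$ and $0$. For $L_1$, an expansion of $g_n(x,\theta_n)$ (in $\zeta_n$) around $\theta_n = \theta_0$ combined with Lemma A.4(i) implies that, under $P_0$,

$$ L_1 = -t' G' \Omega^{-1} \frac{1}{\sqrt{n}} \sum_{i=1}^n \left\{ g_n(x_i,\theta_0) - E_{P_0}[g_n(x,\theta_0)] \right\} + o_p(1). $$

Also, Lemma A.4(i) and $\sup_{x \in \mathcal X_n} |\zeta_n' g_n(x,\theta_n)| = o(1)$ imply that, under $P_0$,

$$ L_2 \xrightarrow{p} \frac{1}{2}\, t' \Sigma t, \qquad L_3 \to 0. $$

Therefore, in the terminology of Rieder (1994, Definition 2.2.9), the parametric model $P_{\theta_n,\zeta_n}$ is asymptotically normal with the asymptotic sufficient statistic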

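$-G' \Omega^{-1} \frac{1}{\sqrt{n}} \sum_{i=1}^n \{ g_n(x_i,\theta_0) - E_{P_0}[g_n(x,\theta_0)] \}$ and the asymptotic covariance matrix $\Sigma$. Note that this is essentially the LAN (local asymptotic normality) condition introduced by LeCam. If $P_{\theta_n,\zeta_n}$ is asymptotically normal in this sense, we can directly apply the result of the minimax risk bound by Rieder (1994, Theorem 3.3.8(a)), that is,

$$ \lim_{b\to\infty} \lim_{c\to\infty} \liminf_{n\to\infty} \inf_{S_n \in \mathcal S} \sup_{|t| \le c} \int b \wedge \ell\!\left( R_n - \left( \frac{\partial \tau(\theta_0)}{\partial \theta} \right)' t \right) dP^{\otimes n}_{\theta_n,\zeta_n} \ge \int \ell \, dN(0, B^*) \tag{A.11} $$

(see also Theorem 1 in LeCam and Yang (1990)). From (A.10) and (A.11),

$$ \lim_{b\to\infty} \lim_{c\to\infty} \liminf_{n\to\infty} \inf_{S_n \in \mathcal S} \sup_{|t| \le c} \int b \wedge \ell\!\left( \sqrt{n}\left\{ S_n - \tau \circ T_a(P_{\theta_n,\zeta_n}) \right\} \right) dP^{\otimes n}_{\theta_n,\zeta_n} \ge \int \ell \, dN(0, B^*). $$

Finally, since $E_{P_{\theta_n,\zeta_n}}[\sup_{\theta \in \Theta} |g(x,\theta)|^\eta] < \infty$ for all $n$ large enough (by $\sup_{x \in \mathcal X_n} |\zeta_n' g_n(x,\theta_n)| = o(1)$ and Assumption 3.1(v)), we have $P_{\theta_n,\zeta_n} \in \bar B_H(P_0, r/\sqrt{n})$ for all $t$ satisfying $\frac{1}{4} t' \Sigma t \le r^2 - \varepsilon$ with any $\varepsilon \in (0, r^2)$ and all $n$ large enough. Therefore, the set inclusion relation yields

$$ \begin{aligned} & \lim_{b\to\infty} \lim_{r\to\infty} \liminf_{n\to\infty} \inf_{S_n \in \mathcal S} \sup_{Q \in \bar B_H(P_0, r/\sqrt{n})} \int b \wedge \ell\!\left( \sqrt{n}\left\{ S_n - \tau \circ T_a(Q) \right\} \right) dQ^{\otimes n} \\ &\quad \ge \lim_{b\to\infty} \lim_{c\to\infty} \liminf_{n\to\infty} \inf_{S_n \in \mathcal S} \sup_{|t| \le c} \int b \wedge \ell\!\left( \sqrt{n}\left\{ S_n - \tau \circ T_a(P_{\theta_n,\zeta_n}) \right\} \right) dP^{\otimes n}_{\theta_n,\zeta_n}, \end{aligned} $$

which implies the conclusion.

A.1.4.2. Proof of (ii)

Pick arbitrary $r > 0$ and $b > 0$. Since $T(P_n) = \bar T(P_n)$ for all $(x_1,\ldots,x_n) \in \mathcal X_n^n$, the quantity

$$ \limsup_{n\to\infty} \sup_{Q \in \bar B_H(P_0, r/\sqrt{n})} \int b \wedge \ell\!\left( \sqrt{n}\left\{ \tau \circ T(P_n) - \tau \circ \bar T(Q) \right\} \right) dQ^{\otimes n} \tag{A.12} $$

is bounded from above by the sum of two terms, the first of which is

$$ \limsup_{n\to\infty} \sup_{Q \in \bar B_H(P_0, r/\sqrt{n})} \int_{(x_1,\ldots,x_n) \notin \mathcal X_n^n} b \wedge \ell\!\left( \sqrt{n}\left\{ \tau \circ T(P_n) - \tau \circ \bar T(Q) \right\} \right) dQ^{\otimes n} $$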

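$$ + \limsup_{n\to\infty} \sup_{Q \in \bar B_H(P_0, r/\sqrt{n})} \int_{(x_1,\ldots,x_n) \in \mathcal X_n^n} b \wedge \ell\!\left( \sqrt{n}\left\{ \tau \circ \bar T(P_n) - \tau \circ \bar T(Q) \right\} \right) dQ^{\otimes n}. $$

An argument similar to (A.9) implies that the first term of (A.12) is zero. From $\mathcal X_n^n \subset \mathcal X^n$ and $\bar B_H(P_0, r/\sqrt{n}) \subset B_H(P_0, r/\sqrt{n})$, the second term of (A.12) is bounded from above by

$$ \limsup_{n\to\infty} \sup_{Q \in B_H(P_0, r/\sqrt{n})} \int b \wedge \ell\!\left( \sqrt{n}\left\{ \tau \circ \bar T(P_n) - \tau \circ \bar T(Q) \right\} \right) dQ^{\otimes n} = \int b \wedge \ell \, dN(0, B^*), $$

where the equality follows from Lemma A.8, the uniform continuity of $\ell$ over $\mathbb{R}^p$, and compactness of $B_H(P_0, r/\sqrt{n})$ under the Hellinger distance. Let $b \to \infty$ and the conclusion follows.

A.2. AUXILIARY LEMMAS

LEMMA A.1: Suppose that Assumption 3.1 holds. Then
(i) for each $r > 0$, $\bar T(Q)$ exists for all $Q \in \bar B_H(P_0, r/\sqrt{n})$ and all $n$ large enough,
(ii) $\bar T_{Q_n} \to \theta_0$ as $n \to \infty$ for each $r > 0$ and sequence $Q_n \in \bar B_H(P_0, r/\sqrt{n})$.

PROOF: Proof of (i). The proof is split into several steps. Let $\mathcal G(\theta, Q)$ be the convex hull of the support of $g(x,\theta)$ under $x \sim Q$.

In the first step, we show that $0 \in \operatorname{int} \mathcal G(\theta_0, P_0)$. If $0 \notin \mathcal G(\theta_0, P_0)$, then we have $E_{P_0}[g(x,\theta_0)] \ne 0$, which is a contradiction. Thus, it is enough to show that $0$ is not on the boundary of $\mathcal G(\theta_0, P_0)$. Suppose $0$ is indeed on the boundary of $\mathcal G(\theta_0, P_0)$. In this case, we have two cases: (a) there exists a constant $m$-vector $a \ne 0$ such that $a' g \ge 0$ for all $g \in \mathcal G(\theta_0, P_0)$ and $P_0\{ g \in \mathcal G(\theta_0, P_0) : a' g > 0 \} > 0$, or (b) there exists $a \ne 0$ such that $a' g = 0$ for all $g \in \mathcal G(\theta_0, P_0)$. For the case (a), we have $a' E_{P_0}[g(x,\theta_0)] > 0$, which contradicts $E_{P_0}[g(x,\theta_0)] = 0$. For the case (b), we have $a' E_{P_0}[g(x,\theta_0) g(x,\theta_0)'] a = 0$, which contradicts Assumption 3.1(vi).

In the second step, we show that, for each $r > 0$, there exists $\delta > 0$ such that $0 \in \operatorname{int} \mathcal G(\theta, Q)$ for all $|\theta - \theta_0| \le \delta$ and all $Q \in B_H(P_0, \delta)$. Pick any $r > 0$. From the first step, we can find $m + 1$ points $\{ \tilde g_1, \ldots, \tilde g_{m+1} \} = \{ g(\tilde x_1, \theta_0), \ldots, g(\tilde x_{m+1}, \theta_0) \}$ in the support of $g(x,\theta_0)$ under $x \sim P_0$ such that $0$ is interior of the convex hull of $\{ \tilde g_1, \ldots, \tilde g_{m+1} \}$. From the property of the convex hull (Rockafellar (1970), Corollary 2.3.1), we can take $c_r > 0$ such that, for any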

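points $\{ g_1, \ldots, g_{m+1} \}$ satisfying $|g_j - \tilde g_j| \le c_r$ for $j = 1, \ldots, m+1$, the interior of the convex hull of $\{ g_1, \ldots, g_{m+1} \}$ contains $0$. Let us take any $j = 1, \ldots, m+1$. For the second step, it is sufficient to show that there exists $\delta_j > 0$ such that $Q\{ |g(x,\theta) - \tilde g_j| \le c_r \} > 0$ for all $|\theta - \theta_0| \le \delta_j$ and all $Q \in B_H(P_0, \delta_j)$. Suppose this is false, that is, for any $\delta_j > 0$, we can take a pair $(Q_j, \theta_j)$ such that $H(Q_j, P_0) \le \delta_j$, $|\theta_j - \theta_0| \le \delta_j$, and $Q_j\{ |g(x,\theta_j) - \tilde g_j| \le c_r \} = 0$. Then we have

$$ \delta_j \ge H(Q_j, P_0) \ge \left( \int_{\{ |g(x,\theta_j) - \tilde g_j| \le c_r \}} \left\{ dQ_j^{1/2} - dP_0^{1/2} \right\}^2 \right)^{1/2} = P_0\left\{ |g(x,\theta_j) - \tilde g_j| \le c_r \right\}^{1/2}. $$

On the other hand, by Assumption 3.1(iv), the dominated convergence theorem guarantees

$$ P_0\left\{ |g(x,\theta_j) - \tilde g_j| \le c_r \right\} \to P_0\left\{ |g(x,\theta_0) - \tilde g_j| \le c_r \right\} > 0 \quad \text{as } \theta_j \to \theta_0. $$

Since $\delta_j$ can be arbitrarily small, we have a contradiction. This completes the second step.

In the third step, we show that, for each $r > 0$, there exists $\delta > 0$ such that $R_n(\theta, Q) = \inf_{P \in \bar{\mathcal P}_\theta,\, P \ll Q} H(P, Q)$ has a minimum on $\{ \theta \in \Theta : |\theta - \theta_0| \le \delta \}$ for all $Q \in \bar B_H(P_0, r/\sqrt{n})$ and all $n$ large enough. Let us take $\delta > 0$ to satisfy the conclusion of the second step. By Assumption 3.1(iv), we can take $N_\delta$ to satisfy $\max_{1 \le j \le m+1} \sup_{|\theta - \theta_0| \le \delta} |g(\tilde x_j, \theta)| \le m_{N_\delta}$. Thus, letting $\mathcal G_n(\theta, Q)$ be the convex hull of the support of $g_n(x,\theta)$ under $x \sim Q$, the second step also guarantees that, for each $r > 0$, there exists $\delta > 0$ such that $0 \in \operatorname{int} \mathcal G_n(\theta, Q)$ for all $|\theta - \theta_0| \le \delta$, all $Q \in B_H(P_0, \delta)$, and all $n \ge N_\delta$. Based on this, the convex duality result in Borwein and Lewis (1993, Theorem 3.4) implies $R_n(\theta, Q) = \sup_{\gamma \in \mathbb{R}^m} -\int \frac{1}{1 + \gamma' g_n(x,\theta)} \, dQ$ for all $|\theta - \theta_0| \le \delta$, all $Q \in B_H(P_0, \delta)$, and all $n \ge N_\delta$. Since $\sup_{\gamma \in \mathbb{R}^m} -\int \frac{1}{1 + \gamma' g_n(x,\theta)} \, dQ$ is continuous at all $\theta$ with $|\theta - \theta_0| \le \delta$ (by the maximum theorem), the Weierstrass theorem completes the third step.

Finally, based on the third step, it is sufficient for the conclusion to show that, for every $r > 0$, there exists $N \in \mathbb{N}$ such that $R_n(\theta_0, Q) < \inf_{\theta \in \Theta : |\theta - \theta_0| > \delta} R_n(\theta, Q)$ for all $n \ge N$ and all $Q \in \bar B_H(P_0, r/\sqrt{n})$. Pick any $r > 0$. We first derive an upper bound of $R_n(\theta_0, Q) = \sup_{\gamma \in \mathbb{R}^m} -\int \frac{1}{1 + \gamma' g_n(x,\theta_0)} \, dQ$. From Lemma A.5(ii), $\gamma_n(\theta_0, Q) = \arg\max_{\gamma \in \mathbb{R}^m} -\int \frac{1}{1 + \gamma' g_n(x,\theta_0)} \, dQ$ exists and $\sup_{x \in \mathcal X_n} |\gamma_n(\theta_0, Q)' g_n(x,\theta_0)| \le \frac{1}{2}$ for all $n$ large enough and all $Q \in \bar B_H(P_0, r/\sqrt{n})$. Thus, by a second-order expansion around $\gamma_n(\theta_0, Q) = 0$, we have

$$ R_n(\theta_0, Q) \le -1 + \gamma_n(\theta_0, Q)' \int g_n(x,\theta_0) \, dQ. $$

Define $C^* = \inf_{\theta \in \Theta : |\theta - \theta_0| > \delta} |E_{P_0}[g(x,\theta)]|^2 / (1 + |E_{P_0}[g(x,\theta)]|) > 0$. From Lemma A.5 and the growth rate of $m_n$, it holds that

$$ m_n \left( R_n(\theta_0, Q) + 1 \right) \le m_n\, \gamma_n(\theta_0, Q)' \int g_n(x,\theta_0) \, dQ < \frac{C^*}{4} \tag{A.13} $$

for all $n$ large enough and all $Q \in \bar B_H(P_0, r/\sqrt{n})$.

We now derive a lower bound of $R_n(\theta, Q)$ with $|\theta - \theta_0| > \delta$. Pick any $\theta \in \Theta$ such that $|\theta - \theta_0| > \delta$, and take any $n$ large enough and $Q \in \bar B_H(P_0, r/\sqrt{n})$ to satisfy (A.13). If $0 \notin \mathcal G_n(\theta, Q)$, then $R_n(\theta, Q) = +\infty$. Thus, we concentrate on the case of $0 \in \mathcal G_n(\theta, Q)$, which guarantees $R_n(\theta, Q) = \sup_{\gamma \in \mathbb{R}^m} -\int \frac{1}{1 + \gamma' g_n(x,\theta)} \, dQ$ (Borwein and Lewis (1993), Theorem 3.4). Let $\gamma(\theta) = E_{P_0}[g(x,\theta)] / (1 + |E_{P_0}[g(x,\theta)]|)$. Observe that

$$ \begin{aligned} R_n(\theta, Q) &\ge -\int \frac{1}{1 + m_n^{-1} \gamma(\theta)' g_n(x,\theta)} \, dQ \\ &= -1 + m_n^{-1} \gamma(\theta)' \int g_n(x,\theta) \, dQ - m_n^{-2} \int \frac{(\gamma(\theta)' g_n(x,\theta))^2}{\left( 1 + t(x)\, m_n^{-1} \gamma(\theta)' g_n(x,\theta) \right)^3} \, dQ, \end{aligned} $$

where the equality follows from an expansion ($t(x) \in (0,1)$ for almost every $x$ under $Q$). From an argument similar to Lemma A.5, with $|\gamma(\theta)| \le 1$ and $m_n \to \infty$,

$$ \sup_{\theta \in \Theta} \left| \int g_n(x,\theta) \, dQ - \int g(x,\theta) \, dP_0 \right| \le \frac{C^*}{4}, \qquad m_n^{-1} \sup_{\theta \in \Theta} \int \frac{(\gamma(\theta)' g_n(x,\theta))^2}{\left( 1 + t(x)\, m_n^{-1} \gamma(\theta)' g_n(x,\theta) \right)^3} \, dQ \le \frac{C^*}{4} $$

for all $n$ large enough and all $Q \in \bar B_H(P_0, r/\sqrt{n})$. Combining these results and using the definition of $C^*$, we obtain

$$ \inf_{\theta \in \Theta : |\theta - \theta_0| > \delta} m_n \left( R_n(\theta, Q) + 1 \right) \ge \frac{C^*}{2} \tag{A.14} $$

for all $n$ large enough and all $Q \in \bar B_H(P_0, r/\sqrt{n})$. Therefore, (A.13) and (A.14) complete the proof of the final step.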
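As an informal numerical illustration of the dual problem used in the third and final steps, the sketch below maximizes $-\int (1 + \gamma g)^{-1} dQ$ for a scalar moment function. Everything in it is an assumption made for illustration and is not taken from the paper: $Q$ is an empirical measure on a few hypothetical points, $g(x,\theta) = x - \theta$, no trimming is applied, and the function name `dual_gamma` is invented. The maximizer is found by Newton steps with backtracking, and the first-order condition $\int g (1 + \gamma g)^{-2} dQ = 0$ is checked through the implied weights.

```python
def dual_gamma(xs, theta, iters=100):
    """Maximize R(gamma) = -mean(1 / (1 + gamma * g_i)), with g_i = x_i - theta.

    Illustrative sketch only: scalar gamma, empirical measure with equal
    weights, and at least one nonzero g_i is assumed.
    """
    g = [x - theta for x in xs]
    n = len(g)

    def R(gam):
        return -sum(1.0 / (1.0 + gam * gi) for gi in g) / n

    gam = 0.0
    for _ in range(iters):
        # Gradient and Hessian of R on the region where 1 + gamma * g_i > 0.
        grad = sum(gi / (1.0 + gam * gi) ** 2 for gi in g) / n
        hess = -2.0 * sum(gi * gi / (1.0 + gam * gi) ** 3 for gi in g) / n
        step = -grad / hess
        # Halve the step until it stays feasible and does not lower R.
        while (any(1.0 + (gam + step) * gi <= 0.0 for gi in g)
               or R(gam + step) < R(gam) - 1e-12):
            step *= 0.5
            if abs(step) < 1e-16:
                step = 0.0
                break
        gam += step
        if abs(step) < 1e-14:
            break
    # Weights proportional to 1 / (1 + gamma * g_i)^2; the first-order
    # condition says they satisfy the moment restriction sum(w_i * g_i) = 0.
    weights = [1.0 / (n * (1.0 + gam * gi) ** 2) for gi in g]
    return gam, weights, R(gam)


xs = [-1.0, 0.0, 0.5, 2.0]          # hypothetical sample
gam, w, value = dual_gamma(xs, 0.2)
moment = sum(wi * (x - 0.2) for wi, x in zip(w, xs))
print(gam, value, moment)
```

Since $\gamma = 0$ is always feasible, the maximized value is at least $-1$, which is the fact behind the normalization $R_n(\theta, Q) + 1$ in (A.13) and (A.14).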

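Proof of (ii). Pick arbitrary $r > 0$ and sequence $Q_n \in \bar B_H(P_0, r/\sqrt{n})$. From the triangle inequality,

$$ \sup_{\theta \in \Theta} \left| E_{Q_n}[g_n(x,\theta)] - E_{P_0}[g(x,\theta)] \right| \le \sup_{\theta \in \Theta} \left| E_{Q_n}[g_n(x,\theta)] - E_{P_0}[g_n(x,\theta)] \right| + \sup_{\theta \in \Theta} \left| E_{P_0}[g(x,\theta) I\{x \notin \mathcal X_n\}] \right|. \tag{A.15} $$

The first term of (A.15) satisfies

$$ \begin{aligned} \sup_{\theta \in \Theta} \left| E_{Q_n}[g_n(x,\theta)] - E_{P_0}[g_n(x,\theta)] \right| &\le \sup_{\theta \in \Theta} \left| \int g_n(x,\theta) \left\{ dQ_n^{1/2} - dP_0^{1/2} \right\}^2 \right| + 2 \sup_{\theta \in \Theta} \left| \int g_n(x,\theta) \, dP_0^{1/2} \left\{ dQ_n^{1/2} - dP_0^{1/2} \right\} \right| \\ &\le \frac{r^2}{n}\, m_n + 2 \sqrt{ E_{P_0}\!\left[ \sup_{\theta \in \Theta} |g(x,\theta)|^2 \right] }\, \frac{r}{\sqrt{n}} = O(n^{-1/2}), \end{aligned} $$

where the first inequality follows from the triangle inequality, the second inequality follows from $Q_n \in \bar B_H(P_0, r/\sqrt{n})$ and the Cauchy-Schwarz inequality, and the equality follows from Assumption 3.1(v) and (vii). The second term of (A.15) satisfies

$$ \begin{aligned} \sup_{\theta \in \Theta} \left| E_{P_0}[g(x,\theta) I\{x \notin \mathcal X_n\}] \right| &\le \left( \int \sup_{\theta \in \Theta} |g(x,\theta)|^\eta \, dP_0 \right)^{1/\eta} \left( \int I\{x \notin \mathcal X_n\} \, dP_0 \right)^{(\eta-1)/\eta} \\ &\le \left( E_{P_0}\!\left[ \sup_{\theta \in \Theta} |g(x,\theta)|^\eta \right] \right)^{1/\eta} \left( m_n^{-\eta} E_{P_0}\!\left[ \sup_{\theta \in \Theta} |g(x,\theta)|^\eta \right] \right)^{(\eta-1)/\eta} = o(n^{-1/2}), \end{aligned} \tag{A.16} $$

where the first inequality follows from the Hölder inequality, the second inequality follows from the Markov inequality, and the equality follows from Assumption 3.1(v) and (vii). Combining these results, we obtain the uniform convergence $\sup_{\theta \in \Theta} | E_{Q_n}[g_n(x,\theta)] - E_{P_0}[g(x,\theta)] | \to 0$. Therefore, from the triangle inequality and $E_{Q_n}[g_n(x, \bar T_{Q_n})] = O(n^{-1/2})$ (Lemma A.6(i)),

$$ \left| E_{P_0}[g(x, \bar T_{Q_n})] \right| \le \left| E_{P_0}[g(x, \bar T_{Q_n})] - E_{Q_n}[g_n(x, \bar T_{Q_n})] \right| + \left| E_{Q_n}[g_n(x, \bar T_{Q_n})] \right| \to 0. $$

The conclusion follows from Assumption 3.1(iii). Q.E.D.

LEMMA A.2: Suppose that Assumption 3.1 holds. Then, for each $r > 0$ and sequence $Q_n \in \bar B_H(P_0, r/\sqrt{n})$,

$$ \sqrt{n}\left( \bar T_{Q_n} - \theta_0 \right) = -\sqrt{n}\, \Sigma^{-1} \int \Lambda_n \, dQ_n + o(1). \tag{A.17} $$

PROOF: The proof is based on Rieder (1994, proofs of Theorems 6.3.4 and Theorem 6.4.5). Pick arbitrary $r > 0$ and $Q_n \in \bar B_H(P_0, r/\sqrt{n})$. Observe that

$$ \begin{aligned} & \left\| dQ_n^{1/2} - d\bar P_{\theta_0,Q_n}^{1/2} + \frac{1}{2} (\bar T_{Q_n} - \theta_0)' \Lambda_n \, dQ_n^{1/2} \right\|^2 \\ &\quad = \left\| dQ_n^{1/2} - d\bar P_{\theta_0,Q_n}^{1/2} + \frac{1}{2} \psi_{n,Q_n}' \Lambda_n \, dQ_n^{1/2} \right\|^2 + \left\| \frac{1}{2} (\bar T_{Q_n} - \theta_0 - \psi_{n,Q_n})' \Lambda_n \, dQ_n^{1/2} \right\|^2 \\ &\qquad + \int \left\{ dQ_n^{1/2} - d\bar P_{\theta_0,Q_n}^{1/2} + \frac{1}{2} \psi_{n,Q_n}' \Lambda_n \, dQ_n^{1/2} \right\} \Lambda_n' \, dQ_n^{1/2} \, (\bar T_{Q_n} - \theta_0 - \psi_{n,Q_n}) \\ &\quad = \left\| dQ_n^{1/2} - d\bar P_{\theta_0,Q_n}^{1/2} + \frac{1}{2} \psi_{n,Q_n}' \Lambda_n \, dQ_n^{1/2} \right\|^2 + \left\| \frac{1}{2} (\bar T_{Q_n} - \theta_0 - \psi_{n,Q_n})' \Lambda_n \, dQ_n^{1/2} \right\|^2, \end{aligned} \tag{A.18} $$

where the second equality follows from

$$ \int \left\{ dQ_n^{1/2} - d\bar P_{\theta_0,Q_n}^{1/2} + \frac{1}{2} \psi_{n,Q_n}' \Lambda_n \, dQ_n^{1/2} \right\} \Lambda_n \, dQ_n^{1/2} = \int \Lambda_n \left\{ dQ_n^{1/2} - d\bar P_{\theta_0,Q_n}^{1/2} \right\} dQ_n^{1/2} + \frac{1}{2} \left( \int \Lambda_n \Lambda_n' \, dQ_n \right) \psi_{n,Q_n} = 0. $$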

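The left hand side of (A.18) satisfies

$$ \begin{aligned} & \left\| dQ_n^{1/2} - d\bar P_{\theta_0,Q_n}^{1/2} + \frac{1}{2} (\bar T_{Q_n} - \theta_0)' \Lambda_n \, dQ_n^{1/2} \right\| \\ &\quad \le \left\| dQ_n^{1/2} - d\bar P_{\bar T_{Q_n},Q_n}^{1/2} \right\| + o(|\bar T_{Q_n} - \theta_0|) + o(n^{-1/2}) \\ &\quad \le \left\| dQ_n^{1/2} - d\bar P_{\theta_0 + \psi_{n,Q_n},Q_n}^{1/2} \right\| + o(|\bar T_{Q_n} - \theta_0|) + o(n^{-1/2}) \\ &\quad \le \left\| dQ_n^{1/2} - d\bar P_{\theta_0,Q_n}^{1/2} + \frac{1}{2} \psi_{n,Q_n}' \Lambda_n \, dQ_n^{1/2} \right\| + o(|\bar T_{Q_n} - \theta_0|) + o(|\psi_{n,Q_n}|) + o(n^{-1/2}), \end{aligned} \tag{A.19} $$

where the first inequality follows from the triangle inequality and Lemma A.3(i), the second inequality follows from $\bar T_{Q_n} = \arg\min_{\theta \in \Theta} \| dQ_n^{1/2} - d\bar P_{\theta,Q_n}^{1/2} \|$, and the third inequality follows from the triangle inequality and Lemma A.3(ii). From (A.18) and (A.19),

$$ \begin{aligned} & \left\| dQ_n^{1/2} - d\bar P_{\theta_0,Q_n}^{1/2} + \frac{1}{2} \psi_{n,Q_n}' \Lambda_n \, dQ_n^{1/2} \right\|^2 + \left\| \frac{1}{2} (\bar T_{Q_n} - \theta_0 - \psi_{n,Q_n})' \Lambda_n \, dQ_n^{1/2} \right\|^2 \\ &\quad \le \left( \left\| dQ_n^{1/2} - d\bar P_{\theta_0,Q_n}^{1/2} + \frac{1}{2} \psi_{n,Q_n}' \Lambda_n \, dQ_n^{1/2} \right\| + o(|\bar T_{Q_n} - \theta_0|) + o(|\psi_{n,Q_n}|) + o(n^{-1/2}) \right)^2. \end{aligned} $$

This implies that

$$ o(|\bar T_{Q_n} - \theta_0|^2) + o(|\psi_{n,Q_n}|^2) + o(n^{-1}) \ge \frac{1}{4} (\bar T_{Q_n} - \theta_0 - \psi_{n,Q_n})' \int \Lambda_n \Lambda_n' \, dQ_n \, (\bar T_{Q_n} - \theta_0 - \psi_{n,Q_n}) \ge C \left| \bar T_{Q_n} - \theta_0 - \psi_{n,Q_n} \right|^2 \tag{A.20} $$

for all $n$ large enough, where the second inequality follows from Lemma A.5(i) and Assumption 3.1(vi).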
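The decomposition (A.18) is a Pythagorean identity: $\psi_{n,Q_n}$ is the least-squares coefficient obtained by projecting $-(dQ_n^{1/2} - d\bar P_{\theta_0,Q_n}^{1/2})$ onto $\frac{1}{2}\Lambda_n dQ_n^{1/2}$, so the cross term vanishes. The following sketch checks this numerically under simplifying assumptions that are ours, not the paper's: a finite sample space, a scalar $\Lambda$, and arbitrary hypothetical vectors `q`, `p`, and `lam` standing in for $Q_n$, $\bar P_{\theta_0,Q_n}$, and $\Lambda_n$.

```python
import math


def l2_sq(values):
    """Squared L2 norm of a function on a finite space."""
    return sum(v * v for v in values)


def psi(q, p, lam):
    """psi = -2 * (sum lam^2 q)^(-1) * sum lam * (sqrt(q) - sqrt(p)) * sqrt(q)."""
    num = sum(l * (math.sqrt(qj) - math.sqrt(pj)) * math.sqrt(qj)
              for l, qj, pj in zip(lam, q, p))
    den = sum(l * l * qj for l, qj in zip(lam, q))
    return -2.0 * num / den


def both_sides(q, p, lam, t):
    """Left and right sides of the scalar analogue of (A.18) for a given t."""
    s = psi(q, p, lam)
    a = [math.sqrt(qj) - math.sqrt(pj) for qj, pj in zip(q, p)]
    root_q = [math.sqrt(qj) for qj in q]
    lhs = l2_sq([aj + 0.5 * t * l * rq for aj, l, rq in zip(a, lam, root_q)])
    rhs = (l2_sq([aj + 0.5 * s * l * rq for aj, l, rq in zip(a, lam, root_q)])
           + l2_sq([0.5 * (t - s) * l * rq for l, rq in zip(lam, root_q)]))
    return lhs, rhs


q = [0.2, 0.3, 0.5]        # hypothetical Q
p = [0.25, 0.25, 0.5]      # hypothetical projection of Q
lam = [-1.0, 0.5, 2.0]     # hypothetical values of Lambda
lhs, rhs = both_sides(q, p, lam, t=0.7)
print(lhs, rhs)
```

The two printed numbers agree up to rounding for any choice of `t`, which is the content of the second equality in (A.18).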

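We now analyze $\psi_{n,Q_n}$. From the definition of $\psi_{n,Q_n}$,

$$ \begin{aligned} \psi_{n,Q_n} ={}& -2 \left\{ \left( \int \Lambda_n \Lambda_n' \, dQ_n \right)^{-1} - \Sigma^{-1} \right\} \int \Lambda_n \left\{ dQ_n^{1/2} - d\bar P_{\theta_0,Q_n}^{1/2} \right\} dQ_n^{1/2} \\ & - 2 \Sigma^{-1} \int \Lambda_n \left\{ dQ_n^{1/2} - d\bar P_{\theta_0,Q_n}^{1/2} \right\} dQ_n^{1/2}. \end{aligned} \tag{A.21} $$

From this and Lemma A.5(i), the first term of (A.21) is $o(n^{-1/2})$. The second term of (A.21) satisfies

$$ \begin{aligned} & -2 \Sigma^{-1} \int \Lambda_n \left\{ dQ_n^{1/2} - d\bar P_{\theta_0,Q_n}^{1/2} \right\} dQ_n^{1/2} \\ &\quad = -2 \Sigma^{-1} G' \Omega^{-1} \left( \int g_n(x,\theta_0) g_n(x,\theta_0)' \, dQ_n \right) \gamma_n(\theta_0, Q_n) \\ &\qquad + 2 \Sigma^{-1} G' \Omega^{-1} \left( \int \frac{\gamma_n(\theta_0, Q_n)' g_n(x,\theta_0)}{1 + \gamma_n(\theta_0, Q_n)' g_n(x,\theta_0)}\, g_n(x,\theta_0) g_n(x,\theta_0)' \, dQ_n \right) \gamma_n(\theta_0, Q_n) \\ &\quad = -\Sigma^{-1} G' \Omega^{-1} \left\{ \int g_n(x,\theta_0) \, dQ_n + \frac{1}{2} \int \varrho_n(x,\theta_0,Q_n)\, g_n(x,\theta_0) \, dQ_n \right\} + o(n^{-1/2}) \\ &\quad = -\Sigma^{-1} \int \Lambda_n \, dQ_n + o(n^{-1/2}), \end{aligned} $$

where the first equality follows from (A.22), the second equality follows from (A.23) and Lemma A.5, and the third equality follows from Lemma A.5. Therefore,

$$ \sqrt{n}\, \psi_{n,Q_n} = -\sqrt{n}\, \Sigma^{-1} \int \Lambda_n \, dQ_n + o(1), $$

which also implies $|\psi_{n,Q_n}| = O(n^{-1/2})$ (by Lemma A.5(i)). Combining this with (A.20),

$$ \sqrt{n}\left( \bar T_{Q_n} - \theta_0 \right) = \sqrt{n}\, \psi_{n,Q_n} + o\!\left( \sqrt{n}\, |\bar T_{Q_n} - \theta_0| \right) + o(1). $$

By solving this equation for $\sqrt{n}(\bar T_{Q_n} - \theta_0)$, the conclusion is obtained. Q.E.D.

LEMMA A.3: Suppose that Assumption 3.1 holds. Then, for each $r > 0$ and sequence $Q_n \in \bar B_H(P_0, r/\sqrt{n})$,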

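(i) $\left\| d\bar P_{\bar T_{Q_n},Q_n}^{1/2} - d\bar P_{\theta_0,Q_n}^{1/2} + \frac{1}{2} (\bar T_{Q_n} - \theta_0)' \Lambda_n \, dQ_n^{1/2} \right\| = o(|\bar T_{Q_n} - \theta_0|) + o(n^{-1/2})$,
(ii) $\left\| d\bar P_{\theta_0 + \psi_{n,Q_n},Q_n}^{1/2} - d\bar P_{\theta_0,Q_n}^{1/2} + \frac{1}{2} \psi_{n,Q_n}' \Lambda_n \, dQ_n^{1/2} \right\| = o(|\psi_{n,Q_n}|) + o(n^{-1/2})$.

PROOF: Proof of (i). From the convex duality of partially finite programming (Borwein and Lewis (1993)), the Radon-Nikodym derivative $d\bar P_{\theta,Q}/dQ$ is written as

$$ \frac{d\bar P_{\theta,Q}}{dQ} = \frac{1}{\left( 1 + \gamma_n(\theta,Q)' g_n(x,\theta) \right)^2} \tag{A.22} $$

for each $n \in \mathbb{N}$, $\theta \in \Theta$, and $Q \in \mathcal M$, where $\gamma_n(\theta,Q)$ solves

$$ 0 = \int \frac{g_n(x,\theta)}{\left( 1 + \gamma_n(\theta,Q)' g_n(x,\theta) \right)^2} \, dQ = E_Q\!\left[ g_n(x,\theta) \left\{ 1 - 2 \gamma_n(\theta,Q)' g_n(x,\theta) + \varrho_n(x,\theta,Q) \right\} \right] \tag{A.23} $$

with

$$ \varrho_n(x,\theta,Q) = \frac{3 \left( \gamma_n(\theta,Q)' g_n(x,\theta) \right)^2 + 2 \left( \gamma_n(\theta,Q)' g_n(x,\theta) \right)^3}{\left( 1 + \gamma_n(\theta,Q)' g_n(x,\theta) \right)^2}. $$

Denote $t_n = \bar T_{Q_n} - \theta_0$. Pick arbitrary $r > 0$ and sequence $Q_n \in \bar B_H(P_0, r/\sqrt{n})$. From the triangle inequality and (A.22),

$$ \begin{aligned} & \left\| d\bar P_{\bar T_{Q_n},Q_n}^{1/2} - d\bar P_{\theta_0,Q_n}^{1/2} + \frac{1}{2} t_n' \Lambda_n \, dQ_n^{1/2} \right\| \\ &\quad \le \left\| \left\{ \gamma_n(\theta_0,Q_n)' g_n(x,\theta_0) - \gamma_n(\bar T_{Q_n},Q_n)' g_n(x,\bar T_{Q_n}) \right\} dQ_n^{1/2} + \frac{1}{2} t_n' \Lambda_n \, dQ_n^{1/2} \right\| \\ &\qquad + \left\| \left\{ \gamma_n(\theta_0,Q_n)' g_n(x,\theta_0) - \gamma_n(\bar T_{Q_n},Q_n)' g_n(x,\bar T_{Q_n}) \right\} \left\{ \frac{1}{\left( 1 + \gamma_n(\bar T_{Q_n},Q_n)' g_n(x,\bar T_{Q_n}) \right) \left( 1 + \gamma_n(\theta_0,Q_n)' g_n(x,\theta_0) \right)} - 1 \right\} dQ_n^{1/2} \right\| \\ &\quad = T_1 + T_2. \end{aligned} $$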

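For $T_2$, Lemmas A.5 and A.6 imply $T_2 = o(n^{-1/2})$. For $T_1$, the triangle inequality and (A.23) yield

$$ \begin{aligned} T_1 \le{}& \bigg\| \bigg\{ -\frac{1}{2} E_{Q_n}[g_n(x,\bar T_{Q_n})]' E_{Q_n}[g_n(x,\bar T_{Q_n}) g_n(x,\bar T_{Q_n})']^{-1} g_n(x,\bar T_{Q_n}) \\ & \quad + \frac{1}{2} E_{Q_n}[g_n(x,\theta_0)]' E_{Q_n}[g_n(x,\theta_0) g_n(x,\theta_0)']^{-1} g_n(x,\theta_0) + \frac{1}{2} t_n' \Lambda_n \bigg\} dQ_n^{1/2} \bigg\| \\ & + \left\| E_{Q_n}[\varrho_n(x,\theta_0,Q_n) g_n(x,\theta_0)]' E_{Q_n}[g_n(x,\theta_0) g_n(x,\theta_0)']^{-1} g_n(x,\theta_0) \, dQ_n^{1/2} \right\| \\ & + \left\| E_{Q_n}[\varrho_n(x,\bar T_{Q_n},Q_n) g_n(x,\bar T_{Q_n})]' E_{Q_n}[g_n(x,\bar T_{Q_n}) g_n(x,\bar T_{Q_n})']^{-1} g_n(x,\bar T_{Q_n}) \, dQ_n^{1/2} \right\| \\ ={}& T_{11} + T_{12} + T_{13}. \end{aligned} $$

Lemmas A.5 and A.6 imply that $T_{12} = o(n^{-1/2})$ and $T_{13} = o(n^{-1/2})$. For $T_{11}$, expansions of $g_n(x,\bar T_{Q_n})$ around $\bar T_{Q_n} = \theta_0$ yield

$$ \begin{aligned} T_{11} \le{}& \left\| \frac{1}{2} E_{Q_n}[g_n(x,\bar T_{Q_n})]' \left( E_{Q_n}[g_n(x,\bar T_{Q_n}) g_n(x,\bar T_{Q_n})']^{-1} - E_{Q_n}[g_n(x,\theta_0) g_n(x,\theta_0)']^{-1} \right) g_n(x,\bar T_{Q_n}) \, dQ_n^{1/2} \right\| \\ & + \left\| \frac{1}{2} E_{Q_n}[g_n(x,\bar T_{Q_n})]' E_{Q_n}[g_n(x,\theta_0) g_n(x,\theta_0)']^{-1} \left\{ g_n(x,\bar T_{Q_n}) - g_n(x,\theta_0) \right\} dQ_n^{1/2} \right\| \\ & + \left\| \frac{1}{2} t_n' \left( \int \frac{\partial g_n(x,\dot\theta)}{\partial \theta'} \, dQ_n - G \right)' E_{Q_n}[g_n(x,\theta_0) g_n(x,\theta_0)']^{-1} g_n(x,\theta_0) \, dQ_n^{1/2} \right\| \\ & + \left\| \frac{1}{2} t_n' G' \left( \Omega^{-1} - E_{Q_n}[g_n(x,\theta_0) g_n(x,\theta_0)']^{-1} \right) g_n(x,\theta_0) \, dQ_n^{1/2} \right\| \\ ={}& o(n^{-1/2}) + o(|t_n|), \end{aligned} $$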

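where $\dot\theta$ is a point on the line joining $\theta_0$ and $\bar T_{Q_n}$, and the equality follows from Lemmas A.5(i) and A.6(i).

Proof of (ii). Similar to the proof of Part (i) of this lemma. Q.E.D.

LEMMA A.4: Suppose that Assumption 3.1 holds. Then, for each $t \in \mathbb{R}^p$,
(i) $|E_{P_0}[g_n(x,\theta_0)]| = o(n^{-1/2})$, $|E_{P_0}[g_n(x,\theta_n)]| = O(n^{-1/2})$, $|E_{P_0}[g_n(x,\theta_n) g_n(x,\theta_n)'] - \Omega| = o(1)$, and $|E_{P_0}[\partial g_n(x,\theta_n)/\partial \theta'] - G| = o(1)$,
(ii) $\gamma_n(\theta_n, P_0) = \arg\max_{\gamma \in \mathbb{R}^m} -\int \frac{1}{1 + \gamma' g_n(x,\theta_n)} \, dP_0$ exists for all $n$ large enough, $|\gamma_n(\theta_n, P_0)| = O(n^{-1/2})$, and $\sup_{x \in \mathcal X_n} |\gamma_n(\theta_n, P_0)' g_n(x,\theta_n)| = o(1)$.

PROOF: Proof of (i). Proof of the first statement. The same argument as (A.16) with Assumption 3.1(iii) yields the conclusion.

Proof of the second statement. Pick an arbitrary $t \in \mathbb{R}^p$. From the triangle inequality,

$$ \left| E_{P_0}[g_n(x,\theta_n)] \right| \le \left| E_{P_0}[g(x,\theta_n) I\{x \notin \mathcal X_n\}] \right| + \left| E_{P_0}[g(x,\theta_n)] \right|. \tag{A.24} $$

By the same argument as (A.16) and $E_{P_0}[\sup_{\theta \in \Theta} |g(x,\theta)|^\eta] < \infty$ (from Assumption 3.1(v)), the first term of (A.24) is $o(n^{-1/2})$. The second term of (A.24) satisfies

$$ \left| E_{P_0}[g(x,\theta_n)] \right| \le E_{P_0}\!\left[ \sup_{\theta \in \mathcal N} \left| \frac{\partial g(x,\theta)}{\partial \theta'} \right| \right] \frac{|t|}{\sqrt{n}} = O(n^{-1/2}) $$

for all $n$ large enough, where the inequality follows from a Taylor expansion around $t = 0$ and Assumption 3.1(iii), and the equality follows from Assumption 3.1(v). Combining these results, the conclusion is obtained.

Proof of the third statement. Pick an arbitrary $t \in \mathbb{R}^p$. From the triangle inequality,

$$ \left| E_{P_0}[g_n(x,\theta_n) g_n(x,\theta_n)'] - \Omega \right| \le \left| E_{P_0}[g_n(x,\theta_n) g_n(x,\theta_n)'] - E_{P_0}[g(x,\theta_n) g(x,\theta_n)'] \right| + \left| E_{P_0}[g(x,\theta_n) g(x,\theta_n)'] - \Omega \right|. $$

The first term is $o(n^{-1/2})$ by the same argument as (A.16) and the second term converges to zero by the continuity of $g(x,\theta)$ at $\theta_0$.

Proof of the fourth statement. Similar to the proof of the third statement.

Proof of (ii). Pick an arbitrary $t \in \mathbb{R}^p$. Let $\Gamma_n = \{ \gamma \in \mathbb{R}^m : |\gamma| \le a_n \}$ with a positive sequence $\{a_n\}_{n \in \mathbb{N}}$ satisfying $a_n m_n \to 0$ and $a_n n^{1/2} \to \infty$. Observe that

$$ \sup_{\gamma \in \Gamma_n,\, x \in \mathcal X_n,\, \theta \in \Theta} \left| \gamma' g_n(x,\theta) \right| \le a_n m_n \to 0. \tag{A.25} $$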

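Since $R_n(P_0, \theta_n, \gamma)$ is twice continuously differentiable with respect to $\gamma$ and $\Gamma_n$ is compact, $\bar\gamma = \arg\max_{\gamma \in \Gamma_n} R_n(P_0, \theta_n, \gamma)$ exists for each $n \in \mathbb{N}$. A Taylor expansion around $\bar\gamma = 0$ yields

$$ \begin{aligned} -1 = R_n(P_0, \theta_n, 0) \le R_n(P_0, \theta_n, \bar\gamma) &= -1 + \bar\gamma' E_{P_0}[g_n(x,\theta_n)] - \bar\gamma' E_{P_0}\!\left[ \frac{g_n(x,\theta_n) g_n(x,\theta_n)'}{\left( 1 + \dot\gamma' g_n(x,\theta_n) \right)^3} \right] \bar\gamma \\ &\le -1 + \bar\gamma' E_{P_0}[g_n(x,\theta_n)] - C \bar\gamma' E_{P_0}[g_n(x,\theta_n) g_n(x,\theta_n)'] \bar\gamma \\ &\le -1 + |\bar\gamma| \left| E_{P_0}[g_n(x,\theta_n)] \right| - C |\bar\gamma|^2 \end{aligned} \tag{A.26} $$

for all $n$ large enough, where $\dot\gamma$ is a point on the line joining $0$ and $\bar\gamma$, the second inequality follows from (A.25), and the last inequality follows from Lemma A.4(i) and Assumption 3.1(vi). Thus, Lemma A.4(i) implies

$$ C |\bar\gamma| \le \left| E_{P_0}[g_n(x,\theta_n)] \right| = O(n^{-1/2}). \tag{A.27} $$

From $a_n n^{1/2} \to \infty$, $\bar\gamma$ is an interior point of $\Gamma_n$ and satisfies the first-order condition $\partial R_n(P_0, \theta_n, \bar\gamma)/\partial \gamma = 0$ for all $n$ large enough. Since $R_n(P_0, \theta_n, \gamma)$ is concave in $\gamma$ for all $n$ large enough, $\bar\gamma = \arg\max_{\gamma \in \mathbb{R}^m} R_n(P_0, \theta_n, \gamma)$ for all $n$ large enough and the first statement is obtained. Thus, the second statement is obtained from (A.27). The third statement follows from (A.27) and Assumption 3.1(vii). Q.E.D.

LEMMA A.5: Suppose that Assumption 3.1 holds. Then, for each $r > 0$ and sequence $Q_n \in \bar B_H(P_0, r/\sqrt{n})$,
(i) $|E_{Q_n}[g_n(x,\theta_0)]| = O(n^{-1/2})$, and $|E_{Q_n}[g_n(x,\theta_0) g_n(x,\theta_0)'] - \Omega| = o(1)$,
(ii) $\gamma_n(\theta_0, Q_n) = \arg\max_{\gamma \in \mathbb{R}^m} -\int \frac{1}{1 + \gamma' g_n(x,\theta_0)} \, dQ_n$ exists for all $n$ large enough, $|\gamma_n(\theta_0, Q_n)| = O(n^{-1/2})$, and $\sup_{x \in \mathcal X_n} |\gamma_n(\theta_0, Q_n)' g_n(x,\theta_0)| = o(1)$.

PROOF: Proof of (i). Proof of the first statement. Pick any $r > 0$ and sequence $Q_n \in \bar B_H(P_0, r/\sqrt{n})$. We have

$$ \begin{aligned} \left| E_{Q_n}[g_n(x,\theta_0)] \right| &\le \left| \int g_n(x,\theta_0) \{ dQ_n - dP_0 \} \right| + \left| E_{P_0}[g_n(x,\theta_0)] \right| \\ &\le \left| \int g_n(x,\theta_0) \left\{ dQ_n^{1/2} - dP_0^{1/2} \right\}^2 \right| + 2 \left| \int g_n(x,\theta_0) \, dP_0^{1/2} \left\{ dQ_n^{1/2} - dP_0^{1/2} \right\} \right| + o(n^{-1/2}) \end{aligned} $$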

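$$ \le \frac{r^2}{n}\, m_n + 2 \sqrt{ E_{P_0}\!\left[ |g(x,\theta_0)|^2 \right] }\, \frac{r}{\sqrt{n}} + o(n^{-1/2}) = O(n^{-1/2}), $$

where the first and second inequalities follow from the triangle inequality and Lemma A.4(i), the third inequality follows from the Cauchy-Schwarz inequality and $Q_n \in \bar B_H(P_0, r/\sqrt{n})$, and the equality follows from Assumption 3.1(v) and (vii).

Proof of the second statement. Pick arbitrary $r > 0$ and sequence $Q_n \in \bar B_H(P_0, r/\sqrt{n})$. From the triangle inequality,

$$ \left| E_{Q_n}[g_n(x,\theta_0) g_n(x,\theta_0)'] - \Omega \right| \le \left| E_{Q_n}[g_n(x,\theta_0) g_n(x,\theta_0)'] - E_{P_0}[g_n(x,\theta_0) g_n(x,\theta_0)'] \right| + \left| E_{P_0}[g(x,\theta_0) g(x,\theta_0)' I\{x \notin \mathcal X_n\}] \right|. \tag{A.28} $$

The first term of the right hand side of (A.28) satisfies

$$ \begin{aligned} & \left| E_{Q_n}[g_n(x,\theta_0) g_n(x,\theta_0)'] - E_{P_0}[g_n(x,\theta_0) g_n(x,\theta_0)'] \right| \\ &\quad \le \left| \int g_n(x,\theta_0) g_n(x,\theta_0)' \left\{ dQ_n^{1/2} - dP_0^{1/2} \right\}^2 \right| + 2 \left| \int g_n(x,\theta_0) g_n(x,\theta_0)' \, dP_0^{1/2} \left\{ dQ_n^{1/2} - dP_0^{1/2} \right\} \right| \\ &\quad \le \frac{r^2}{n}\, m_n^2 + 2 \sqrt{ E_{P_0}\!\left[ |g(x,\theta_0)|^4 \right] }\, \frac{r}{\sqrt{n}} = o(1), \end{aligned} $$

where the first inequality follows from the triangle inequality, the second inequality follows from the Cauchy-Schwarz inequality and $Q_n \in \bar B_H(P_0, r/\sqrt{n})$, and the equality follows from Assumption 3.1(v) and (vii). The second term of (A.28) satisfies

$$ \begin{aligned} \left| E_{P_0}[g(x,\theta_0) g(x,\theta_0)' I\{x \notin \mathcal X_n\}] \right| &\le \left( \int \left| g(x,\theta_0) g(x,\theta_0)' \right|^{1+\delta} dP_0 \right)^{1/(1+\delta)} \left( \int I\{x \notin \mathcal X_n\} \, dP_0 \right)^{\delta/(1+\delta)} \\ &\le \left( E_{P_0}\!\left[ |g(x,\theta_0)|^{2+2\delta} \right] \right)^{1/(1+\delta)} \left( m_n^{-\eta} E_{P_0}\!\left[ \sup_{\theta \in \Theta} |g(x,\theta)|^\eta \right] \right)^{\delta/(1+\delta)} = o(1) \end{aligned} $$

for sufficiently small $\delta > 0$, where the first inequality follows from the Hölder inequality, the second inequality follows from the Markov inequality, and the equality follows from Assumption 3.1(vii).

Proof of (ii). Similar to the proof of Lemma A.4(ii). Repeat the same argument with $R_n(Q_n, \theta_0, \gamma)$ instead of $R_n(P_0, \theta_n, \gamma)$. Q.E.D.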
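The recurring device in the proof of Lemma A.5(i) is the identity $dQ - dP = \{dQ^{1/2} - dP^{1/2}\}^2 + 2\, dP^{1/2} \{dQ^{1/2} - dP^{1/2}\}$, which together with the Cauchy-Schwarz inequality gives $|E_Q[g] - E_P[g]| \le \sup|g| \cdot H(Q,P)^2 + 2 \sqrt{E_P[g^2]}\, H(Q,P)$. The sketch below checks this bound on a finite sample space. It is only an illustration under our own assumptions: the vectors `g`, `q`, and `p` are hypothetical, $g$ is scalar, and $H$ denotes the unnormalized Hellinger distance $\| dQ^{1/2} - dP^{1/2} \|$.

```python
import math


def hellinger(q, p):
    """Unnormalized Hellinger distance || sqrt(q) - sqrt(p) || on a finite space."""
    return math.sqrt(sum((math.sqrt(qj) - math.sqrt(pj)) ** 2
                         for qj, pj in zip(q, p)))


def mean_shift_bound(g, q, p):
    """Return (|E_q g - E_p g|, sup|g| * H^2 + 2 * sqrt(E_p g^2) * H)."""
    lhs = abs(sum(gj * (qj - pj) for gj, qj, pj in zip(g, q, p)))
    h = hellinger(q, p)
    sup_g = max(abs(gj) for gj in g)
    second_moment = sum(gj * gj * pj for gj, pj in zip(g, p))
    rhs = sup_g * h ** 2 + 2.0 * math.sqrt(second_moment) * h
    return lhs, rhs


g = [-3.0, 0.5, 2.0, 10.0]      # hypothetical bounded moment function
q = [0.1, 0.4, 0.3, 0.2]        # hypothetical contaminated measure
p = [0.25, 0.25, 0.25, 0.25]    # hypothetical reference measure
lhs, rhs = mean_shift_bound(g, q, p)
print(lhs, rhs)
```

With $\sup |g_n| \le m_n$ and $H(Q_n, P_0) \le r/\sqrt{n}$, the right-hand side becomes the bound $\frac{r^2}{n} m_n + 2 \sqrt{E_{P_0}[|g|^2]}\, \frac{r}{\sqrt{n}}$ used in the proof.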

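LEMMA A.6: Suppose that Assumption 3.1 holds. Then, for each $r > 0$ and sequence $Q_n \in \bar B_H(P_0, r/\sqrt{n})$,
(i) $|E_{Q_n}[g_n(x,\bar T_{Q_n})]| = O(n^{-1/2})$, $|E_{Q_n}[g_n(x,\bar T_{Q_n}) g_n(x,\bar T_{Q_n})'] - \Omega| = o(1)$, and $|E_{Q_n}[\partial g_n(x,\bar T_{Q_n})/\partial \theta'] - G| = o(1)$,
(ii) $\gamma_n(\bar T_{Q_n}, Q_n) = \arg\max_{\gamma \in \mathbb{R}^m} -\int \frac{1}{1 + \gamma' g_n(x,\bar T_{Q_n})} \, dQ_n$ exists for all $n$ large enough, $|\gamma_n(\bar T_{Q_n}, Q_n)| = O(n^{-1/2})$, and $\sup_{x \in \mathcal X_n} |\gamma_n(\bar T_{Q_n}, Q_n)' g_n(x,\bar T_{Q_n})| = o(1)$.

PROOF: Proof of (i). Proof of the first statement. Pick any $r > 0$ and sequence $Q_n \in \bar B_H(P_0, r/\sqrt{n})$. Define $\tilde\gamma = n^{-1/2} E_{Q_n}[g_n(x,\bar T_{Q_n})] / |E_{Q_n}[g_n(x,\bar T_{Q_n})]|$. Since $|\tilde\gamma| = n^{-1/2}$,

$$ \sup_{x \in \mathcal X_n,\, \theta \in \Theta} \left| \tilde\gamma' g_n(x,\theta) \right| \le n^{-1/2} m_n \to 0. \tag{A.29} $$

Observe that

$$ \begin{aligned} \left| E_{Q_n}[g_n(x,\bar T_{Q_n}) g_n(x,\bar T_{Q_n})'] \right| &\le \int \sup_{\theta \in \Theta} |g_n(x,\theta)|^2 \left\{ dQ_n^{1/2} - dP_0^{1/2} \right\}^2 + 2 \int \sup_{\theta \in \Theta} |g_n(x,\theta)|^2 \, dP_0^{1/2} \left| dQ_n^{1/2} - dP_0^{1/2} \right| + E_{P_0}\!\left[ \sup_{\theta \in \Theta} |g_n(x,\theta)|^2 \right] \\ &\le \frac{r^2}{n}\, m_n^2 + 2 m_n \sqrt{ E_{P_0}\!\left[ \sup_{\theta \in \Theta} |g(x,\theta)|^2 \right] }\, \frac{r}{\sqrt{n}} + E_{P_0}\!\left[ \sup_{\theta \in \Theta} |g(x,\theta)|^2 \right] \\ &\le C E_{P_0}\!\left[ \sup_{\theta \in \Theta} |g(x,\theta)|^2 \right] \end{aligned} \tag{A.30} $$

for all $n$ large enough, where the first inequality follows from the triangle inequality, the second inequality follows from the Cauchy-Schwarz inequality and $Q_n \in \bar B_H(P_0, r/\sqrt{n})$, and the last inequality follows from Assumption 3.1(v) and (vii). Thus, an expansion around $\tilde\gamma = 0$ yields

$$ \begin{aligned} R_n(Q_n, \bar T_{Q_n}, \tilde\gamma) &= -1 + \tilde\gamma' E_{Q_n}[g_n(x,\bar T_{Q_n})] - \tilde\gamma' E_{Q_n}\!\left[ \frac{g_n(x,\bar T_{Q_n}) g_n(x,\bar T_{Q_n})'}{\left( 1 + \dot\gamma' g_n(x,\bar T_{Q_n}) \right)^3} \right] \tilde\gamma \\ &\ge -1 + n^{-1/2} \left| E_{Q_n}[g_n(x,\bar T_{Q_n})] \right| - C \tilde\gamma' E_{Q_n}[g_n(x,\bar T_{Q_n}) g_n(x,\bar T_{Q_n})'] \tilde\gamma \\ &\ge -1 + n^{-1/2} \left| E_{Q_n}[g_n(x,\bar T_{Q_n})] \right| - C n^{-1} \end{aligned} \tag{A.31} $$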

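for all $n$ large enough, where $\dot\gamma$ is a point on the line joining $0$ and $\tilde\gamma$, the first inequality follows from (A.29), and the second inequality follows from $\tilde\gamma' \tilde\gamma = n^{-1}$ and (A.30). From the duality of partially finite programming (Borwein and Lewis (1993)), $\gamma_n(\bar T_{Q_n}, Q_n)$ and $\bar T_{Q_n}$ are written as $\gamma_n(\bar T_{Q_n}, Q_n) = \arg\max_{\gamma \in \mathbb{R}^m} R_n(Q_n, \bar T_{Q_n}, \gamma)$ and $\bar T_{Q_n} = \arg\min_{\theta \in \Theta} R_n(Q_n, \theta, \gamma_n(\theta, Q_n))$. Therefore, from (A.31),

$$ -1 + n^{-1/2} \left| E_{Q_n}[g_n(x,\bar T_{Q_n})] \right| - C n^{-1} \le R_n(Q_n, \bar T_{Q_n}, \tilde\gamma) \le R_n\!\left( Q_n, \bar T_{Q_n}, \gamma_n(\bar T_{Q_n}, Q_n) \right) \le R_n\!\left( Q_n, \theta_0, \gamma_n(\theta_0, Q_n) \right). \tag{A.32} $$

By an argument similar to (A.26), combined with $|\gamma_n(\theta_0, Q_n)| = O(n^{-1/2})$ and $|E_{Q_n}[g_n(x,\theta_0)]| = O(n^{-1/2})$ (by Lemma A.5), we have

$$ R_n\!\left( Q_n, \theta_0, \gamma_n(\theta_0, Q_n) \right) \le -1 + |\gamma_n(\theta_0, Q_n)| \left| E_{Q_n}[g_n(x,\theta_0)] \right| - C |\gamma_n(\theta_0, Q_n)|^2 = -1 + O(n^{-1}). \tag{A.33} $$

From (A.32) and (A.33), the conclusion follows.

Proof of the second statement. Similar to the proof of the second statement of Lemma A.5(i).

Proof of the third statement. Pick arbitrary $r > 0$ and sequence $Q_n \in \bar B_H(P_0, r/\sqrt{n})$. From the triangle inequality,

$$ \begin{aligned} \left| E_{Q_n}[\partial g_n(x,\bar T_{Q_n})/\partial \theta'] - G \right| \le{}& \left| E_{Q_n}[\partial g_n(x,\bar T_{Q_n})/\partial \theta'] - E_{P_0}[\partial g_n(x,\bar T_{Q_n})/\partial \theta'] \right| \\ & + \left| E_{P_0}[I\{x \notin \mathcal X_n\}\, \partial g(x,\bar T_{Q_n})/\partial \theta'] \right| + \left| E_{P_0}[\partial g(x,\bar T_{Q_n})/\partial \theta'] - G \right|. \end{aligned} \tag{A.34} $$

The first term of (A.34) satisfies

$$ \begin{aligned} & \left| E_{Q_n}[\partial g_n(x,\bar T_{Q_n})/\partial \theta'] - E_{P_0}[\partial g_n(x,\bar T_{Q_n})/\partial \theta'] \right| \\ &\quad \le \left| \int \frac{\partial g_n(x,\bar T_{Q_n})}{\partial \theta'} \left\{ dQ_n^{1/2} - dP_0^{1/2} \right\}^2 \right| + 2 \left| \int \frac{\partial g_n(x,\bar T_{Q_n})}{\partial \theta'} \, dP_0^{1/2} \left\{ dQ_n^{1/2} - dP_0^{1/2} \right\} \right| \\ &\quad \le \sup_{x \in \mathcal X_n,\, \theta \in \mathcal N} \left| \frac{\partial g_n(x,\theta)}{\partial \theta'} \right| \frac{r^2}{n} + 2 \sqrt{ E_{P_0}\!\left[ \sup_{\theta \in \mathcal N} \left| \frac{\partial g_n(x,\theta)}{\partial \theta'} \right|^2 \right] }\, \frac{r}{\sqrt{n}} = o(1), \end{aligned} $$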

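where the first inequality follows from the triangle inequality, the second inequality follows from the Cauchy-Schwarz inequality, and the equality follows from Assumption 3.1(v) and (vii). The second term of (A.34) is $o(1)$ by the same argument as (A.16). The third term of (A.34) is $o(1)$ by the continuity of $\partial g(x,\theta)/\partial \theta'$ at $\theta_0$ and Lemma A.1(ii). Therefore, the conclusion is obtained.

Proof of (ii). Similar to the proof of Lemma A.4(ii). Repeat the same argument with $R_n(Q_n, \bar T_{Q_n}, \gamma)$ instead of $R_n(P_0, \theta_n, \gamma)$. Q.E.D.

LEMMA A.7: Suppose that Assumption 3.1 holds. Then, for each sequence $Q_n \in \bar B_H(P_0, r/\sqrt{n})$ and $r > 0$, $\bar T_{P_n} \xrightarrow{p} \theta_0$ under $Q_n$.

PROOF: Similar to the proof of Lemma A.1(i). Q.E.D.

LEMMA A.8: Suppose that Assumption 3.1 holds. Then, for each $r > 0$ and sequence $Q_n \in \bar B_H(P_0, r/\sqrt{n})$,

$$ \sqrt{n}\left( \bar T_{P_n} - \theta_0 \right) = -\sqrt{n}\, \Sigma^{-1} \int \Lambda_n \, dP_n + o_p(1) \quad \text{under } Q_n, $$
$$ \sqrt{n}\left( \bar T_{P_n} - \bar T_{Q_n} \right) \xrightarrow{d} N(0, \Sigma^{-1}) \quad \text{under } Q_n. $$

PROOF: The proof of the first statement is similar to that of Lemma A.2 (replace $Q_n$ with $P_n$ and use Lemmas A.9 and A.10 instead of Lemmas A.5 and A.6). For the second statement, Lemma A.2 and the first statement imply

$$ \sqrt{n}\left( \bar T_{P_n} - \bar T_{Q_n} \right) = -\Sigma^{-1} G' \Omega^{-1} \frac{1}{\sqrt{n}} \sum_{i=1}^n \left\{ g_n(x_i,\theta_0) - E_{Q_n}[g_n(x,\theta_0)] \right\} + o_p(1) $$

under $Q_n$. Thus, it is sufficient to check that we can apply a central limit theorem to the triangular array $\{ g_n(x_i,\theta_0) \}_{1 \le i \le n}$. Observe that

$$ \begin{aligned} E_{Q_n}\!\left[ |g_n(x,\theta_0)|^{2+\varepsilon} \right] &\le \int |g_n(x,\theta_0)|^{2+\varepsilon} \left\{ dQ_n^{1/2} - dP_0^{1/2} \right\}^2 + 2 \int |g_n(x,\theta_0)|^{2+\varepsilon} \, dP_0^{1/2} \left| dQ_n^{1/2} - dP_0^{1/2} \right| + E_{P_0}\!\left[ |g_n(x,\theta_0)|^{2+\varepsilon} \right] \\ &\le \frac{r^2}{n}\, m_n^{2+\varepsilon} + 2 m_n^{1+\varepsilon} \sqrt{ E_{P_0}\!\left[ |g(x,\theta_0)|^2 \right] }\, \frac{r}{\sqrt{n}} + E_{P_0}\!\left[ |g(x,\theta_0)|^{2+\varepsilon} \right] < \infty \end{aligned} $$

for all $n$ large enough, where the first inequality follows from the Cauchy-Schwarz inequality, and the second inequality follows from Assumption 3.1(v) and (vii). Therefore, the conclusion is obtained. Q.E.D.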

LEMMA A.9: Suppose that Assumption 3.1 holds. Then, for each $r > 0$ and sequence $Q_n \in \bar B_H(P_0, r/\sqrt{n})$, the following hold under $Q_n$:
(i) $|E_{P_n}[g_n(x,\theta_0)]| = O_p(n^{-1/2})$, $|E_{P_n}[g_n(x,\theta_0) g_n(x,\theta_0)'] - \Omega| = o_p(1)$,
(ii) $\gamma_n(\theta_0, P_n) = \arg\max_{\gamma \in \mathbb{R}^m} -\int \frac{1}{1 + \gamma' g_n(x,\theta_0)} \, dP_n$ exists a.s. for all $n$ large enough, $|\gamma_n(\theta_0, P_n)| = O_p(n^{-1/2})$, and $\sup_{x \in \mathcal X_n} |\gamma_n(\theta_0, P_n)' g_n(x,\theta_0)| = o_p(1)$.

PROOF: Proof of (i). Proof of the first statement. From the triangle inequality,

$$ \left| E_{P_n}[g_n(x,\theta_0)] \right| \le \left| E_{P_n}[g_n(x,\theta_0)] - E_{Q_n}[g_n(x,\theta_0)] \right| + \left| E_{Q_n}[g_n(x,\theta_0)] \right|. $$

The first term is $O_p(n^{-1/2})$ by the central limit theorem for the triangular array $\{ g_n(x_i,\theta_0) \}_{1 \le i \le n}$. The second term is $O(n^{-1/2})$ by Lemma A.5(i).

Proof of the second statement. From the triangle inequality,

$$ \left| E_{P_n}[g_n(x,\theta_0) g_n(x,\theta_0)'] - \Omega \right| \le \left| E_{P_n}[g_n(x,\theta_0) g_n(x,\theta_0)'] - E_{Q_n}[g_n(x,\theta_0) g_n(x,\theta_0)'] \right| + \left| E_{Q_n}[g_n(x,\theta_0) g_n(x,\theta_0)'] - \Omega \right|. $$

From a law of large numbers, the first term is $o_p(1)$. From Lemma A.5(i), the second term is $o(1)$.

Proof of (ii). Similar to the proof of Lemma A.4(ii) except using Lemma A.9(i) instead of Lemma A.4(i). Q.E.D.

LEMMA A.10: Suppose that Assumption 3.1 holds. Then, for each $r > 0$ and sequence $Q_n \in \bar B_H(P_0, r/\sqrt{n})$, the following hold under $Q_n$:
(i) $|E_{P_n}[g_n(x,\bar T_{P_n})]| = O_p(n^{-1/2})$, $|E_{P_n}[g_n(x,\bar T_{P_n}) g_n(x,\bar T_{P_n})'] - \Omega| = O_p(n^{-1/2})$, and $|E_{P_n}[\partial g_n(x,\bar T_{P_n})/\partial \theta'] - G| = o_p(1)$,
(ii) $\gamma_n(\bar T_{P_n}, P_n) = \arg\max_{\gamma \in \mathbb{R}^m} -\int \frac{1}{1 + \gamma' g_n(x,\bar T_{P_n})} \, dP_n$ exists a.s. for all $n$ large enough, $|\gamma_n(\bar T_{P_n}, P_n)| = O_p(n^{-1/2})$, and $\sup_{x \in \mathcal X_n} |\gamma_n(\bar T_{P_n}, P_n)' g_n(x,\bar T_{P_n})| = o_p(1)$.

PROOF: Proof of (i). Similar to the proof of Lemma A.6(i).

Proof of (ii). Similar to the proof of Lemma A.6(ii). Q.E.D.

REFERENCES

BORWEIN, J. M., AND A. S. LEWIS (1993): "Partially-Finite Programming in L1 and the Existence of Maximum Entropy Estimates," SIAM Journal on Optimization, 3, 248–267.

LECAM, L., AND G. YANG (1990): Asymptotics in Statistics: Some Basic Concepts. New York: Springer.

VAN DER VAART, A. (1998): Asymptotic Statistics. Cambridge: Cambridge University Press.

Cowles Foundation for Research in Economics, Yale University, New Haven, CT 06520, U.S.A.; yuichi.kitamura@yale.edu,
Dept. of Economics, London School of Economics and Political Science, Houghton Street, London, WC2A 2AE, U.K.; t.otsu@lse.ac.uk,
and
Dept. of Economics, Princeton University, Princeton, NJ 08544, U.S.A.; kevdokim@princeton.edu.

Manuscript received June, 2009; final revision received August, 2012.