Bayesian Analysis in Moment Inequality Models: Supplementary Material

Yuan Liao    Wenxin Jiang

November 4, 2008

Abstract. This is supplementary material for Liao and Jiang (2008). We consider the case in which the identified region has empty interior. We show that the posterior distribution converges to zero exponentially fast on any δ-contraction of the complement of the identified region, and is bounded below by a polynomial rate on any neighborhood of an element of a dense subset defined by both exact moment conditions and strict moment inequalities. Hence a consistent estimator of the identified region can be constructed from the log-posterior density.
1 Case When int(Ω) Is Empty

When Ω has no interior, moment inequality models may contain exact moment conditions:

  Em_1j(X, θ_0) ≥ 0,  j = 1, ..., r,
  Em_2j(X, θ_0) = 0,  j = 1, ..., p.    (1.1)

Moon and Schorfheide (2006) considered the estimation problem assuming that θ_0 is point identified by the exact moment conditions. Let

  m_1(X, θ) = (m_11(X, θ), ..., m_1r(X, θ))^T,  m_2(X, θ) = (m_21(X, θ), ..., m_2p(X, θ))^T.

If p ≥ dim(θ_0), and there does not exist a pair of moment functions (m_2i, m_2j) such that {θ ∈ Θ : Em_2i(X, θ) = 0} = {θ ∈ Θ : Em_2j(X, θ) = 0}, then θ_0 is point identified by Em_2(X, θ_0) = 0. Moon and Schorfheide (2006) show that, by using the over-identifying information provided by Em_1(X, θ_0) ≥ 0, empirical likelihood estimators reduce the asymptotic mean squared error. In this section we relax this point-identifying restriction and allow θ_0 to be partially identified by model (1.1). The identified region is defined by

  Ω = {θ : Em_1(X, θ) ≥ 0, Em_2(X, θ) = 0}.

In our setting, Ω shrinks to a lower-dimensional sub-manifold of {θ : Em_2(X, θ) = 0}, with boundaries defined by the linear or nonlinear hypersurfaces {θ : Em_1(X, θ) = 0}. One problem that must be taken into account when considering the asymptotic behavior of the posterior distribution is that Ω has zero Lebesgue measure, due to the loss of dimensionality; hence integrating over Ω always gives zero. The limit of the posterior density is the Dirac function:

  lim_{n→∞} p(θ | X^n) = +∞ if θ ∈ Ω,  and 0 if θ ∉ Ω,  a.s.

Thus lim_{n→∞} p(θ | X^n) is not a real-valued function of θ. However, it is still possible to study the large-sample properties of the posterior distribution on all of Θ. As in the int(Ω) ≠ ∅ case, a dense subset of Ω plays an important role in characterizing this behavior. Define

  Ξ = {θ ∈ Ω : Em_1(X, θ) > 0}.    (1.2)

We will assume that Ξ is dense in Ω and study the large-sample properties of the posterior around Ξ.
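To fix ideas, here is a toy numerical illustration (hypothetical, not the paper's model) of an identified region with empty interior and its dense subset Ξ; the moment functions Em_1(θ) = 1 − θ_1² − θ_2² and Em_2(θ) = θ_1 − θ_2 are illustrative choices only:

```python
import numpy as np

# Toy illustration (hypothetical, not the paper's model):
#   Em1(theta) = 1 - theta1^2 - theta2^2   (one inequality, >= 0)
#   Em2(theta) = theta1 - theta2           (one exact condition, = 0)
# Omega = {theta1 = theta2, theta1^2 + theta2^2 <= 1} is a line segment
# in R^2, so int(Omega) is empty; Xi is the open segment on which the
# inequality holds strictly.

def in_Omega(t, tol=1e-12):
    return abs(t[0] - t[1]) < tol and 1 - t[0]**2 - t[1]**2 >= -tol

def in_Xi(t, tol=1e-12):
    return abs(t[0] - t[1]) < tol and 1 - t[0]**2 - t[1]**2 > tol

# A boundary point of Omega, where the inequality binds (so b is not in Xi):
b = np.array([1.0, 1.0]) / np.sqrt(2)
# Shrinking toward the origin stays on the exact-condition line and makes
# the inequality strict, so points of Xi come arbitrarily close to b:
b_eps = (1 - 1e-3) * b
```

Here `b` lies in Ω but not in Ξ, while `b_eps` lies in Ξ within distance 10^{-3} of `b`, mirroring the density property assumed below.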
1.1 Derivation of the Limited Information Likelihood

Suppose X^n = {X_1, ..., X_n} is a stationary realization of X. Define m̄_j(θ) = (1/n) sum_{i=1}^n m_j(X_i, θ) for j = 1, 2. As before, we introduce an auxiliary parameter λ for the moment inequalities and define

  G(θ, λ) = ( (m̄_1(θ) − λ)^T, m̄_2(θ)^T )^T,   θ ∈ Θ, λ ∈ [0, ∞)^r.

For any positive definite (r + p) × (r + p) matrix V not depending on θ, define the limited information likelihood

  L(θ) = ∫_{[0,∞)^r} det(2πV/n)^{−1/2} exp( −(n/2) G(θ, λ)^T V^{−1} G(θ, λ) ) p(λ) dλ.

Write V^{−1} in sub-blocks:

  V^{−1} = [ Σ_1    Σ_3 ]
           [ Σ_3^T  Σ_2 ],   Σ_1 : r × r,  Σ_2 : p × p.

We still place an exponential prior on λ:

  p(λ) = ( ∏_{i=1}^r ψ_i ) e^{−ψ^T λ},   ψ > 0, λ ∈ [0, ∞)^r.

Then we have

  L(θ) = ∫_{[0,∞)^r} det(2πV/n)^{−1/2} exp( −(n/2)[ (m̄_1(θ) − λ)^T Σ_1 (m̄_1(θ) − λ) + 2(m̄_1(θ) − λ)^T Σ_3 m̄_2(θ) + m̄_2(θ)^T Σ_2 m̄_2(θ) ] ) p(λ) dλ
       = C_n ( ∏_{i=1}^r ψ_i ) √(det V_2) P(Z ≥ 0) e^τ,

where C_n = (n/2π)^{p/2} does not depend on θ (and therefore cancels in the posterior), and:

  • Z follows a multivariate normal distribution with mean µ and variance–covariance matrix n^{−1}Σ_1^{−1}, where µ = m̄_1(θ) + Σ_1^{−1}Σ_3 m̄_2(θ) − n^{−1}Σ_1^{−1}ψ.
  • V_2 = Σ_2 − Σ_3^T Σ_1^{−1} Σ_3. If V = Var(m_1, m_2), then by the matrix inversion formula V_2^{−1} = Var(m_2).
  • τ = −(n/2) m̄_2(θ)^T V_2 m̄_2(θ) − ψ^T( Σ_1^{−1}Σ_3 m̄_2(θ) + m̄_1(θ) ) + (2n)^{−1} ψ^T Σ_1^{−1} ψ.

Roughly speaking, when θ ∉ Ω, either Em_2(X, θ) ≠ 0 or Em_1j(X, θ) < 0 for some j. When Em_2(X, θ) ≠ 0, since V_2 is also positive definite, e^τ → 0; when Em_2(X, θ) = 0 but Em_1j(X, θ) < 0 for some
j, then for large n the jth component of µ is negative. Since the covariance matrix of Z has order O(n^{−1}), P(Z ≥ 0) → 0. Therefore L(θ) → 0 outside Ω. When θ ∈ Ω, by the central limit theorem m̄_2(θ) = O_p(n^{−1/2}), hence e^τ = O_p(1). In addition, for large n, P(Z ≥ 0) → 1. Thus, up to a normalizing factor that does not depend on θ, L(θ) = O_p(1).

1.2 Posterior Distribution

Let p(θ) denote the prior on θ; then p(θ | X^n) ∝ p(θ)L(θ). We will look at the posterior distribution, especially the convergence rate on the dense subset Ξ and on (Ω^c)_δ for small enough δ.

Assumption 1.1. Ξ defined in (1.2) is dense in Ω.

This assumption states that if θ_0 satisfies Em_2(X, θ_0) = 0 and Em_1j(X, θ_0) = 0 for some j = 1, ..., r, then in any neighborhood of θ_0 we can find θ_1 such that Em_1(X, θ_1) > 0 and Em_2(X, θ_1) = 0. Suppose all the components of Em_1(X, θ_0) other than the jth are positive. By continuity of Em_1(X, ·), they remain positive in a small neighborhood of θ_0. Suppose Assumption 1.1 does not hold; then within some neighborhood U of θ_0, Em_1j(X, θ) ≤ 0 whenever Em_2(X, θ) = 0. Since Ω is connected, we argue that Em_1j(X, θ) = 0 on U ∩ Ω. Hence, intuitively, Assumption 1.1 says that for each j, the hypersurface {θ : Em_1j(X, θ) = 0} has no part that overlaps with {θ : Em_2(X, θ) = 0}.

Example 1.1. This example shows that Assumption 1.1 is satisfied by the interval regression model. Suppose we have moment inequalities

  E(Z_1 Y_1) ≤ E(Z_1 X^T)θ ≤ E(Z_1 Y_2)

and exact moment condition

  EZ_2(Y_3 − X^T θ) = 0,

where Z_i, i = 1, 2, are r_1- and r_2-dimensional vectors of instrumental variables respectively, with each instrument positive almost surely, and Z_1 and Z_2 do not share components. Each Y_i is a scalar, i = 1, 2, 3, θ ∈ R^d, and Y_2 > Y_3 > Y_1 a.s. Let W = (Z_1, Z_2, X, Y_1, Y_2, Y_3); then

  m_1(W, θ) = ( Z_1(Y_2 − X^T θ) )
              ( Z_1(X^T θ − Y_1) ),   m_2(W, θ) = Z_2(Y_3 − X^T θ).

We assume r_2 < d so that θ cannot be point identified by Em_2(W, θ) = 0. Let us also assume there exists a unit vector δ such that EZ_2 X^T δ = 0 but EZ_11 X^T δ < 0, where Z_11 denotes the first component of Z_1.
In this interval instrumental variable regression model,

  Ξ = {θ : E(Z_1 Y_1) < E(Z_1 X^T)θ < E(Z_1 Y_2); EZ_2 Y_3 = EZ_2 X^T θ}.
We now show that Ξ is dense. Pick θ_0 ∈ Ω\Ξ such that EZ_2(Y_3 − X^T θ_0) = 0 and EZ_11(Y_2 − X^T θ_0) = 0. For simplicity, assume Em_1j(W, θ_0) > 0 for j > 1. Then in a small neighborhood of θ_0, Em_1j(W, ·) > 0 for j > 1. For small enough ε > 0, let θ_1 = θ_0 + εδ; then

  Em_2(W, θ_1) = EZ_2(Y_3 − X^T θ_0) − εEZ_2 X^T δ = 0,
  Em_11(W, θ_1) = EZ_11(Y_2 − X^T θ_0) − εEZ_11 X^T δ = −εEZ_11 X^T δ > 0.

Therefore θ_1 ∈ B(θ_0, 2ε) ∩ Ξ.

Assumption 1.2. (i) Em_1j(X, θ) is continuous on Θ for each j. (ii) Em_2j(X, θ) is Lipschitz continuous on Θ for each j.

Assumption 1.3. W.p.a.1, for any β_n → 0,

  sup_{θ∈Θ} ||m̄_2(θ) − Em_2(X, θ)||^2 ≤ (ln β_n^{−1})/n.

Assumption 1.4. p(θ) is continuous, and bounded away from zero and infinity on Ω.

Theorem 1.1. Under Assumptions 2.1 and 2.2 in Liao and Jiang (2008) and Assumptions 1.1–1.4:

1. For any δ > 0, for some α > 0,

  P(θ ∈ (Ω^c)_δ | X^n) = o_p(e^{−αn}).

2. For any ω ∈ Ξ, there is R > 0 such that for all δ < R and all β_n → 0, we have in probability

  P(θ ∈ B(ω, δ) | X^n) ≥ β_n n^{−d/2},

where d = dim(θ).

As in the case int(Ω) ≠ ∅, let g(·) be a continuous real-valued function on Θ, and let F_g^{−1}(y) be the y-quantile of the posterior cdf of g(θ).
Theorem.2. Under Assumption 2.-2.3 in Liao and Jiang (2008) and.-.3, if {π n } n= is such that e αn π n n β, for any α > 0 and some β > d 2, then d H ([F g (π n ), F ( π n )], g(ω)) 0 in probability. g 2 Monte Carlo Experiments We simulate an interval instrumental regression model with exact moment conditions. Example 2. (Interval regression models with exact moment condition). We consider, E(Z Y ) E(Z X T )θ E(Z Y 2 ), E(Z 2 X T )θ = E(Z 2 Y 3 ) We generate (X, X 2 ) N 2 ((, ) T, I 2 ), and Z = X + X 2, Z 2 = 2X + 2X 2. (Y, Y 2 ) T N 2 ((3, 6) T, 0.I 2 ), independent of X. Let Y 3 = Z + 3. Then the identified region is given by Ω = {(θ, θ 2 ) : θ = θ 2, 2θ + θ 2 4} To estimate Ω, we choose a positive definite weight matrix where Σ 3 = (, 2) T. V = I 2 Σ 3 Σ T 3 6 Figure displays the identified region as well as 0,000 draws using Metropolis algorithm, with two choices of ψ = (0.5, 0.5) T, and ψ 2 = (0.0, 0.0) T respectively. [F e (π n ), F ( π n )] based on the empirical cdf F e of 5000 draws from the posterior distribution. We also estimate the identified interval of θ, which is [, 2] theoretically. Table 3 reports e 6
Figure 1: The identified set and MCMC draws (top panel: ψ = (0.5, 0.5)^T; bottom panel: ψ = (0.01, 0.01)^T).

Table 1: Estimation of the identified interval of θ_1 ([1, 2] theoretically)

  π_n        e^{−n}             n^{−1}             (ln n)/n
  n = 500    [1.384, 2.0295]    [1.0068, 1.933]    [1.0904, 1.6207]
  n = 1000   [1.0809, 1.9425]   [0.9620, 1.8844]   [1.83, 1.8874]
  n = 5000   [1.045, 1.855]     [0.9944, 1.9575]   [1.878, 1.9729]
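The posterior targeted by the Metropolis sampler is proportional to p(θ)L(θ). A minimal sketch of evaluating the limited information likelihood in the closed form derived above; the block layout of V^{−1} and the function name are assumptions of this sketch, not the authors' implementation:

```python
import numpy as np
from scipy.stats import multivariate_normal

def limited_info_likelihood(m1_bar, m2_bar, S1, S3, S2, psi, n):
    """Limited-information likelihood L(theta) up to factors not depending
    on theta.  S1, S3, S2 are the blocks of V^{-1}, r x r, r x p, p x p;
    this block layout is recovered from the derivation (an assumption)."""
    S1_inv = np.linalg.inv(S1)
    V2 = S2 - S3.T @ S1_inv @ S3          # Schur complement of S1 in V^{-1}
    # tau collects the terms left after integrating out lambda
    tau = (-0.5 * n * m2_bar @ V2 @ m2_bar
           - psi @ (S1_inv @ S3 @ m2_bar + m1_bar)
           + psi @ S1_inv @ psi / (2.0 * n))
    # Z ~ N(mu, S1^{-1}/n); orthant probability P(Z >= 0) = P(-Z <= 0)
    mu = m1_bar + S1_inv @ S3 @ m2_bar - S1_inv @ psi / n
    p_orthant = multivariate_normal.cdf(np.zeros(len(mu)), mean=-mu,
                                        cov=S1_inv / n)
    return np.prod(psi) * np.sqrt(np.linalg.det(V2)) * p_orthant * np.exp(tau)
```

As a sanity check, with r = 2, p = 1, the likelihood is positive when the sample moments satisfy the constraints and collapses rapidly once the exact-condition moment moves away from zero.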
3 Proofs

Define

  A_δ = {θ : Em_2(X, θ)^T V_2 Em_2(X, θ) > δ},  and
  A_2 = {θ : Em_2(X, θ) = 0, min_j Em_1j(X, θ) < 0}.

Lemma 3.1. For any δ > 0, for some a > 0,

  ∫_{A_δ ∪ A_2} p(θ)L(θ) dθ = o_p(e^{−an}).

Proof. Define Â_δ = {θ : m̄_2(θ)^T V_2 m̄_2(θ) > δ}. Then

  ∫_{A_δ} p(θ)L(θ)dθ = ∫_{A_δ ∩ Â_δ} p(θ)L(θ)dθ + ∫_{A_δ ∩ Â_δ^c} p(θ)L(θ)dθ
                     ≤ ∫_{Â_δ} p(θ)L(θ)dθ + ∫_{A_δ ∩ Â_δ^c} p(θ)L(θ)dθ.

By the uniform convergence of m̄_2,

  A_δ ∩ Â_δ^c = {θ : Em_2(X, θ)^T V_2 Em_2(X, θ) > δ} ∩ {θ : m̄_2(θ)^T V_2 m̄_2(θ) ≤ δ} = ∅  w.p.a.1.

Hence for large n, µ(A_δ ∩ Â_δ^c) = 0. Then there is N such that when n > N, w.p.a.1, using P(Z ≥ 0) ≤ 1,

  ∫_{A_δ ∩ Â_δ} p(θ)L(θ)dθ ≤ ∫_{Â_δ} p(θ) (n/2π)^{p/2} (∏_{i=1}^r ψ_i) √(det V_2) e^{−(n/2) m̄_2(θ)^T V_2 m̄_2(θ) − ψ^T(Σ_1^{−1}Σ_3 m̄_2(θ) + m̄_1(θ)) + (2n)^{−1}ψ^T Σ_1^{−1}ψ} dθ.

For some ε > 0 and large n,

  e^{−ψ^T(Σ_1^{−1}Σ_3 m̄_2(θ) + m̄_1(θ)) + (2n)^{−1}ψ^T Σ_1^{−1}ψ} ≤ e^{||ψ|| (sup_{θ∈Θ} ||Σ_1^{−1}Σ_3 Em_2(X, θ) + Em_1(X, θ)|| + ε) + ε} < ∞.

Thus for some positive constant C and large n,

  ∫_{A_δ ∩ Â_δ} p(θ)L(θ)dθ ≤ C n^{p/2} ∫_{Â_δ} p(θ) e^{−(n/2) m̄_2(θ)^T V_2 m̄_2(θ)} dθ ≤ C n^{p/2} e^{−δn/2}.

In addition, µ(A_2) = 0 and p(θ)L(θ) is bounded on Θ w.p.a.1, hence

  ∫_{A_δ ∪ A_2} p(θ)L(θ)dθ ≤ ∫_{A_δ} p(θ)L(θ)dθ + ∫_{A_2} p(θ)L(θ)dθ = O_p(n^{p/2} e^{−δn/2}) = o_p(e^{−an})

for any 0 < a < δ/2. ∎

Lemma 3.2. For any δ > 0, for some a > 0,

  ∫_{(Ω^c)_δ} p(θ)L(θ)dθ = o_p(e^{−an}).
Proof. For any θ ∈ (Ω^c)_δ, either there is δ(θ) > 0 with θ ∈ A_{δ(θ)}, or θ ∈ A_2; hence θ ∈ A_{δ(θ)} ∪ A_2. Thus (Ω^c)_δ ⊂ ∪_{θ∈(Ω^c)_δ} [A_{δ(θ)} ∪ A_2]. Note that (Ω^c)_δ = {θ : d(θ, Ω) ≥ δ} is compact, hence there is a finite subcover {A_{δ_1} ∪ A_2, ..., A_{δ_N} ∪ A_2} ⊂ {A_{δ(θ)} ∪ A_2 : θ ∈ (Ω^c)_δ} such that

  (Ω^c)_δ ⊂ ∪_{i=1}^N [A_{δ_i} ∪ A_2].

Let δ_1 = min{δ_i : i = 1, ..., N}. For a > b > 0, A_a ⊂ A_b, hence (Ω^c)_δ ⊂ A_{δ_1} ∪ A_2. Therefore

  ∫_{(Ω^c)_δ} p(θ)L(θ)dθ ≤ ∫_{A_{δ_1} ∪ A_2} p(θ)L(θ)dθ = o_p(e^{−an}). ∎

Lemma 3.3. If Z_θ follows N_r( m̄_1(θ) + Σ_1^{−1}Σ_3 m̄_2(θ) − n^{−1}Σ_1^{−1}ψ, n^{−1}Σ_1^{−1} ), then for any ω ∈ Ξ there is R > 0 such that, w.p.a.1,

  lim inf_n inf_{θ∈B(ω,R)} P(Z_θ ≥ 0) > 0.

Proof. Let ξ_n(θ) = m̄_1(θ) + Σ_1^{−1}Σ_3 m̄_2(θ) − n^{−1}Σ_1^{−1}ψ. For any ω ∈ Ξ, Em_1(X, ω) > 0. Since Em_1(X, θ) is continuous on Θ, there exist ε > 0 and an open ball B(ω, R_1) such that inf_{θ∈B(ω,R_1)} Em_1(X, θ) > ε, where the inequality is taken coordinatewise. Moreover, Em_2(X, ω) = 0; hence by the continuity of Em_2(X, ·), there is R < R_1 such that sup_{θ∈B(ω,R)} |Σ_1^{−1}Σ_3 Em_2(X, θ)| < ε, where |·| denotes the absolute value, taken coordinatewise. Therefore

  inf_{θ∈B(ω,R)} ( Em_1(X, θ) + Σ_1^{−1}Σ_3 Em_2(X, θ) ) > 0

coordinatewise, and there is N such that when n > N, w.p.a.1,

  inf_{θ∈B(ω,R)} ( m̄_1(θ) + Σ_1^{−1}Σ_3 m̄_2(θ) − n^{−1}Σ_1^{−1}ψ ) = inf_{θ∈B(ω,R)} ξ_n(θ) > 0

coordinatewise. Let σ_j^2 denote the jth diagonal element of Σ_1^{−1}, and ξ_nj(θ) the jth element of ξ_n(θ). Then

  inf_{θ∈B(ω,R)} P(Z_θ ≥ 0) ≥ 1 − sum_{j=1}^r Φ( −√n ξ_nj(θ)/σ_j ) ≥ 1 − r Φ( −√n min_j inf_{θ∈B(ω,R)} ξ_nj(θ)/σ_j ) → 1 > 0. ∎

Lemma 3.4. For any β_n → 0 and any ω ∈ Ξ, there is R > 0 such that for all δ < R, w.p.a.1,

  ∫_{B(ω,δ)} p(θ)L(θ)dθ ≥ n^{−d/2} β_n,

where d = dim(θ).
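The last display in the proof of Lemma 3.3 rests on the union bound P(Z ≥ 0) ≥ 1 − sum_j P(Z_j < 0); a quick numerical check, with illustrative values standing in for µ and Σ_1^{−1}:

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

# Z ~ N(mu, Sigma/n) with a strictly positive mean vector; mu and Sigma
# (playing the role of Sigma_1^{-1}) are illustrative assumptions.
n = 100
mu = np.array([0.3, 0.5])
Sigma = np.array([[1.0, 0.4],
                  [0.4, 1.0]])
sd = np.sqrt(np.diag(Sigma))

# Exact orthant probability: P(Z >= 0) = P(-Z <= 0), with -Z ~ N(-mu, Sigma/n)
exact = multivariate_normal.cdf(np.zeros(2), mean=-mu, cov=Sigma / n)
# Union bound from the proof: P(Z >= 0) >= 1 - r * Phi(-sqrt(n) * min_j mu_j / sd_j)
union_bound = 1 - 2 * norm.cdf(-np.sqrt(n) * np.min(mu / sd))
```

Since the mean is of constant order while the covariance shrinks at rate 1/n, both the exact probability and the lower bound approach one, as the lemma requires.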
Proof. For any ω ∈ Ξ, it can be shown (using Lemma 3.3) that there exist R > 0 and a positive constant C such that

  ∫_{B(ω,R)} p(θ)L(θ)dθ ≥ C ∫_{B(ω,R)} p(θ) e^{−(n/2) m̄_2(θ)^T V_2 m̄_2(θ)} dθ.

For deterministic V_2 and a vector α, write the weighted norm ||α||_{V_2}^2 = α^T V_2 α. Then

  (1/2)||m̄_2(θ)||_{V_2}^2 ≤ ||Em_2(X, θ)||_{V_2}^2 + ||m̄_2(θ) − Em_2(X, θ)||_{V_2}^2.

By Assumption 1.3, for any β_n → 0, choose β'_n = β_n^{1/||V_2||}, so that w.p.a.1

  e^{−n||m̄_2(θ) − Em_2(X, θ)||_{V_2}^2} ≥ e^{−||V_2|| ln β'^{−1}_n} = (β'_n)^{||V_2||} = β_n.

For some constant α > 1, let U = {θ : ||Em_2(X, θ)||_{V_2}^2 < (ln α)/n}. By Assumption 1.4, there is R_2 < R such that for any 0 < δ < R_2, inf_{θ∈B(ω,δ)} p(θ) > 0. Then

  ∫_{B(ω,δ)} p(θ)L(θ)dθ ≥ C α^{−1} β_n ∫_{B(ω,δ)∩U} p(θ)dθ ≥ Const · µ(B(ω,δ) ∩ U) α^{−1} β_n.

To derive a lower bound for the Lebesgue measure of B(ω,δ) ∩ U, note that Em_2(X, θ) is Lipschitz continuous and Em_2(X, ω) = 0, so there is λ > 0 such that for all θ ∈ B(ω, δ), ||Em_2(X, θ)||^2 ≤ λ||θ − ω||^2. Then

  ||Em_2(X, θ)||_{V_2}^2 ≤ ||V_2|| ||Em_2(X, θ)||^2 ≤ λ||V_2|| ||θ − ω||^2 ≤ dλ||V_2|| ||θ − ω||_∞^2,

where ||θ − ω||_∞ = max_j |θ_j − ω_j|. Hence {θ : ||θ − ω||_∞^2 < ln α/(dλ||V_2|| n)} ⊂ U. Moreover, for large enough n, {θ : ||θ − ω||_∞^2 < ln α/(dλ||V_2|| n)} ⊂ B(ω, δ); thus

  µ(B(ω,δ) ∩ U) ≥ µ({θ : ||θ − ω||_∞^2 < ln α/(dλ||V_2|| n)}) = (2 √(ln α/(dλ||V_2||)))^d n^{−d/2}.

Hence ∫_{B(ω,δ)} p(θ)L(θ)dθ ≥ Const · β_n n^{−d/2}; since β_n → 0 is arbitrary, the constants may be absorbed into β_n. ∎

Proof of Theorem 1.1.

1. Let β_n = n^{−d/2}. Lemma 3.4 implies that ∫_Θ p(θ)L(θ)dθ ≥ n^{−d} w.p.a.1. Thus by Lemma 3.2, for some α > 0,

  P(θ ∈ (Ω^c)_δ | X^n) = ∫_{(Ω^c)_δ} p(θ)L(θ)dθ / ∫_Θ p(θ)L(θ)dθ ≤ o_p(e^{−αn}) / n^{−d} = o_p(e^{−αn/2}).

2. The result follows immediately from Lemma 3.4 and the fact that ∫_Θ p(θ)L(θ)dθ is bounded in probability. ∎
References

[1] Liao, Y. and Jiang, W. (2008). Bayesian analysis in moment inequality models. Working paper, Northwestern University.

[2] Moon, H. R. and Schorfheide, F. (2006). Boosting your instruments: estimation with overidentifying inequality moment conditions. CEPR Discussion Paper No. 5605.