Lecture 34 Bootstrap confidence intervals
Confidence Intervals θ: an unknown parameter of interest We want to find limits θ and θ such that Gt = P nˆθ θ t If G 1 1 α is known, then P θ θ = P θ θ = 1 α θ = ˆθ G 1 1 α/ n is an exact 1001 α% lower confidence limit for θ Traditional asymptotic approach: Gt Φt/σ, n Φ: the standard normal cdf σ: an unknown scale parameter ˆσ: a consistent estimator of σ, Gˆσt Φt Normal approximation NA 1001 α% confidence limits are θ N = ˆθ ˆσz 1 α / n, θ N = ˆθ + ˆσz 1 α / n z a = Φ 1 a
Bootstrap Confidence Intervals 1 The hybrid bootstrap HB A bootstrap estimator of Gt = P nˆθ θ t is Ĝt = P nˆθ ˆθ t G 1 1 α can be estimated by Ĝ 1 1 α HB lower and upper confidence limits: θ HB = ˆθ Ĝ 1 1 α/ n θ HB = ˆθ Ĝ 1 α/ n If Gt is nearly symmetric, then Ĝ 1 α can be replaced by Ĝ 1 1 α Hybrid: Use bootstrap estimate in the construction of confidence interval based on the sampling distribution of ˆθ θ.
2 Bootstrap-t BT Gt = P nˆθ θ t Φt/σ ˆσ: a consistent estimator of σ Studentized statistic: nˆθ θ/ˆσ Ht = P nˆθ θ/ˆσ t Φt ˆσ : a bootstrap analog of ˆσ A bootstrap estimator of Ht: Ĥt = P nˆθ ˆθ/ˆσ t BT lower and upper confidence limits: θ BT = ˆθ ˆσĤ 1 1 α/ n θ BT = ˆθ + ˆσĤ 1 1 α/ n This is also a hybrid method It is called BT because it is based on the studentized variable nˆθ θ/ˆσ
3 The bootstrap percentile BP Bootstrap distribution histogram Kt = P ˆθ t 1 B 1001 α% BP lower confidence limit # of times ˆθ b t θ BP = K 1 α = inf{t : Kt α} ˆθ B θ BP αbth ordered value of ˆθ 1,..., 1001 α% BP upper confidence limit θ BP = K 1 1 α = inf{t : Kt 1 α} θ BP 1 αbth ordered value of ˆθ 1,..., There are two improvements over the BP ˆθ B
Accuracy: Justification Properties of the bootstrap percentile BP Lower confidence bound level 1 α θ BP = K 1 α, Kt = P ˆθ t Assumption A: There is a monotone transformation φ such that P ˆφ φ t = Ψt for all t and all P φ = φθ, ˆφ = φˆθ Ψ: a continuous, increasing cdf symmetric about 0 Result 1. If φ is known, then θ E = φ 1 ˆφ + z α, z α = Ψ 1 α, is an exact 1001 α% lower confidence bound, i.e., P θ E θ = 1 α
Proof P θ E θ = P φθ E φθ = P φφ 1 ˆφ + z α φ = P ˆφ + zα φ = P ˆφ φ zα = Ψ z α = 1 Ψz α = 1 α
Result 2. Under Assumption A, θ BP = θ E. This means that the BP interval is exactly correct under Assumption A To use the BP, we don t need to know φ The derivation of φ is replaced by computation If Assumption A holds approximately when n is large, then the BP intervals are approximately correct
Proof Let w = φθ BP ˆφ and ˆφ = φˆθ Since Assumption A holds for all P including P, Ψw = P ˆφ ˆφ w = P ˆφ ˆφ φθ BP ˆφ = P φˆθ φθ BP = P ˆθ θ BP = Kθ BP = α By the uniqueness of the quantile, w = Ψ 1 α = z α Then Hence φθ BP = ˆφ + z α θ BP = φ 1 ˆφ + z α = θ E
The bootstrap bias-corrected percentile BC Is Assumption A too strong? Assumption B: There is a monotone transformation φ and a constant z 0 such that P ˆφ φ + z 0 t = Ψt for all t and all P Ψ: a continuous, increasing cdf symmetric about 0 If z 0 = 0, then Assumption B reduces to Assumption A. z 0 is a bias correction. Since Assumption B holds for all P including P, i.e., z 0 = Ψ 1 Kˆθ Kˆθ = P ˆθ ˆθ = P ˆφ ˆφ = P ˆφ ˆφ + z 0 z 0 = Ψz 0
Result 3. θ E = φ 1 ˆφ + z α + z 0 is an exact 1001 α% lower confidence limit for θ under Assumption B Proof: Similar to the proof of result 1. We need to know φ and Ψ in order to use θ E The BC confidence limits are θ BC = K 1 Ψz α + 2z 0 = K 1 Ψz α + 2Ψ 1 Kˆθ θ BC = K 1 Ψz 1 α + 2Ψ 1 Kˆθ When bootstrap Monte Carlo of size B is applied, θ BC BΨz α + 2z 0 th ordered value of ˆθ 1,..., ˆθ B θ BC BΨz 1 α + 2z 0 th ordered value of ˆθ 1,..., ˆθ B BC = BP if Kˆθ = 0.5, since Ψ 1 0.5 = 0 i.e., ˆθ is the median of Kt bootstrap histogram Thus, BC is a bias-corrected BP It shifts the BP bound towards left or right according to z 0 BC does not need the explicit form of φ But BC requires the form of Ψ typically, Ψ = Φ
Result 4. Under Assumption B, θ BC = θ E BC is exactly correct if Assumption B holds. If Assumption B holds approximately, then BC is approximately correct Proof: For any α, 1 α = Ψz 1 α = P ˆφ ˆφ + z 0 z 1 α = P ˆφ ˆφ + Ψ 1 1 α z 0 = P ˆθ φ 1 ˆφ + Ψ 1 1 α z 0 = Kφ 1 ˆφ + Ψ 1 1 α z 0 Hence, for all t, K 1 t = φ 1 ˆφ + Ψ 1 t z 0 Let t = Ψz α + 2z 0, K 1 Ψz α + 2z 0 = φ 1 ˆφ + z α + 2z 0 z 0 = φ 1 ˆφ + z α + z 0 = θ E
The bootstrap accelerated bias-corrected percentile BC a Assumption B is weaker than Assumption A Assumption B is still too strong in some cases Weaker assumptions? Assumption C: There is a monotone transformation φ and constants z 0 and a acceleration constant such that ˆφ φ P 1 + aφ + z 0 t = Ψt for all t and all P Ψ: a continuous, increasing cdf symmetric about 0 If a = 0, Assumption C reduces to Assumption B. a corrects skewness
Result 5. Under Assumption C, θ E = φ 1 ˆφ + z α + z 0 1 + a ˆφ 1 az α + z 0 is an exact 1001 α% lower confidence limit for θ Proof: 1 α = Ψ z α ˆφ φ = P 1 + aφ + z 0 z α = P ˆφ φ zα + z 0 1 + aφ ˆφ + zα + z 0 = P 1 az α + z 0 φ = P ˆφ + z α + z 0 1 + a ˆφ φ 1 az α + z 0 = P θ E θ
We need to know a, Ψ, and φ in order to use θ E Under Assumption C, z 0 = Ψ 1 Kˆθ First, assume a is known The BC a lower confidence bound is θ BCa = K Ψ 1 z α + z 0 z 0 + 1 az α + z 0 The BC a upper confidence bound is θ BCa = K Ψ 1 z 1 α + z 0 z 0 + 1 az 1 α + z 0 When bootstrap Monte Carlo of size B is applied, θ BCa BΨ z 0 + zα+z 0 1 az α+z 0 th ordered value of ˆθ 1,..., ˆθ B θ BCa BΨ z 0 + z 1 α+z 0 1 az 1 α +z 0 th ordered value of ˆθ 1,..., ˆθ B We don t need the explicit form of φ But we need to know a and Ψ If a = 0, BC a reduces to BC
Result 6. If a is known, then θ BCa = θ E. Hence, BC a is exactly correct if Assumption C holds. If Assumption C holds approximately, then BC a is approximately correct Proof. 1 α = Ψz 1 α ˆφ = P ˆφ 1 + a ˆφ + z 0 z 1 α = P ˆφ ˆφ + z 1 α z 0 1 + a ˆφ ˆθ = P φ 1 ˆφ + z 1 α z 0 1 + a ˆφ = K φ 1 ˆφ + z 1 α z 0 1 + a ˆφ = K φ 1 ˆφ + Ψ 1 1 α z 0 1 + a ˆφ This shows that K 1 t = φ 1 ˆφ + Ψ 1 t z 0 1 + a ˆφ
Let t = Ψ z 0 + zα+z 0 1 az α+z 0, θ E = φ 1 ˆφ + z α + z 0 1 + a ˆφ 1 az α + z 0 = K Ψ 1 z 0 + = θ BCa z α + z 0 1 az α + z 0 The constant a is unknown and has to be estimated â: an estimator of a θ BCâ and θ BCâ are BCâ confidence limits They are also called ABC confidence limits
Asymptotic Accuracy A confidence set C is first order accurate if and second order accurate if P θ C = 1 α + On 1/2 P θ C = 1 α + On 1 For the case of ˆθ is a smooth function of sample means, we have shown the following summary: 1. The BT and bootstrap BC a one-sided confidence intervals are second order accurate. 2. The the BP, BC, HB, and NA one-sided confidence intervals are in general first order accurate. 3. The equal-tail two-sided confidence intervals produced by all five bootstrap methods and the normal approximation are second order accurate errors cancel each other.