Berry-Esseen Theorem. Po-Ning Chen, Professor. Institute of Communications Engineering. National Chiao Tung University. Hsin Chu, Taiwan 30010, R.O.C.

Berry-Esseen heorem Po-Ning Chen, Professor Institute of Communications Engineering National Chiao ung University Hsin Chu, aiwan 30010, R.O.C.

Historical aspects BE- he central limit theorem CL concerns the situation that the limit distribution of the normalized sum is normal. As an example, for i.i.d. zero-mean sequence X 1,X,..., X 1 + + X n ne[x 1 ] N, where N has standard normal distribution. Question: What is the rate of convergence of normalized sum distribution to standard normal distribution?

Historical aspects BE-3 he first convergence rate estimates in the CL were obtained by A. M. Lyapounov in 1900-1901. In the beginning of 1940s, the classic Berry-Esseen estimate came to the light: where sup F n x Φx C β 3 x R σ 3 n, F n is the cdf of X 1 + + X n / ne[x1 ], Φ is the standard normal cdf, [ β 3 = E X E[X] 3], [ σ = E X E[X] ],and C is a universal constant, independent of n. In fact, it was due to Lyapounov that the finiteness of + δth absolute moment, where 0 <δ<1, guarantees sup F n x Φx = On δ/ asn. x R

Berry-Esseen theorem BE-4 Classic Berry-Esseen theorem For independent zero-mean random variables {X i } n i=1, [ ] 1 Pr X 1 + + X n a Φa s n C r n, sup a R where s n sum of the marginal variances, r n sum of the marginal absolute 3rd central moments, C absolute constant, and Φ the unit Gaussian cumulative distribution function cdf. Notably, only the first three moments are involved in the Berry-Esseen inequality. s 3 n

Berry-Esseen theorem BE-5 sup a R [ ] 1 Pr X 1 + + X n a Φa s n C r n. s 3 n Feller s book, 1979 C = 6 for an independent sample sum. C = 3 for an independent and identically distributed sample sum. Beek 197 C =0.7975 for an independent sample sum. Shiganov 1986 C =0.7915 for an independent sample sum. C =0.7655 for an independent and identically distributed sample sum.

Berry-Esseen theorem BE-6 Again, the remarkable aspect of the Berry-Esseen theorem is that the upper bound depends only on the variance and the 3rd central moment. Hence, it can provide a good probability estimate based on the first three moments.

BE-7 echnique behind the proof: Pass the difference through a bandlimited filter. Lemma Fix a symmetric bandlimited filtering function with characteristic function ω ζ v x = 1 cosx πx v xe jζx dx = For any cdf H on the real line R, where η sup x R and hu πu = sin x/ πx sup x 1 x R η 6 π π h Hx Φx, u v t dt = π t 1 ζ, if ζ ; 0, otherwise. π u +1 cosu u η, [Ht x Φt x] v xdx, u 0 sinx x dx.

BE-8 Summary: η is the desired difference between any cdf H and standard normal cdf Φ. sup [Hx Φx] v x x R 1 η 6 π π π π t dt η v πη/ = 1 η 3η v ydy πη/ 1 = η 3 v ydy πη/ 1 = sup Hx Φx x R 3 v ydy πη/ Observation he maximum filter output bounds the maximum absolute value of filter input.

BE-9 Proof: he right-continuity of the cdf H and the continuity of the Gaussian unit cdf Φ together indicate the right-continuity of Hx Φx, which in turn implies the existence of x 0 Rsatisfying either η = Hx 0 Φx 0 or η = lim x x0 Hx Φx > Hx 0 Φx 0. We then distinguish between three cases: Case A η = Hx 0 Φx 0 ; Case B η =Φx 0 Hx 0 ; Case C η = lim Hx Φx > Hx 0 Φx 0. x x0

BE-10 Case A η = Hx 0 Φx 0. In this case, we note that for s>0, Hx 0 + s Φx 0 + s Hx 0 [ Φx 0 + s ] π 1 = η s π, where 1 follows from sup x R Φ x =1/ π. Observe that implies π π H x 0 + η x Φ x 0 + η x η 1 π π = 1 η + x π, η x for x <η π/.

BE-11 ogether with the fact that Hx Φx η for all x R,weobtain π sup x x 0 + η x R [ ] π π = H x 0 + η x Φ x 0 + η x v xdx [ ] π π = [ x <η H x 0 + η x Φ x 0 + η x v π/] xdx [ ] π π + [ x η H x 0 + η x Φ x 0 + η x v π/] xdx [ 1 [ x <η π/] η + x ] v π xdx + [ x η π/] η v xdx 1 = η v xdx + [ x η π/] η v xdx, [ x <η π/] where the last equality holds because of the symmetry of the filtering function v.

BE-1 he quantity of [ x η π/] v xdx can be derived as follows: [ x η π/] v xdx = η v xdx π/ Continuing from 3, = = sup x [1 1 x R η [ η η π/ v u 4 ηπ π h 4 ηπ π h 4 ηπ π h = 1 η 6 π π h du π π π π η η η η ].. ]

BE-13 Case B η =Φx 0 Hx 0. Similar to Case A, we first derive for s>0, [ Φx 0 s Hx 0 s Φx 0 s ] Hx 0 =η π s π, and then obtain π Φ x 0 η x H x 0 π η x η 1 π π = 1 η x π, η + x for x <η π/.

BE-14 ogether with the fact that Φx Hx η for all x R,weobtain π sup x x 0 η x R [ 1 [ x <η π/] η x ] v π xdx + [ x η π/] η v xdx 1 = [ x <η π/] η v xdx + [ x η π/] η v xdx = 1 η 6 π π h π η.

BE-15 Case C η = lim x x0 Hx Φx > Hx 0 Φx 0 0. In this case, we observe that for any 0 <δ<η, there exists x 0 such that Hx 0 Φx 0 η δ η. We can then follow the procedure of the previous two cases to obtain: sup x 1 6 x R η π π h π η. he proof is completed by noting that η can be made arbitrarily close to η.

BE-16 Lemma For any cumulative distribution function H with zero-mean and unit variance, its characteristic function ϕ H ζ satisfiesthat η 1 ϕ π H ζ e 1/ζ dζ ζ + 1 π π h πη, where η and h are defined in the previous lemma. his lemma transforms the bound in the previous lemma into frequency domain, namely the characteristic function domain.

BE-17 Proof: Observe that t = [Ht x Φt x] v xdx is nothing but a convolution of v andh Φ. By Fourier inversion theorem, d t dt = 1 t=x π = 1 π e jζx [ϕ H ζ e 1/ζ] ω ζdζ e jζx [ϕ H ζ e 1/ζ] ω ζdζ, where the second step follows from bandlimit assumption. Notably, ϕ H ζande ζ / are the Fourier transforms with respect to measures dhx anddφx, not the Fourier transforms of Hx andφx. Actually, Hx andφx have no Fourier transforms.

BE-18 Integrating with respect to x, weobtain x = 1 [ϕ e jζx H ζ e 1/ζ] ω π jζ ζdζ, 3 where no integration constant appears since both sides go to zero as x. Notably, where c is the integration constant. f x =g x fx =gx+c,

BE-19 Accordingly, 1 [ϕ sup x = sup x R x R π e jζx H ζ e 1/ζ] ω jζ ζdζ 1 [ϕ sup H ζ e 1/ζ] x R π e jζx ω jζ ζ dζ 1 = sup ϕ x R π H ζ e 1/ζ ω ζ dζ ζ 1 sup ϕ x R π H ζ e 1/ζ dζ ζ = 1 ϕ π H ζ e 1/ζ dζ ζ.

BE-0 ogether with we finally have η 1 π sup x 1 x R η 6 π π h π ϕ H ζ e 1/ζ dζ ζ + 1 π π h π η, η.

BE-1 heorem Let Y n = n i=1 X i be sum of i.i.d. random variables. Assume n 3. Denote the mean and variance of X 1 by ˆµ and ˆσ, respectively. Define [ ˆρ E X 1 ˆµ 3]. Also denote the cdf of Y n E[Y n ]/ˆσ nbyh n. hen sup H n y Φy 5 y R ˆρ ˆσ 3 n.

BE- Proof: π η ζ ˆϕn ˆσ e ζ / n where ˆϕ is the characteristic function of X 1 ˆµ. Lemma For any complex numbers α and β, α n β n n α β γ n 1, dζ ζ + 1 π h π where γ max{ α, β }. Proof: α n β n = α n n β r n r r n α r β r rn = n α β r n 1. η, 4

BE-3 Observe that the integrand satisfies ζ ˆϕn ˆσ e ζ / n n ζ ˆϕ ˆσ e ζ /n n γn 1, 5 ζ n ˆϕ ˆσ 1 ζ + n n 1 ζ e ζ /n n γ n 1 6 where the quantity γ in 5 requires that ζ ˆϕ ˆσ γ and e ζ /n γ. n We upperbound the first and second terms in the parentheses of 6 respectively by ζ ˆϕ ˆσ 1+ ζ n n ˆρ and 6ˆσ 3 n 3/ ζ 3 1 ζ n e ζ /n 1. 8n ζ4 Continuing the derivation of 6, ζ ˆϕn ˆσ e ζ / ˆρ n n + 1 γ n 1. 7 6ˆσ 3 n 3/ ζ 3 8n ζ4 It remains to choose γ that bounds both ˆϕζ/ˆσ n and exp{ ζ /n} from above.

BE-4 For complex number z and reals b and c, z b c z b + c. Accordingly, ζ ˆϕ ˆσ 1+ ζ n n ˆρ 6ˆσ 3 n 3/ ζ 3 ζ ˆϕ ˆσ n 1 ζ n + ˆρ. 6ˆσ 3 n 3/ ζ 3 Next, ζ ˆϕ ˆσ 1 ζ n n + ˆρ, if ζ 1. 8 6ˆσ 3 n 3/ ζ3 n For those ζ [,] which is exactly the range of integration operation in 4, we can guarantee the validity of the condition in 8 by defining and obtain for n 3. ζ n n = ˆσ6 ˆρ ˆσ3 n ˆρ n 3 n 1 n 3, n 1 1 n 3 1, n 1

BE-5 Hence, for ζ, ζ ˆϕ ˆσ 1+ ζ n n + ˆρ 6ˆσ 3 n 3/ ζ3 } exp { ζ n + ˆρ 6ˆσ 3 n { 3/ ζ3 exp 1 } ˆρ n ζ + 6ˆσ 3 n { 3/ζ 1 = exp n ˆρ } ζ 6ˆσ 3 n { 3/ = exp 3 } 6n 1 ζ. We can then choose γ exp { 3 } 6n 1 ζ Note that the above selected γ is an upper bound of exp { ζ /n } for n 3/.1..

BE-6 By taking the chosen γ into 7, the integration part in 4 becomes ζ ˆϕn ˆσ e ζ / dζ n ζ ˆρ n + 1 { } 3 exp ζ dζ 6ˆσ 3 n 3/ζ 8n ζ 3 6 ˆρ 6ˆσ 3 n ζ + 1 { } 3 8n ζ 3 exp ζ dζ 6 ˆρ = ˆσ 3 6π n 3 + 9 3/ 3 ˆσ 3 ˆρ n ˆρ ˆσ 3 6π n 3 + 9 3/ 3 1 n = 1 n 3 6π n 1 3 + 9 3/ 3 1, 9 n where the last inequality follows from Lyapounov s inequality, i.e., ˆσ = E 1/ [ X d+1 ˆµ ] E 1/3 [ X d+1 ˆµ 3] =ˆρ 1/3.

BE-7 aking 9 into 4, we finally obtain π η 1 n 3 n 1 + 1 π h π 6π 3 3/ + 9 3 η, 1 n or equivalently, πu 6hu for u πη/. πn 3 4n 1 6π 3/ + 3 9 1, 3 n 10

BE-8 Observe that πu 6hu =πu t 1 6 v u dt is continuous, and equals 0 at u = 0, and goes to as u,which guarantees the existence of positive u satisfying 10. Inequality 10 thus implies u max a 0 : πa 6ha πn 3 4n 1 6π 3/ + 3 9 1 3 n. Since πn 3 π = 3 π 4n 1 4 n 1, we can, for n 3, further upper-bound u by: π u max a 0 : πa 6ha 6π 9 1 3/ + 3 3 3 = 3.6369...

he proof is completed by η = u π 3.6369 π n 1 ˆρ = 6.5738 πn 3 ˆσ 3 n = 6.5738 1+ 3 π n 3 ˆρ ˆσ 3 n 6.5738 1+ 3 π 6 3 ˆρ ˆσ 3 n ˆρ = 4.19115 ˆσ 3 n ˆρ 5 ˆσ 3 n. BE-9