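Fisher Information
April 6, 26
Debdeep Pati

1 Fisher Information

Assume $X \sim f(x \mid \theta)$ (pdf or pmf) with $\theta \in \Theta \subset \mathbb{R}$. Define
$$I_X(\theta) = E_\theta\left[\left(\frac{\partial}{\partial\theta} \log f(X \mid \theta)\right)^2\right],$$
where $\frac{\partial}{\partial\theta} \log f(X \mid \theta)$ is the derivative of the log-likelihood function evaluated at the true value $\theta$.

Fisher information is meaningful for families of distributions which are regular:

1. Fixed support: $\{x : f(x \mid \theta) > 0\}$ is the same for all $\theta$.
2. $\frac{\partial}{\partial\theta} \log f(x \mid \theta)$ must exist and be finite for all $x$ and $\theta$.
3. If $E_\theta |W(X)| < \infty$ for all $\theta$, then
$$\frac{\partial^k}{\partial\theta^k} E_\theta W(X) = \frac{\partial^k}{\partial\theta^k} \int W(x) f(x \mid \theta)\,dx = \int W(x)\, \frac{\partial^k}{\partial\theta^k} f(x \mid \theta)\,dx.$$

1.1 Regular families

One-parameter exponential families. Cauchy location or scale family:
$$f(x \mid \theta) = \frac{1}{\pi\,(1 + (x - \theta)^2)}, \qquad f(x \mid \theta) = \frac{1}{\pi\theta\,(1 + (x/\theta)^2)},$$
and lots more. (Most families of distributions used in applications are regular.)

1.2 Non-regular families

Uniform$(0, \theta)$, Uniform$(\theta, \theta + 1)$.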
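1.3 Facts about Fisher Information

Assume a regular family.

1. $E_\theta\left[\frac{\partial}{\partial\theta} \log f(X \mid \theta)\right] = 0$. Here $\frac{\partial}{\partial\theta} \log f(X \mid \theta)$ is called the score function $S(\theta)$.

Proof.
$$E_\theta\left[\frac{\partial}{\partial\theta} \log f(X \mid \theta)\right] = \int \left[\frac{\partial}{\partial\theta} \log f(x \mid \theta)\right] f(x \mid \theta)\,dx = \int \frac{\frac{\partial}{\partial\theta} f(x \mid \theta)}{f(x \mid \theta)}\, f(x \mid \theta)\,dx = \int \frac{\partial}{\partial\theta} f(x \mid \theta)\,dx = \frac{\partial}{\partial\theta} \int f(x \mid \theta)\,dx = 0,$$
since $\int f(x \mid \theta)\,dx = 1$ for all $\theta$.

2. $I_X(\theta) = \mathrm{Var}_\theta\left(\frac{\partial}{\partial\theta} \log f(X \mid \theta)\right)$.

Proof. Since $E_\theta\left[\frac{\partial}{\partial\theta} \log f(X \mid \theta)\right] = 0$,
$$\mathrm{Var}_\theta\left(\frac{\partial}{\partial\theta} \log f(X \mid \theta)\right) = E_\theta\left[\left(\frac{\partial}{\partial\theta} \log f(X \mid \theta)\right)^2\right] = I_X(\theta).$$

3. If $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ and $X_1, X_2, \ldots, X_n$ are independent random variables, then
$$I_{\mathbf{X}}(\theta) = I_{X_1}(\theta) + I_{X_2}(\theta) + \cdots + I_{X_n}(\theta).$$

Proof. Note that
$$f(\mathbf{x} \mid \theta) = \prod_{i=1}^{n} f_i(x_i \mid \theta)$$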
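where $f_i(\cdot \mid \theta)$ is the pdf (pmf) of $X_i$. Observe that
$$\frac{\partial}{\partial\theta} \log f(\mathbf{x} \mid \theta) = \sum_{i=1}^{n} \frac{\partial}{\partial\theta} \log f_i(x_i \mid \theta)$$
and the random variables in the sum are independent. Thus
$$\mathrm{Var}_\theta\left[\frac{\partial}{\partial\theta} \log f(\mathbf{X} \mid \theta)\right] = \sum_{i=1}^{n} \mathrm{Var}_\theta\left[\frac{\partial}{\partial\theta} \log f_i(X_i \mid \theta)\right]$$
so that $I_{\mathbf{X}}(\theta) = \sum_{i=1}^{n} I_{X_i}(\theta)$ by 2.

4. If $X_1, X_2, \ldots, X_n$ are i.i.d. and $\mathbf{X} = (X_1, X_2, \ldots, X_n)$, then $I_{X_i}(\theta) = I_{X_1}(\theta)$ for all $i$, so that $I_{\mathbf{X}}(\theta) = n\, I_{X_1}(\theta)$.

5. An alternate formula for Fisher information is
$$I_X(\theta) = -E_\theta\left[\frac{\partial^2}{\partial\theta^2} \log f(X \mid \theta)\right].$$

Proof. Abbreviate $\int f(x \mid \theta)\,dx$ as $\int f$, etc. Since $1 = \int f$, applying $\frac{\partial}{\partial\theta}$ to both sides,
$$0 = \frac{\partial}{\partial\theta} \int f = \int \frac{\partial f}{\partial\theta} = \int \frac{\partial f / \partial\theta}{f}\, f = \int \left(\frac{\partial}{\partial\theta} \log f\right) f.$$
Applying $\frac{\partial}{\partial\theta}$ again,
$$0 = \frac{\partial}{\partial\theta} \int \left(\frac{\partial}{\partial\theta} \log f\right) f = \int \frac{\partial}{\partial\theta}\left[\left(\frac{\partial}{\partial\theta} \log f\right) f\right] = \int \left(\frac{\partial^2}{\partial\theta^2} \log f\right) f + \int \left(\frac{\partial}{\partial\theta} \log f\right) \frac{\partial f}{\partial\theta}.$$
Noting that
$$\frac{\partial f}{\partial\theta} = \frac{\partial f / \partial\theta}{f}\, f = \left(\frac{\partial}{\partial\theta} \log f\right) f,$$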
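this becomes
$$0 = \int \left(\frac{\partial^2}{\partial\theta^2} \log f\right) f + \int \left(\frac{\partial}{\partial\theta} \log f\right)^2 f,$$
or
$$0 = E_\theta\left[\frac{\partial^2}{\partial\theta^2} \log f(X \mid \theta)\right] + I_X(\theta).$$

Example: Fisher Information for a Poisson sample. Observe $\mathbf{X} = (X_1, \ldots, X_n)$ iid Poisson$(\lambda)$. Find $I_n(\lambda)$. We know $I_n(\lambda) = n\, I_{X_1}(\lambda)$. We shall calculate $I_{X_1}(\lambda)$ in three ways. Let $X = X_1$.

Preliminaries:
$$f(x \mid \lambda) = \frac{\lambda^x e^{-\lambda}}{x!}$$
$$\log f(x \mid \lambda) = x \log\lambda - \lambda - \log x!$$
$$\frac{\partial}{\partial\lambda} \log f(x \mid \lambda) = \frac{x}{\lambda} - 1$$
$$\frac{\partial^2}{\partial\lambda^2} \log f(x \mid \lambda) = -\frac{x}{\lambda^2}$$

Method #1: Observe that
$$I_X(\lambda) = E_\lambda\left[\left(\frac{\partial}{\partial\lambda} \log f(X \mid \lambda)\right)^2\right] = E_\lambda\left[\left(\frac{X}{\lambda} - 1\right)^2\right] = \mathrm{Var}_\lambda\left(\frac{X}{\lambda}\right) \quad \left(\text{since } E\left(\frac{X}{\lambda}\right) = \frac{EX}{\lambda} = \frac{\lambda}{\lambda} = 1\right)$$
$$= \frac{\mathrm{Var}(X)}{\lambda^2} = \frac{\lambda}{\lambda^2} = \frac{1}{\lambda}.$$

Method #2: Observe that
$$I_X(\lambda) = \mathrm{Var}_\lambda\left(\frac{\partial}{\partial\lambda} \log f(X \mid \lambda)\right) = \mathrm{Var}_\lambda\left(\frac{X}{\lambda} - 1\right) = \mathrm{Var}_\lambda\left(\frac{X}{\lambda}\right) = \frac{1}{\lambda} \quad \text{(as in Method \#1)}.$$

Method #3: Observe that
$$I_X(\lambda) = -E_\lambda\left[\frac{\partial^2}{\partial\lambda^2} \log f(X \mid \lambda)\right] = E_\lambda\left[\frac{X}{\lambda^2}\right] = \frac{\lambda}{\lambda^2} = \frac{1}{\lambda}.$$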
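The three methods above can be checked numerically. The following is a minimal sketch, not part of the original notes: the function names and the truncation point `x_max` are illustrative assumptions, and the expectations are approximated by summing over the Poisson pmf up to `x_max`.

```python
import math


def poisson_pmf(x, lam):
    """Poisson pmf, computed on the log scale for numerical stability."""
    return math.exp(x * math.log(lam) - lam - math.lgamma(x + 1))


def fisher_info_three_ways(lam, x_max=200):
    """Approximate I_X(lambda) for one Poisson observation in three ways.

    Returns (E[score^2], Var[score], -E[second derivative]), each obtained
    by summing over x = 0, ..., x_max (x_max is an assumed truncation point).
    """
    mean_score = 0.0
    mean_score_sq = 0.0
    mean_second_deriv = 0.0
    for x in range(x_max + 1):
        p = poisson_pmf(x, lam)
        score = x / lam - 1.0            # d/d(lambda) of log f
        second_deriv = -x / lam ** 2     # d^2/d(lambda)^2 of log f
        mean_score += score * p
        mean_score_sq += score ** 2 * p
        mean_second_deriv += second_deriv * p
    return (mean_score_sq,
            mean_score_sq - mean_score ** 2,
            -mean_second_deriv)


if __name__ == "__main__":
    lam = 2.5
    print(fisher_info_three_ways(lam), 1.0 / lam)
```

All three returned values should agree with $1/\lambda$ up to truncation and rounding error.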
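Thus $I_n(\lambda) = n\, I_X(\lambda) = n/\lambda$.

Example: Fisher information for Cauchy location family. Suppose $X_1, X_2, \ldots, X_n$ are iid with pdf
$$f(x \mid \theta) = \frac{1}{\pi\,(1 + (x - \theta)^2)}.$$
Let $\mathbf{X} = (X_1, \ldots, X_n)$, $X \sim f(x \mid \theta)$. Find $I_n(\theta)$. Note that $I_n(\theta) = n\, I_{X_1}(\theta) = n\, I_X(\theta)$. Now
$$\frac{\partial}{\partial\theta} \log f(x \mid \theta) = \frac{\partial f / \partial\theta}{f} = \frac{\dfrac{2(x - \theta)}{\pi\,(1 + (x - \theta)^2)^2}}{\dfrac{1}{\pi\,(1 + (x - \theta)^2)}} = \frac{2(x - \theta)}{1 + (x - \theta)^2}.$$
Now
$$I_X(\theta) = E_\theta\left[\left(\frac{\partial}{\partial\theta} \log f(X \mid \theta)\right)^2\right] = E\left[\left(\frac{2(X - \theta)}{1 + (X - \theta)^2}\right)^2\right] = \int_{-\infty}^{\infty} \left(\frac{2(x - \theta)}{1 + (x - \theta)^2}\right)^2 \frac{1}{\pi\,(1 + (x - \theta)^2)}\,dx = \frac{4}{\pi} \int_{-\infty}^{\infty} \frac{(x - \theta)^2}{(1 + (x - \theta)^2)^3}\,dx.$$
Letting $u = x - \theta$, $du = dx$,
$$I_X(\theta) = \frac{4}{\pi} \int_{-\infty}^{\infty} \frac{u^2}{(1 + u^2)^3}\,du = \frac{8}{\pi} \int_{0}^{\infty} \frac{u^2}{(1 + u^2)^3}\,du.$$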
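Substituting $x = 1/(1 + u^2)$, $u = (1/x - 1)^{1/2}$, $du = \tfrac{1}{2}(1/x - 1)^{-1/2}(-1/x^2)\,dx$,
$$I_X(\theta) = \frac{8}{\pi} \int_{0}^{\infty} \frac{u^2}{(1 + u^2)^3}\,du = \frac{8}{\pi} \int_{0}^{\infty} \frac{u^2}{1 + u^2} \cdot \frac{1}{(1 + u^2)^2}\,du$$
$$= \frac{8}{\pi} \int_{0}^{1} (1 - x)\, x^2 \cdot \frac{1}{2}\left(\frac{1}{x} - 1\right)^{-1/2} \frac{1}{x^2}\,dx = \frac{4}{\pi} \int_{0}^{1} x^{1/2} (1 - x)^{1/2}\,dx$$
$$= \frac{4}{\pi} \int_{0}^{1} x^{3/2 - 1} (1 - x)^{3/2 - 1}\,dx \quad \text{(Beta integral)} \quad = \frac{4}{\pi} \cdot \frac{\Gamma(3/2)\,\Gamma(3/2)}{\Gamma(3/2 + 3/2)} = \frac{4}{\pi} \cdot \frac{(0.5\sqrt{\pi})^2}{2!} = \frac{1}{2}.$$
Hence $I_n(\theta) = n/2$.

2 Uses of Fisher Information

• Asymptotic distribution of MLE's
• Cramér-Rao Inequality (Information inequality)

2.1 Asymptotic distribution of MLE's

i.i.d. case: If $f(x \mid \theta)$ is a regular one-parameter family of pdf's (or pmf's) and $\hat\theta_n = \hat\theta_n(\mathbf{X}_n)$ is the MLE based on $\mathbf{X}_n = (X_1, \ldots, X_n)$, where $n$ is large and $X_1, \ldots, X_n$ are iid from $f(x \mid \theta)$, then approximately,
$$\hat\theta_n \sim N\left(\theta, \frac{1}{n I(\theta)}\right)$$
where $I(\theta) = I_{X_1}(\theta)$ and $\theta$ is the true value. Note that $n I(\theta) = I_{\mathbf{X}_n}(\theta)$. More formally,
$$\frac{\hat\theta_n - \theta}{\sqrt{1/(n I(\theta))}} = \sqrt{n I(\theta)}\,(\hat\theta_n - \theta) \xrightarrow{d} N(0, 1)$$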
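as $n \to \infty$.

More general case: (Assuming various regularity conditions.) If $f(\mathbf{x} \mid \theta)$ is a one-parameter family of joint pdf's (or joint pmf's) for data $\mathbf{X}_n = (X_1, \ldots, X_n)$, where $n$ is large (think of a large dataset arising from a regression or time series model), and $\hat\theta_n = \hat\theta_n(\mathbf{X}_n)$ is the MLE, then
$$\hat\theta_n \sim N\left(\theta, \frac{1}{I_{\mathbf{X}_n}(\theta)}\right)$$
where $\theta$ is the true value.

2.2 Estimation of the Fisher Information

If $\theta$ is unknown, then so is $I_{\mathbf{X}}(\theta)$. Two estimates $\hat I$ of the Fisher information $I_{\mathbf{X}}(\theta)$ are
$$\hat I_1 = I_{\mathbf{X}}(\hat\theta), \qquad \hat I_2 = -\left.\frac{\partial^2}{\partial\theta^2} \log f(\mathbf{X} \mid \theta)\right|_{\theta = \hat\theta}$$
where $\hat\theta$ is the MLE of $\theta$ based on the data $\mathbf{X}$. $\hat I_1$ is the obvious plug-in estimator. It can be difficult to compute when $I_{\mathbf{X}}(\theta)$ does not have a known closed form. The estimator $\hat I_2$ is suggested by the formula
$$I_{\mathbf{X}}(\theta) = -E\left[\frac{\partial^2}{\partial\theta^2} \log f(\mathbf{X} \mid \theta)\right].$$
It is often easy to compute, and is required in many Newton-Raphson style algorithms for finding the MLE (so that it is already available without extra computation). The two estimates $\hat I_1$ and $\hat I_2$ are often referred to as the "expected" and "observed" Fisher information, respectively.

As $n \to \infty$, both estimators are consistent (after normalization) for $I_{\mathbf{X}_n}(\theta)$ under various regularity conditions. For example, in the iid case, $\hat I_1 / n$, $\hat I_2 / n$, and $I_{\mathbf{X}_n}(\theta)/n$ all converge to $I(\theta) = I_{X_1}(\theta)$.

2.3 Approximate Confidence Intervals for θ

Choose $0 < \alpha < 1$ (say, $\alpha = 0.05$). Let $z$ be such that $P(-z < Z < z) = 1 - \alpha$ where $Z \sim N(0, 1)$. When $n$ is large, we have approximately
$$\sqrt{I_{\mathbf{X}}(\theta)}\,(\hat\theta - \theta) \sim N(0, 1)$$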
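so that
$$P\left\{-z < \sqrt{I_{\mathbf{X}}(\theta)}\,(\hat\theta - \theta) < z\right\} \approx 1 - \alpha$$
or equivalently,
$$P\left\{\hat\theta - \frac{z}{\sqrt{I_{\mathbf{X}}(\theta)}} < \theta < \hat\theta + \frac{z}{\sqrt{I_{\mathbf{X}}(\theta)}}\right\} \approx 1 - \alpha.$$
This approximation continues to hold when $I_{\mathbf{X}}(\theta)$ is replaced by an estimate $\hat I$ (either $\hat I_1$ or $\hat I_2$):
$$P\left\{\hat\theta - \frac{z}{\sqrt{\hat I}} < \theta < \hat\theta + \frac{z}{\sqrt{\hat I}}\right\} \approx 1 - \alpha.$$
Thus
$$\left(\hat\theta - \frac{z}{\sqrt{\hat I}},\ \hat\theta + \frac{z}{\sqrt{\hat I}}\right)$$
is an approximate $1 - \alpha$ confidence interval for $\theta$. Here $\hat\theta$ is the MLE and $\hat I$ is an estimate of the Fisher information.

3 Cramér-Rao Inequality

Let the data have distribution $P_\theta$, $\theta \in \Theta \subset \mathbb{R}$.

Theorem. If $f(x \mid \theta)$ is a regular one-parameter family, $E_\theta W = \tau(\theta)$ for all $\theta$, and $\tau(\theta)$ is differentiable, then
$$\mathrm{Var}_\theta(W) \ge \frac{\{\tau'(\theta)\}^2}{I_n(\theta)},$$
where $I_n(\theta)$ denotes the Fisher information in the data.

Proof. Preliminary Facts:

A. $[\mathrm{Cov}(X, Y)]^2 \le \mathrm{Var}(X)\,\mathrm{Var}(Y)$. This is a special case of the Cauchy-Schwarz inequality. It is better known to statisticians as $\rho^2 \le 1$, where
$$\rho = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}$$
is the correlation between $X$ and $Y$.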
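B. $\mathrm{Cov}(X, Y) = E(XY)$ if either $EX = 0$ or $EY = 0$. This follows from the well-known formula $\mathrm{Cov}(X, Y) = E(XY) - (EX)(EY)$.

Since $E_\theta\left[\frac{\partial}{\partial\theta} \log f(X \mid \theta)\right] = 0$, from B we have
$$\mathrm{Cov}_\theta\left(W, \frac{\partial}{\partial\theta} \log f(X \mid \theta)\right) = E_\theta\left[W\, \frac{\partial}{\partial\theta} \log f(X \mid \theta)\right] = \int W(x) \left[\frac{\partial}{\partial\theta} \log f(x \mid \theta)\right] f(x \mid \theta)\,dx$$
$$= \int W(x)\, \frac{\frac{\partial}{\partial\theta} f(x \mid \theta)}{f(x \mid \theta)}\, f(x \mid \theta)\,dx = \int W(x)\, \frac{\partial}{\partial\theta} f(x \mid \theta)\,dx$$
$$= \frac{\partial}{\partial\theta} \int W(x) f(x \mid \theta)\,dx \quad \text{(since } f(x \mid \theta) \text{ is a regular family)} \quad = \frac{\partial}{\partial\theta} E_\theta W = \tau'(\theta).$$
Since from A we have
$$\left[\mathrm{Cov}_\theta\left(W, \frac{\partial}{\partial\theta} \log f(X \mid \theta)\right)\right]^2 \le \mathrm{Var}_\theta(W)\, \mathrm{Var}_\theta\left(\frac{\partial}{\partial\theta} \log f(X \mid \theta)\right),$$
it follows that
$$[\tau'(\theta)]^2 \le \mathrm{Var}_\theta(W)\, I_n(\theta).$$

Remark 1. Equality in A is achieved iff $Y = aX + b$ for some constants $a, b$. Moreover, if $EY = 0$, then $E(aX + b) = 0$ forces $b = -a\,EX$, so that $Y = a(X - EX)$ for some constant $a$.

Applying this to the proof of the CRLB with $X = W$, $Y = \frac{\partial}{\partial\theta} \log f(X \mid \theta)$ tells us that
$$\mathrm{Var}_\theta(W) = \frac{\{\tau'(\theta)\}^2}{I_n(\theta)}$$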
iff log θ aθ[w τθ] θ f for some fuctio aθ. is true oly whe fx θ is a pef ad W ct + d for some c, d where T is the atural sufficiet statistic of the pef. 4 Asymptotic Efficiecy Let X X, X 2,..., X. Give a sequece of estimators W W X. If EW τθ for all, the {W } is asymptotically efficiet if where Var θ W lim V θ. V θ {τ θ} 2 I X θ What if Var θ W or if W is biased? A alterative defiitio: A sequece of estimators {W } is asymptotically ormal if W τθ V θ d N,. as. {W } is asymptotically efficiet for estimatig τθ if W ANτθ, V θ. Example: Observe X, X 2,..., X iid Poissoλ. Estimatio of τλ λ: E X λ. Does X achieve the CRLB? Yes! Var X VarX CRLB {τ λ} 2 I X λ λ /λ λ Alterative: Check coditio for exact attaimet of CRLB. log fx λ λ i λ log fx i λ i Xi λ λ X λ
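Note: Since $\bar X$ attains the CRLB (for all $\lambda$), it must be the best unbiased estimator of $\lambda$. Showing that an estimator attains the CRLB is one way to show it is best unbiased. (But see the later remark.)

Estimation of $\tau(\lambda) = \lambda^2$: Define $W = T(T - 1)/n^2$ where $T = \sum_{i=1}^{n} X_i$. $EW = \lambda^2$ (see calculations below) and $W$ is a function of the CSS $T$. Thus $W$ is best unbiased for $\lambda^2$. Does $W$ achieve the CRLB? No! Note that
$$\text{CRLB} = \frac{\{\tau'(\lambda)\}^2}{I_{\mathbf{X}_n}(\lambda)} = \frac{(2\lambda)^2}{n/\lambda} = \frac{4\lambda^3}{n}, \qquad \mathrm{Var}(W) = \frac{4\lambda^3}{n} + \frac{2\lambda^2}{n^2}$$
(see calculations below).

Alternative: Show that the condition for achievement of the CRLB fails. As shown earlier,
$$\frac{\partial}{\partial\lambda} \log f(\mathbf{X} \mid \lambda) = \sum_{i=1}^{n} \left(\frac{X_i}{\lambda} - 1\right) = \frac{T}{\lambda} - n.$$
The CRLB is attained iff there exists $a(\lambda)$ such that
$$\frac{T}{\lambda} - n = a(\lambda)\left(\frac{T(T - 1)}{n^2} - \lambda^2\right).$$
But the left side is linear in $T$ and the right side is quadratic in $T$, so that no multiplier $a(\lambda)$ can make them equal for all possible values of $T = 0, 1, 2, \ldots$.

Remark 2. This situation is not unusual. The best unbiased estimator often fails to achieve the CRLB.

But $W$ is asymptotically efficient:
$$\lim_{n \to \infty} \frac{\mathrm{Var}(W)}{\text{CRLB}} = \lim_{n \to \infty} \frac{\dfrac{4\lambda^3}{n} + \dfrac{2\lambda^2}{n^2}}{\dfrac{4\lambda^3}{n}} = \lim_{n \to \infty} \left(1 + \frac{1}{2n\lambda}\right) = 1.$$

Calculations: Suppose $Y \sim$ Poisson$(\xi)$. The factorial moments of the Poisson follow a simple pattern:
$$EY = \xi$$
$$E[Y(Y - 1)] = \xi^2$$
$$E[Y(Y - 1)(Y - 2)] = \xi^3$$
$$E[Y(Y - 1)(Y - 2)(Y - 3)] = \xi^4$$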
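Proof of one case:
$$E[Y(Y - 1)(Y - 2)] = \sum_{i=0}^{\infty} i(i - 1)(i - 2)\, \frac{\xi^i e^{-\xi}}{i!} = \xi^3 \sum_{i=3}^{\infty} \frac{\xi^{i-3} e^{-\xi}}{(i - 3)!} = \xi^3 \sum_{j=0}^{\infty} \frac{\xi^j e^{-\xi}}{j!} = \xi^3.$$
From the factorial moments, we can calculate everything else. For example:
$$\mathrm{Var}[Y(Y - 1)] = E[\{Y(Y - 1)\}^2] - [E\,Y(Y - 1)]^2 = E[Y^2 (Y - 1)^2] - [\xi^2]^2$$
$$= E[Y_{(4)} + 4Y_{(3)} + 2Y_{(2)}] - \xi^4 = [\xi^4 + 4\xi^3 + 2\xi^2] - \xi^4 = 4\xi^3 + 2\xi^2$$
where $Y_{(k)} = Y(Y - 1)(Y - 2) \cdots (Y - k + 1)$.

In our case $T \sim$ Poisson$(n\lambda)$, so that substituting $\xi = n\lambda$ in the above results leads to
$$E[T(T - 1)] = n^2 \lambda^2, \qquad \mathrm{Var}[T(T - 1)] = 4n^3\lambda^3 + 2n^2\lambda^2,$$
so that $W = T(T - 1)/n^2$ satisfies
$$EW = \lambda^2, \qquad \mathrm{Var}(W) = \frac{4\lambda^3}{n} + \frac{2\lambda^2}{n^2}.$$

4.1 An asymptotically inefficient estimator

Example: Let $X_1, \ldots, X_n$ be iid with pdf
$$f(x \mid \alpha) = \frac{x^{\alpha - 1} e^{-x}}{\Gamma(\alpha)}, \quad x > 0.$$
For this pdf $EX = \mathrm{Var}X = \alpha$. Clearly $E\bar X = \alpha$. Thus $\bar X$ is the MOM estimator of $\alpha$. Is it asymptotically efficient? No (verified below).

Note: This is a 1pef with natural sufficient statistic $T = \sum_{i=1}^{n} \log X_i$. Since $T$ is complete, $E(\bar X \mid T)$ is the UMVUE of $\alpha$. Since $\bar X$ is not a function of $T$, we know $\mathrm{Var}\,\bar X > \mathrm{Var}[E(\bar X \mid T)]$. But $\mathrm{Var}[E(\bar X \mid T)] \ge$ CRLB. Thus, without calculation, we know that $\bar X$ cannot achieve the CRLB for any value of $n$. We now show it does not achieve it asymptotically either. Note that
$$\mathrm{Var}\,\bar X = \frac{\mathrm{Var}X}{n} = \frac{\alpha}{n}.$$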
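And,
$$I_{\mathbf{X}_n}(\alpha) = n\, I_{X_1}(\alpha) = n\, \frac{\Gamma''(\alpha)\Gamma(\alpha) - \{\Gamma'(\alpha)\}^2}{\{\Gamma(\alpha)\}^2}$$
by a routine calculation. Hence
$$\text{CRLB} = \frac{1}{n\, I_{X_1}(\alpha)}.$$
Thus
$$\frac{\mathrm{Var}\,\bar X}{\text{CRLB}} = \alpha\, I_{X_1}(\alpha),$$
which does not depend on $n$. Since $\bar X$ does not achieve the CRLB for any $n$, we know $\alpha\, I_{X_1}(\alpha) > 1$. Thus
$$\lim_{n \to \infty} \frac{\mathrm{Var}\,\bar X}{\text{CRLB}} = \alpha\, I_{X_1}(\alpha) > 1$$
so that $\bar X$ is not asymptotically efficient. The function $\alpha\, I_{X_1}(\alpha)$ is a non-negative decreasing function with
$$\lim_{\alpha \to 0^+} \alpha\, I_{X_1}(\alpha) = \infty, \qquad \lim_{\alpha \to \infty} \alpha\, I_{X_1}(\alpha) = 1.$$

Figure 1: Plot of $\alpha\, I_{X_1}(\alpha)$, where $I_{X_1}(\alpha)$ is called the trigamma function (derivative of the digamma function $\Gamma'(\alpha)/\Gamma(\alpha)$).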
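The ratio $\alpha\, I_{X_1}(\alpha)$ plotted in Figure 1 can be evaluated directly. The following is a rough sketch, not taken from the notes: the function names are hypothetical, and the trigamma function is approximated with the recurrence $\psi'(x) = \psi'(x + 1) + 1/x^2$ followed by a standard asymptotic series, with the shift threshold 6 chosen here for convenience.

```python
def trigamma(x):
    """Approximate the trigamma function psi'(x) for x > 0.

    Uses psi'(x) = psi'(x + 1) + 1/x^2 to shift x up to at least 6,
    then the asymptotic series
    1/x + 1/(2x^2) + 1/(6x^3) - 1/(30x^5) + 1/(42x^7) - 1/(30x^9).
    """
    if x <= 0:
        raise ValueError("x must be positive")
    total = 0.0
    while x < 6.0:
        total += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    series = (inv + 0.5 * inv2
              + inv * inv2 * (1.0 / 6.0
                              - inv2 * (1.0 / 30.0
                                        - inv2 * (1.0 / 42.0 - inv2 / 30.0))))
    return total + series


def mom_variance_ratio(alpha):
    """Var(X-bar) / CRLB = alpha * psi'(alpha) for the gamma shape example."""
    return alpha * trigamma(alpha)


if __name__ == "__main__":
    for alpha in (0.1, 0.5, 1.0, 5.0, 50.0):
        print(alpha, mom_variance_ratio(alpha))
```

The printed ratios should decrease toward 1 as $\alpha$ grows and become large as $\alpha$ approaches 0, matching the limits stated above.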
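When $\alpha$ is small, $\bar X$ is horrible. When $\alpha$ is large, $\bar X$ is pretty good.

General Comment: For regular families, the MLE is asymptotically efficient. Thus MOM is inefficient in general.
$$\lim_{n \to \infty} \frac{\mathrm{Var}(W_n)}{\text{CRLB}}$$
essentially compares the variance of $W_n$ with that of the MLE in large samples.

5 Fisher Information, CRLB, Asymptotic distribution of MLE's in the multi-parameter case

Notation: $\mathbf{X} \sim f(\mathbf{x} \mid \theta)$, $\theta = (\theta_1, \theta_2, \ldots, \theta_p)^\top$,
$$\frac{\partial}{\partial\theta} = \begin{pmatrix} \frac{\partial}{\partial\theta_1} \\ \vdots \\ \frac{\partial}{\partial\theta_p} \end{pmatrix},$$
and $S$ ($p \times 1$) is the vector of scores
$$S = \frac{\partial}{\partial\theta} \log f(\mathbf{X} \mid \theta) = \begin{pmatrix} \frac{\partial}{\partial\theta_1} \log f(\mathbf{X} \mid \theta) \\ \vdots \\ \frac{\partial}{\partial\theta_p} \log f(\mathbf{X} \mid \theta) \end{pmatrix}.$$
Define the $p \times p$ matrix
$$I(\theta) = E_\theta(S S^\top).$$
Note that $S$ is evaluated at $\theta$ and the expectation is taken under the distribution indexed by the same parameter $\theta$. For a vector or matrix, we define the expected values in this way:
$$E\begin{pmatrix} Y \\ Z \end{pmatrix} = \begin{pmatrix} EY \\ EZ \end{pmatrix}, \qquad E\begin{pmatrix} W & X \\ Y & Z \end{pmatrix} = \begin{pmatrix} EW & EX \\ EY & EZ \end{pmatrix}.$$

5.1 Properties

1. $E_\theta S = 0$ ($p \times 1$).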
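2. $I = \mathrm{Cov}(S)$, the variance-covariance matrix of $S$.

3. If $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ has independent components, then
$$I_n(\theta) = I_{X_1}(\theta) + I_{X_2}(\theta) + \cdots + I_{X_n}(\theta).$$

4. If $X_1, X_2, \ldots, X_n$ are iid, then $I_n(\theta) = n\, I_{X_1}(\theta)$.

5. $I(\theta) = -E_\theta\left[\dfrac{\partial^2}{\partial\theta^2} \log f(\mathbf{X} \mid \theta)\right]$, where we define
$$\frac{\partial^2}{\partial\theta^2} \log f(\mathbf{X} \mid \theta) = \left(\frac{\partial^2}{\partial\theta_i\, \partial\theta_j} \log f(\mathbf{X} \mid \theta)\right),$$
which is the $p \times p$ matrix whose $(i, j)$ entry is $\frac{\partial^2}{\partial\theta_i\, \partial\theta_j} \log f(\mathbf{X} \mid \theta)$.

5.2 Asymptotic distribution of MLE of θ

If $\hat\theta_n = \hat\theta_n(X_1, X_2, \ldots, X_n)$ is the sequence of MLE's based on progressively larger samples, then
$$\hat\theta_n \sim AN(\theta, I_n^{-1}(\theta))$$
where AN now stands for asymptotically multivariate normal. This means $\hat\theta_n \approx N(\theta, I_n^{-1}(\theta))$ for large $n$.

Recall: In the iid case, $I_n(\theta) = n\, I_{X_1}(\theta)$. Estimate $I_n(\theta)$ by $I_n(\hat\theta)$ or
$$-\left.\left(\frac{\partial^2}{\partial\theta_i\, \partial\theta_j} \log f(\mathbf{X} \mid \theta)\right)\right|_{\theta = \hat\theta}.$$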
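As an illustration of the second estimate, the negative Hessian of the log-likelihood at the MLE can be approximated by finite differences. The sketch below is only an assumption-laden example and not the notes' own procedure: it assumes a normal model parametrized by $(\mu, \xi)$ with $\xi = \sigma^2$, uses a made-up data set, and the helper names and step size `h` are hypothetical.

```python
import math


def normal_loglik(mu, xi, data):
    """Log-likelihood of iid N(mu, xi) data, with xi the variance."""
    n = len(data)
    ss = sum((x - mu) ** 2 for x in data)
    return -0.5 * n * math.log(2.0 * math.pi * xi) - ss / (2.0 * xi)


def normal_mle(data):
    """MLE of (mu, xi): the sample mean and the mean squared deviation."""
    n = len(data)
    mu_hat = sum(data) / n
    xi_hat = sum((x - mu_hat) ** 2 for x in data) / n
    return mu_hat, xi_hat


def observed_information(loglik, theta, data, h=1e-4):
    """Negative Hessian of loglik at theta, by central finite differences."""
    p = len(theta)
    info = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            def shifted(di, dj):
                t = list(theta)
                t[i] += di * h
                t[j] += dj * h
                return loglik(*t, data)
            second = (shifted(1, 1) - shifted(1, -1)
                      - shifted(-1, 1) + shifted(-1, -1)) / (4.0 * h * h)
            info[i][j] = -second
    return info


if __name__ == "__main__":
    data = [1.2, 0.7, 2.9, 1.8, 2.4, 0.3, 1.1, 2.0]  # illustrative values
    mu_hat, xi_hat = normal_mle(data)
    print(observed_information(normal_loglik, [mu_hat, xi_hat], data))
```

For this particular model the observed information at the MLE coincides with the plug-in estimate $I_n(\hat\theta)$, which has diagonal entries $n/\hat\xi$ and $n/(2\hat\xi^2)$ and zero off-diagonal entries, so the two estimates can be compared directly.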
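5.3 Multi-parameter CRLB

$\mathbf{X}$ has joint pdf (pmf) $f(\mathbf{x} \mid \theta)$, which is a regular family, with $\theta = (\theta_1, \theta_2, \ldots, \theta_p)$. If $E\,W(\mathbf{X}) = \tau(\theta)$, where $\tau(\theta) \in \mathbb{R}$ is a differentiable function of $\theta_i$, $i = 1, \ldots, p$, then
$$\mathrm{Var}\,W(\mathbf{X}) \ge g^\top I^{-1} g$$
where $g = \dfrac{\partial \tau(\theta)}{\partial\theta}$ ($p \times 1$) and $I = I_{\mathbf{X}}(\theta)$ ($p \times p$).

Special Case: $W(\mathbf{X}) = \hat\theta_i$ with $\tau(\theta) = \theta_i$. That is, $\hat\theta_i$ is an unbiased estimate of $\theta_i$. Now the vector $g$ has $g_i = 1$ and $g_j = 0$ for $j \ne i$, and the CRLB gives
$$\mathrm{Var}(\hat\theta_i) \ge (I^{-1})_{ii}$$
where the right hand side is the $i$th diagonal element of $I^{-1}$.

Weaker result: Suppose we knew $\theta_j$ for all $j \ne i$. By fixing $\theta_j$ for $j \ne i$ at the known values, we get a one-parameter family, and the CRLB for the one-parameter case gives
$$\mathrm{Var}(\hat\theta_i) \ge \frac{1}{I_{ii}}, \qquad I_{ii} = E\left[\left(\frac{\partial}{\partial\theta_i} \log f(\mathbf{X} \mid \theta)\right)^2\right].$$
But, since $(I^{-1})_{ii} \ge 1/I_{ii}$,
$$\mathrm{Var}(\hat\theta_i) \ge (I^{-1})_{ii} \ge \frac{1}{I_{ii}}$$
where the "upper" lower bound is the best you can do if you are estimating $\theta_i$ and all the other parameters are unknown, and the "lower" lower bound is the best you can do when all the other parameters are known.

Example: $N(\mu, \sigma^2 = \xi)$ distribution. Note that
$$f(x \mid \mu, \xi) = \frac{1}{\sqrt{2\pi\xi}}\, e^{-(x - \mu)^2 / (2\xi)}$$
and
$$\ell = \log f = -\frac{1}{2} \log(2\pi\xi) - \frac{(x - \mu)^2}{2\xi},$$
$$\frac{\partial}{\partial\theta} \log f(x \mid \theta) = \begin{pmatrix} \frac{\partial}{\partial\mu} \log f \\ \frac{\partial}{\partial\xi} \log f \end{pmatrix} = \begin{pmatrix} \dfrac{x - \mu}{\xi} \\ -\dfrac{1}{2\xi} + \dfrac{(x - \mu)^2}{2\xi^2} \end{pmatrix}$$
and
$$I(\theta) = -E\left[\frac{\partial^2 \log f(X \mid \theta)}{\partial\theta^2}\right] = -E\begin{pmatrix} \frac{\partial^2 \ell}{\partial\mu^2} & \frac{\partial^2 \ell}{\partial\mu\, \partial\xi} \\ \frac{\partial^2 \ell}{\partial\xi\, \partial\mu} & \frac{\partial^2 \ell}{\partial\xi^2} \end{pmatrix} = -E\begin{pmatrix} -\dfrac{1}{\xi} & -\dfrac{X - \mu}{\xi^2} \\ -\dfrac{X - \mu}{\xi^2} & \dfrac{1}{2\xi^2} - \dfrac{(X - \mu)^2}{\xi^3} \end{pmatrix} = \begin{pmatrix} \dfrac{1}{\xi} & 0 \\ 0 & \dfrac{1}{2\xi^2} \end{pmatrix}.$$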
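Hence
$$I^{-1}(\theta) = \begin{pmatrix} \xi & 0 \\ 0 & 2\xi^2 \end{pmatrix} = \begin{pmatrix} \sigma^2 & 0 \\ 0 & 2\sigma^4 \end{pmatrix}.$$
For an iid sample of size $n$ the information is $n I(\theta)$, so its inverse is $I^{-1}(\theta)/n$.

For an unbiased estimate of $\mu$ ($E_{\mu, \sigma^2} W = \mu$), $\mathrm{Var}(W) \ge \sigma^2/n$ (achieved by $W = \bar X$).

For an unbiased estimate of $\sigma^2$, $\mathrm{Var}(W) \ge 2\sigma^4/n$ (not achieved exactly: $S^2$ is best unbiased, and $S^2 \sim \sigma^2 \chi^2_{n-1}/(n - 1)$ so that $\mathrm{Var}(S^2) = 2\sigma^4/(n - 1)$).

The limiting distribution of the MLE is given by
$$\begin{pmatrix} \bar X \\ \hat\sigma^2 \end{pmatrix} \sim AN\left(\begin{pmatrix} \mu \\ \sigma^2 \end{pmatrix}, \begin{pmatrix} \sigma^2/n & 0 \\ 0 & 2\sigma^4/n \end{pmatrix}\right).$$
Note:
$$\mathrm{Var}\left(\frac{\sum_{i=1}^{n} (X_i - \mu)^2}{n}\right) = \frac{2\sigma^4}{n}, \qquad E\left(\frac{\sum_{i=1}^{n} (X_i - \mu)^2}{n}\right) = \sigma^2,$$
so $\sum (X_i - \mu)^2 / n$ achieves the CR-bound, but it is not a legitimate estimator if $\mu$ is unknown.

Example: Gamma$(\alpha, \beta)$. Recall the digamma function $\psi(\alpha) = \Gamma'(\alpha)/\Gamma(\alpha)$. Note that
$$f(x \mid \alpha, \beta) = \frac{x^{\alpha - 1} e^{-x/\beta}}{\Gamma(\alpha)\, \beta^\alpha},$$
$$\ell = \log f = -\log\Gamma(\alpha) - \alpha \log\beta + (\alpha - 1) \log x - x/\beta.$$
Then
$$\frac{\partial}{\partial\theta} \log f(X \mid \theta) = \begin{pmatrix} \frac{\partial}{\partial\alpha} \log f \\ \frac{\partial}{\partial\beta} \log f \end{pmatrix} = \begin{pmatrix} -\psi(\alpha) - \log\beta + \log X \\ -\dfrac{\alpha}{\beta} + \dfrac{X}{\beta^2} \end{pmatrix}$$
and
$$I(\theta) = -E\begin{pmatrix} \frac{\partial^2 \ell}{\partial\alpha^2} & \frac{\partial^2 \ell}{\partial\alpha\, \partial\beta} \\ \frac{\partial^2 \ell}{\partial\beta\, \partial\alpha} & \frac{\partial^2 \ell}{\partial\beta^2} \end{pmatrix} = -E\begin{pmatrix} -\psi'(\alpha) & -\dfrac{1}{\beta} \\ -\dfrac{1}{\beta} & \dfrac{\alpha}{\beta^2} - \dfrac{2X}{\beta^3} \end{pmatrix} = \begin{pmatrix} \psi'(\alpha) & \dfrac{1}{\beta} \\ \dfrac{1}{\beta} & \dfrac{\alpha}{\beta^2} \end{pmatrix}.$$
Hence
$$I^{-1}(\theta) = \frac{\beta^2}{\alpha\psi'(\alpha) - 1} \begin{pmatrix} \dfrac{\alpha}{\beta^2} & -\dfrac{1}{\beta} \\ -\dfrac{1}{\beta} & \psi'(\alpha) \end{pmatrix}.$$
The CRLB for an unbiased estimator of $\beta$ is given by
$$\mathrm{Var}(\hat\beta) \ge \frac{1}{n}\, (I^{-1}(\theta))_{22} \ge \frac{1}{n}\, \{I(\theta)_{22}\}^{-1}.$$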
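Note that
$$(I^{-1}(\theta))_{22} = \frac{\beta^2}{\alpha} \cdot \frac{\psi'(\alpha)}{\psi'(\alpha) - 1/\alpha}, \qquad \{I(\theta)_{22}\}^{-1} = \frac{\beta^2}{\alpha}.$$
If $\alpha$ is known, the "lower" lower bound is achieved:
$$E\left(\frac{\bar X}{\alpha}\right) = \beta, \qquad \mathrm{Var}\left(\frac{\bar X}{\alpha}\right) = \frac{\mathrm{Var}(X)}{n\alpha^2} = \frac{\alpha\beta^2}{n\alpha^2} = \frac{\beta^2}{n\alpha}.$$
If $\alpha$ must be estimated, there is a variance penalty which does not vanish asymptotically.

Figure 2: Plot of $\psi'(\alpha) / (\psi'(\alpha) - 1/\alpha)$, showing that it does not become asymptotically 1.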
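The two bounds for $\beta$ can be compared numerically by building the per-observation information matrix and inverting it. This is a minimal sketch under stated assumptions, not the notes' own computation: the function names are hypothetical, and $\psi'(\alpha)$ is approximated with the recurrence $\psi'(x) = \psi'(x + 1) + 1/x^2$ plus a standard asymptotic series.

```python
def trigamma(x):
    """Approximate psi'(x), x > 0, via recurrence plus asymptotic series."""
    if x <= 0:
        raise ValueError("x must be positive")
    total = 0.0
    while x < 6.0:
        total += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    series = (inv + 0.5 * inv2
              + inv * inv2 * (1.0 / 6.0
                              - inv2 * (1.0 / 30.0
                                        - inv2 * (1.0 / 42.0 - inv2 / 30.0))))
    return total + series


def gamma_information(alpha, beta):
    """Per-observation Fisher information matrix for Gamma(alpha, scale beta)."""
    return [[trigamma(alpha), 1.0 / beta],
            [1.0 / beta, alpha / beta ** 2]]


def invert_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def beta_bounds(alpha, beta, n):
    """Return (bound with alpha unknown, bound with alpha known) for Var(beta-hat)."""
    info = gamma_information(alpha, beta)
    upper = invert_2x2(info)[1][1] / n    # (I^{-1})_{22} / n
    lower = 1.0 / (info[1][1] * n)        # 1 / (n * I_{22})
    return upper, lower


if __name__ == "__main__":
    upper, lower = beta_bounds(alpha=2.0, beta=3.0, n=50)
    print(upper, lower, upper / lower)
```

The ratio `upper / lower` equals $\psi'(\alpha)/(\psi'(\alpha) - 1/\alpha)$ and does not depend on $n$ or $\beta$, which is the variance penalty described above.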