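Fisher Information
April 6, 2016
Debdeep Pati

1 Fisher Information

Assume $X \sim f(x \mid \theta)$ (pdf or pmf) with $\theta \in \Theta \subset \mathbb{R}$. Define
$$I_X(\theta) = E_\theta\left[\left(\frac{\partial}{\partial\theta}\log f(X \mid \theta)\right)^2\right],$$
where $\frac{\partial}{\partial\theta}\log f(X \mid \theta)$ is the derivative of the log-likelihood function evaluated at the true value $\theta$.

Fisher information is meaningful for families of distributions which are regular:

1. Fixed support: $\{x : f(x \mid \theta) > 0\}$ is the same for all $\theta$.
2. $\frac{\partial}{\partial\theta}\log f(x \mid \theta)$ must exist and be finite for all $x$ and $\theta$.
3. If $E_\theta |W(X)| < \infty$ for all $\theta$, then
$$\left(\frac{\partial}{\partial\theta}\right)^k E_\theta W(X) = \left(\frac{\partial}{\partial\theta}\right)^k \int W(x) f(x \mid \theta)\,dx = \int W(x) \left(\frac{\partial}{\partial\theta}\right)^k f(x \mid \theta)\,dx.$$

1.1 Regular families

One-parameter exponential families.

Cauchy location or scale family:
$$f(x \mid \theta) = \frac{1}{\pi\left(1 + (x-\theta)^2\right)}, \qquad f(x \mid \theta) = \frac{1}{\pi\theta\left(1 + (x/\theta)^2\right)},$$
and lots more. (Most families of distributions used in applications are regular.)

1.2 Non-regular families

Uniform$(0, \theta)$, Uniform$(\theta, \theta + 1)$.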

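1.3 Facts about Fisher Information

Assume a regular family.

1. $E_\theta\left[\frac{\partial}{\partial\theta}\log f(X \mid \theta)\right] = 0$. Here $\frac{\partial}{\partial\theta}\log f(X \mid \theta)$ is called the score function $S(\theta)$.

Proof.
$$E_\theta\left[\frac{\partial}{\partial\theta}\log f(X \mid \theta)\right] = \int \left[\frac{\partial}{\partial\theta}\log f(x \mid \theta)\right] f(x \mid \theta)\,dx = \int \frac{\frac{\partial}{\partial\theta} f(x \mid \theta)}{f(x \mid \theta)}\, f(x \mid \theta)\,dx = \int \frac{\partial}{\partial\theta} f(x \mid \theta)\,dx = \frac{\partial}{\partial\theta}\int f(x \mid \theta)\,dx = 0,$$
since $\int f(x \mid \theta)\,dx = 1$ for all $\theta$.

2. $I_X(\theta) = \mathrm{Var}_\theta\left[\frac{\partial}{\partial\theta}\log f(X \mid \theta)\right]$.

Proof. Since $E_\theta\left[\frac{\partial}{\partial\theta}\log f(X \mid \theta)\right] = 0$,
$$\mathrm{Var}_\theta\left[\frac{\partial}{\partial\theta}\log f(X \mid \theta)\right] = E_\theta\left[\left(\frac{\partial}{\partial\theta}\log f(X \mid \theta)\right)^2\right] = I_X(\theta).$$

3. If $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ and $X_1, X_2, \ldots, X_n$ are independent random variables, then $I_{\mathbf{X}}(\theta) = I_{X_1}(\theta) + I_{X_2}(\theta) + \cdots + I_{X_n}(\theta)$.

Proof. Note that
$$f(\mathbf{x} \mid \theta) = \prod_{i=1}^n f_i(x_i \mid \theta)$$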

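where $f_i(\cdot \mid \theta)$ is the pdf (pmf) of $X_i$. Observe that
$$\frac{\partial}{\partial\theta}\log f(\mathbf{X} \mid \theta) = \sum_{i=1}^n \frac{\partial}{\partial\theta}\log f_i(X_i \mid \theta)$$
and the random variables in the sum are independent. Thus
$$\mathrm{Var}\left[\frac{\partial}{\partial\theta}\log f(\mathbf{X} \mid \theta)\right] = \sum_{i=1}^n \mathrm{Var}\left[\frac{\partial}{\partial\theta}\log f_i(X_i \mid \theta)\right],$$
so that $I_{\mathbf{X}}(\theta) = \sum_{i=1}^n I_{X_i}(\theta)$ by 2.

4. If $X_1, X_2, \ldots, X_n$ are i.i.d. and $\mathbf{X} = (X_1, X_2, \ldots, X_n)$, then $I_{X_i}(\theta) = I_{X_1}(\theta)$ for all $i$, so that $I_{\mathbf{X}}(\theta) = n I_{X_1}(\theta)$.

5. An alternate formula for Fisher information is
$$I_X(\theta) = -E_\theta\left[\frac{\partial^2}{\partial\theta^2}\log f(X \mid \theta)\right].$$

Proof. Abbreviate $\int f(x \mid \theta)\,dx$ as $\int f$, etc. Since $1 = \int f$, applying $\frac{\partial}{\partial\theta}$ to both sides,
$$0 = \frac{\partial}{\partial\theta}\int f = \int \frac{\partial f}{\partial\theta} = \int \frac{\partial f / \partial\theta}{f}\, f = \int \left(\frac{\partial}{\partial\theta}\log f\right) f.$$
Applying $\frac{\partial}{\partial\theta}$ again,
$$0 = \frac{\partial}{\partial\theta}\int \left(\frac{\partial}{\partial\theta}\log f\right) f = \int \frac{\partial}{\partial\theta}\left[\left(\frac{\partial}{\partial\theta}\log f\right) f\right] = \int \left(\frac{\partial^2}{\partial\theta^2}\log f\right) f + \int \left(\frac{\partial}{\partial\theta}\log f\right) \frac{\partial f}{\partial\theta}.$$
Noting that
$$\frac{\partial f}{\partial\theta} = \frac{\partial f / \partial\theta}{f}\, f = \left(\frac{\partial}{\partial\theta}\log f\right) f,$$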

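this becomes
$$0 = \int \left(\frac{\partial^2}{\partial\theta^2}\log f\right) f + \int \left(\frac{\partial}{\partial\theta}\log f\right)^2 f,$$
or
$$0 = E_\theta\left[\frac{\partial^2}{\partial\theta^2}\log f(X \mid \theta)\right] + I_X(\theta).$$

Example: Fisher Information for a Poisson sample. Observe $\mathbf{X} = (X_1, \ldots, X_n)$ iid Poisson$(\lambda)$. Find $I_{\mathbf{X}}(\lambda)$. We know $I_{\mathbf{X}}(\lambda) = n I_{X_1}(\lambda)$. We shall calculate $I_{X_1}(\lambda)$ in three ways. Let $X = X_1$.

Preliminaries:
$$f(x \mid \lambda) = \frac{\lambda^x e^{-\lambda}}{x!}$$
$$\log f(x \mid \lambda) = x\log\lambda - \lambda - \log x!$$
$$\frac{\partial}{\partial\lambda}\log f(x \mid \lambda) = \frac{x}{\lambda} - 1$$
$$\frac{\partial^2}{\partial\lambda^2}\log f(x \mid \lambda) = -\frac{x}{\lambda^2}$$

Method #1: Observe that
$$I_X(\lambda) = E_\lambda\left[\left(\frac{\partial}{\partial\lambda}\log f(X \mid \lambda)\right)^2\right] = E_\lambda\left[\left(\frac{X}{\lambda} - 1\right)^2\right] = \frac{E_\lambda\left[(X - \lambda)^2\right]}{\lambda^2} = \frac{\mathrm{Var}_\lambda X}{\lambda^2} \ (\text{since } EX = \lambda) = \frac{\lambda}{\lambda^2} = \frac{1}{\lambda}.$$

Method #2: Observe that
$$I_X(\lambda) = \mathrm{Var}_\lambda\left[\frac{\partial}{\partial\lambda}\log f(X \mid \lambda)\right] = \mathrm{Var}_\lambda\left(\frac{X}{\lambda} - 1\right) = \mathrm{Var}_\lambda\left(\frac{X}{\lambda}\right) = \frac{\mathrm{Var}_\lambda X}{\lambda^2} = \frac{1}{\lambda} \quad (\text{as in Method \#1}).$$

Method #3: Observe that
$$I_X(\lambda) = -E_\lambda\left[\frac{\partial^2}{\partial\lambda^2}\log f(X \mid \lambda)\right] = E_\lambda\left[\frac{X}{\lambda^2}\right] = \frac{\lambda}{\lambda^2} = \frac{1}{\lambda}.$$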
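The three Poisson calculations can be checked numerically. The following is a minimal sketch, not part of the original notes: it approximates each expectation by a truncated sum over the Poisson pmf. The function names and the truncation point `x_max` are illustrative assumptions.

```python
import math

def poisson_pmf(x, lam):
    """Poisson(lam) probability mass at integer x, computed on the log scale."""
    return math.exp(x * math.log(lam) - lam - math.lgamma(x + 1))

def fisher_info_three_ways(lam, x_max=200):
    """Approximate I_X(lam) for one Poisson observation by truncated sums.

    x_max is an assumed truncation point; the neglected tail mass is
    negligible when lam is small relative to x_max.
    """
    mean_score = 0.0
    mean_score_sq = 0.0
    mean_second_deriv = 0.0
    for x in range(x_max + 1):
        p = poisson_pmf(x, lam)
        score = x / lam - 1.0            # d/d(lam) of log f
        second_deriv = -x / lam ** 2     # second derivative of log f
        mean_score += score * p
        mean_score_sq += score ** 2 * p
        mean_second_deriv += second_deriv * p
    method1 = mean_score_sq                      # E[score^2]
    method2 = mean_score_sq - mean_score ** 2    # Var(score)
    method3 = -mean_second_deriv                 # -E[second derivative]
    return method1, method2, method3

m1, m2, m3 = fisher_info_three_ways(2.5)
print(m1, m2, m3, 1 / 2.5)
```

All three printed values should agree with $1/\lambda$ up to floating-point error.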

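Thus $I_{\mathbf{X}}(\lambda) = n I_{X_1}(\lambda) = n/\lambda$.

Example: Fisher information for Cauchy location family. Suppose $X_1, X_2, \ldots, X_n$ are iid with pdf
$$f(x \mid \theta) = \frac{1}{\pi\left(1 + (x-\theta)^2\right)}.$$
Let $\mathbf{X} = (X_1, \ldots, X_n)$, $X \sim f(x \mid \theta)$. Find $I_{\mathbf{X}}(\theta)$. Note that $I_{\mathbf{X}}(\theta) = n I_{X_1}(\theta) = n I_X(\theta)$. Now
$$\frac{\partial}{\partial\theta}\log f(x \mid \theta) = \frac{\partial f / \partial\theta}{f} = \frac{\dfrac{2(x-\theta)}{\pi\left(1 + (x-\theta)^2\right)^2}}{\dfrac{1}{\pi\left(1 + (x-\theta)^2\right)}} = \frac{2(x-\theta)}{1 + (x-\theta)^2}.$$
Now
$$I_X(\theta) = E_\theta\left[\left(\frac{\partial}{\partial\theta}\log f(X \mid \theta)\right)^2\right] = E_\theta\left[\left(\frac{2(X-\theta)}{1 + (X-\theta)^2}\right)^2\right] = \int_{-\infty}^{\infty} \left(\frac{2(x-\theta)}{1 + (x-\theta)^2}\right)^2 \frac{1}{\pi\left(1 + (x-\theta)^2\right)}\,dx = \frac{4}{\pi}\int_{-\infty}^{\infty} \frac{(x-\theta)^2}{\left(1 + (x-\theta)^2\right)^3}\,dx.$$
Letting $u = x - \theta$, $du = dx$,
$$I_X(\theta) = \frac{4}{\pi}\int_{-\infty}^{\infty} \frac{u^2}{(1+u^2)^3}\,du = \frac{8}{\pi}\int_0^{\infty} \frac{u^2}{(1+u^2)^3}\,du.$$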

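Substituting $x = 1/(1+u^2)$, $u = (1/x - 1)^{1/2}$, $du = 0.5\,(1/x - 1)^{-1/2}(-1/x^2)\,dx$,
$$I_X(\theta) = \frac{8}{\pi}\int_0^{\infty} \frac{u^2}{(1+u^2)^3}\,du = \frac{8}{\pi}\int_0^{\infty} \frac{u^2}{1+u^2}\cdot\frac{1}{(1+u^2)^2}\,du = \frac{8}{\pi}\int_0^1 (1-x)\,x^2\cdot\frac{1}{2}\left(\frac{1}{x} - 1\right)^{-1/2}\frac{1}{x^2}\,dx$$
$$= \frac{4}{\pi}\int_0^1 x^{1/2}(1-x)^{1/2}\,dx = \frac{4}{\pi}\int_0^1 x^{3/2-1}(1-x)^{3/2-1}\,dx \quad (\text{Beta integral})$$
$$= \frac{4}{\pi}\cdot\frac{\Gamma(3/2)\,\Gamma(3/2)}{\Gamma(3/2 + 3/2)} = \frac{4}{\pi}\cdot\frac{\left(0.5\sqrt{\pi}\right)^2}{2!} = \frac{1}{2}.$$
Hence $I_{\mathbf{X}}(\theta) = n/2$.

2 Uses of Fisher Information

Asymptotic distribution of MLE's
Cramér-Rao Inequality (Information inequality)

2.1 Asymptotic distribution of MLE's

i.i.d. case: If $f(x \mid \theta)$ is a regular one-parameter family of pdf's (or pmf's) and $\hat\theta_n = \hat\theta_n(\mathbf{X}_n)$ is the MLE based on $\mathbf{X}_n = (X_1, \ldots, X_n)$, where $n$ is large and $X_1, \ldots, X_n$ are iid from $f(x \mid \theta)$, then approximately
$$\hat\theta_n \sim N\left(\theta, \frac{1}{n I(\theta)}\right),$$
where $I(\theta) \equiv I_{X_1}(\theta)$ and $\theta$ is the true value. Note that $n I(\theta) = I_{\mathbf{X}_n}(\theta)$. More formally,
$$\frac{\hat\theta_n - \theta}{\sqrt{1/(n I(\theta))}} = \sqrt{n I(\theta)}\,(\hat\theta_n - \theta) \xrightarrow{d} N(0, 1)$$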

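as $n \to \infty$.

More general case: (Assuming various regularity conditions.) If $f(\mathbf{x} \mid \theta)$ is a one-parameter family of joint pdf's (or joint pmf's) for data $\mathbf{X}_n = (X_1, \ldots, X_n)$, where $n$ is large (think of a large dataset arising from a regression or time series model), and $\hat\theta_n = \hat\theta_n(\mathbf{X}_n)$ is the MLE, then
$$\hat\theta_n \sim N\left(\theta, \frac{1}{I_{\mathbf{X}_n}(\theta)}\right),$$
where $\theta$ is the true value.

2.2 Estimation of the Fisher Information

If $\theta$ is unknown, then so is $I_{\mathbf{X}}(\theta)$. Two estimates $\hat I$ of the Fisher information $I_{\mathbf{X}}(\theta)$ are
$$\hat I_1 = I_{\mathbf{X}}(\hat\theta), \qquad \hat I_2 = -\left.\frac{\partial^2}{\partial\theta^2}\log f(\mathbf{X} \mid \theta)\right|_{\theta = \hat\theta},$$
where $\hat\theta$ is the MLE of $\theta$ based on the data $\mathbf{X}$. $\hat I_1$ is the obvious plug-in estimator. It can be difficult to compute if $I_{\mathbf{X}}(\theta)$ does not have a known closed form. The estimator $\hat I_2$ is suggested by the formula
$$I_{\mathbf{X}}(\theta) = -E\left[\frac{\partial^2}{\partial\theta^2}\log f(\mathbf{X} \mid \theta)\right].$$
It is often easy to compute, and is required in many Newton-Raphson style algorithms for finding the MLE (so that it is already available without extra computation). The two estimates $\hat I_1$ and $\hat I_2$ are often referred to as the "expected" and "observed" Fisher information, respectively.

As $n \to \infty$, both estimators are consistent (after normalization) for $I_{\mathbf{X}_n}(\theta)$ under various regularity conditions. For example, in the iid case, $\hat I_1/n$, $\hat I_2/n$, and $I_{\mathbf{X}_n}(\theta)/n$ all converge to $I(\theta) \equiv I_{X_1}(\theta)$.

2.3 Approximate Confidence Intervals for θ

Choose $0 < \alpha < 1$ (say, $\alpha = 0.05$). Let $z$ be such that $P(-z < Z < z) = 1 - \alpha$, where $Z \sim N(0, 1)$. When $n$ is large, we have approximately
$$\sqrt{I_{\mathbf{X}}(\theta)}\,(\hat\theta - \theta) \sim N(0, 1)$$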

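so that
$$P\left\{-z < \sqrt{I_{\mathbf{X}}(\theta)}\,(\hat\theta - \theta) < z\right\} \approx 1 - \alpha,$$
or equivalently,
$$P\left\{\hat\theta - \frac{z}{\sqrt{I_{\mathbf{X}}(\theta)}} < \theta < \hat\theta + \frac{z}{\sqrt{I_{\mathbf{X}}(\theta)}}\right\} \approx 1 - \alpha.$$
This approximation continues to hold when $I_{\mathbf{X}}(\theta)$ is replaced by an estimate $\hat I$ (either $\hat I_1$ or $\hat I_2$):
$$P\left\{\hat\theta - \frac{z}{\sqrt{\hat I}} < \theta < \hat\theta + \frac{z}{\sqrt{\hat I}}\right\} \approx 1 - \alpha.$$
Thus
$$\left(\hat\theta - \frac{z}{\sqrt{\hat I}},\ \hat\theta + \frac{z}{\sqrt{\hat I}}\right)$$
is an approximate $1 - \alpha$ confidence interval for $\theta$. (Here $\hat\theta$ is the MLE and $\hat I$ is an estimate of the Fisher information.)

3 Cramér-Rao Inequality

Let $\mathbf{X} \sim P_\theta$, $\theta \in \Theta \subset \mathbb{R}$.

Theorem 1. If $f(\mathbf{x} \mid \theta)$ is a regular one-parameter family, $E_\theta W(\mathbf{X}) = \tau(\theta)$ for all $\theta$, and $\tau(\theta)$ is differentiable, then
$$\mathrm{Var}_\theta W(\mathbf{X}) \ge \frac{\{\tau'(\theta)\}^2}{I_{\mathbf{X}}(\theta)}.$$

Proof. Preliminary Facts:

A. $[\mathrm{Cov}(X, Y)]^2 \le (\mathrm{Var}\,X)(\mathrm{Var}\,Y)$. This is a special case of the Cauchy-Schwarz inequality. It is better known to statisticians as $\rho^2 \le 1$, where
$$\rho = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}\,X\ \mathrm{Var}\,Y}}$$
is the correlation between $X$ and $Y$.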

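B. $\mathrm{Cov}(X, Y) = E(XY)$ if either $EX = 0$ or $EY = 0$. This follows from the well-known formula $\mathrm{Cov}(X, Y) = E(XY) - (EX)(EY)$.

Since $E_\theta\left[\frac{\partial}{\partial\theta}\log f(\mathbf{X} \mid \theta)\right] = 0$, from B we have
$$\mathrm{Cov}_\theta\left[W(\mathbf{X}),\ \frac{\partial}{\partial\theta}\log f(\mathbf{X} \mid \theta)\right] = E_\theta\left[W(\mathbf{X})\,\frac{\partial}{\partial\theta}\log f(\mathbf{X} \mid \theta)\right] = \int W(\mathbf{x})\left[\frac{\partial}{\partial\theta}\log f(\mathbf{x} \mid \theta)\right] f(\mathbf{x} \mid \theta)\,d\mathbf{x}$$
$$= \int W(\mathbf{x})\,\frac{\frac{\partial}{\partial\theta} f(\mathbf{x} \mid \theta)}{f(\mathbf{x} \mid \theta)}\, f(\mathbf{x} \mid \theta)\,d\mathbf{x} = \int W(\mathbf{x})\,\frac{\partial}{\partial\theta} f(\mathbf{x} \mid \theta)\,d\mathbf{x} = \frac{\partial}{\partial\theta}\int W(\mathbf{x}) f(\mathbf{x} \mid \theta)\,d\mathbf{x} \quad (\text{since } f(\mathbf{x} \mid \theta) \text{ is a regular family})$$
$$= \frac{\partial}{\partial\theta} E_\theta W(\mathbf{X}) = \tau'(\theta).$$
Since, from A,
$$\left[\mathrm{Cov}_\theta\left(W(\mathbf{X}),\ \frac{\partial}{\partial\theta}\log f(\mathbf{X} \mid \theta)\right)\right]^2 \le \mathrm{Var}_\theta W(\mathbf{X})\ \mathrm{Var}_\theta\left[\frac{\partial}{\partial\theta}\log f(\mathbf{X} \mid \theta)\right],$$
we have
$$[\tau'(\theta)]^2 \le \mathrm{Var}_\theta W(\mathbf{X})\ I_{\mathbf{X}}(\theta).$$

Remark 1. Equality in A is achieved iff $Y = aX + b$ for some constants $a$, $b$. Moreover, if $EY = 0$, then $E(aX + b) = 0$ forces $b = -a\,EX$, so that $Y = a(X - EX)$ for some constant $a$.

Applying this to the proof of the CRLB with $X = W(\mathbf{X})$, $Y = \frac{\partial}{\partial\theta}\log f(\mathbf{X} \mid \theta)$ tells us that
$$\mathrm{Var}_\theta W(\mathbf{X}) = \frac{\{\tau'(\theta)\}^2}{I_{\mathbf{X}}(\theta)}$$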

iff log θ aθ[w τθ] θ f for some fuctio aθ. is true oly whe fx θ is a pef ad W ct + d for some c, d where T is the atural sufficiet statistic of the pef. 4 Asymptotic Efficiecy Let X X, X 2,..., X. Give a sequece of estimators W W X. If EW τθ for all, the {W } is asymptotically efficiet if where Var θ W lim V θ. V θ {τ θ} 2 I X θ What if Var θ W or if W is biased? A alterative defiitio: A sequece of estimators {W } is asymptotically ormal if W τθ V θ d N,. as. {W } is asymptotically efficiet for estimatig τθ if W ANτθ, V θ. Example: Observe X, X 2,..., X iid Poissoλ. Estimatio of τλ λ: E X λ. Does X achieve the CRLB? Yes! Var X VarX CRLB {τ λ} 2 I X λ λ /λ λ Alterative: Check coditio for exact attaimet of CRLB. log fx λ λ i λ log fx i λ i Xi λ λ X λ

Note: Sice X attais the CRLB for all, it must be the best ubiased estimator of λ. Showig that a estimator attais the CRLB is oe way to show it is best ubiased. But see later remark. Estimatio of τλ λ 2 : Defie W T T / 2 where T i X i. EW λ 2 see calculatios below ad W is a fuctio of the CSS T. Thus W is best ubiased for λ 2. Does W achieve the CRLB? No!!! Note that CRLB {τ λ} 2 I X λ VarW 4λ3 + 2λ2 2 2λ2 /λ 4λ3. see calculatios below. Alterative: Show coditio for achievemet of CRLB fails. As show earlier: λ log fx λ Xi λ T λ i The CRLB is attaied iff there exists aλ such that T T T λ aλ 2 λ 2. But the left side is liear i T ad the right side is quadratic i T, so that o multiplier aλ ca make them equal for all possible values of T,, 2,.... Remark 2. This situatio is ot uusual. The best ubiased estimator ofte fails to achieve the CRLB. But W is asymptotically efficiet: 4λ VarW 3 lim CRLB lim + 2λ2 2 4λ 3 lim +. 2λ Calculatios: Suppose Y Poissoξ. simple patter: The factorial momets of the Poisso follow EY ξ EY Y ξ 2 EY Y Y 2 ξ 3 EY Y Y 2Y 3 ξ 4

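Proof of one case:
$$EY(Y-1)(Y-2) = \sum_{i=0}^{\infty} i(i-1)(i-2)\,\frac{\xi^i e^{-\xi}}{i!} = \sum_{i=3}^{\infty} \xi^3\,\frac{\xi^{i-3} e^{-\xi}}{(i-3)!} = \xi^3 \sum_{j=0}^{\infty} \frac{\xi^j e^{-\xi}}{j!} = \xi^3.$$

From the factorial moments, we can calculate everything else. For example:
$$\mathrm{Var}[Y(Y-1)] = E\left[\{Y(Y-1)\}^2\right] - [EY(Y-1)]^2 = E\left[Y^2(Y-1)^2\right] - [\xi^2]^2 = E\left[Y_{(4)} + 4Y_{(3)} + 2Y_{(2)}\right] - \xi^4 = [\xi^4 + 4\xi^3 + 2\xi^2] - \xi^4 = 4\xi^3 + 2\xi^2,$$
where $Y_{(k)} = Y(Y-1)(Y-2)\cdots(Y-k+1)$.

In our case $T \sim$ Poisson$(n\lambda)$, so that substituting $\xi = n\lambda$ in the above results leads to
$$ET(T-1) = n^2\lambda^2, \qquad \mathrm{Var}[T(T-1)] = 4n^3\lambda^3 + 2n^2\lambda^2,$$
so that $W = T(T-1)/n^2$ satisfies
$$EW = \lambda^2, \qquad \mathrm{Var}\,W = \frac{4\lambda^3}{n} + \frac{2\lambda^2}{n^2}.$$

4.1 An asymptotically inefficient estimator

Example: Let $X_1, \ldots, X_n$ be iid with pdf
$$f(x \mid \alpha) = \frac{x^{\alpha-1} e^{-x}}{\Gamma(\alpha)}, \quad x > 0.$$
For this pdf $EX = \mathrm{Var}\,X = \alpha$. Clearly $E\bar X = \alpha$. Thus $\bar X$ is the method of moments (MOM) estimator of $\alpha$. Is it asymptotically efficient? No (verified below).

Note: This is a 1pef with natural sufficient statistic $T = \sum_{i=1}^n \log X_i$. Since $T$ is complete, $E(\bar X \mid T)$ is the UMVUE of $\alpha$. Since $\bar X$ is not a function of $T$, we know $\mathrm{Var}\,\bar X > \mathrm{Var}[E(\bar X \mid T)]$. But $\mathrm{Var}[E(\bar X \mid T)] \ge$ CRLB. Thus, without calculation, we know that $\bar X$ cannot achieve the CRLB for any value of $n$. We now show it does not achieve it asymptotically either. Note that
$$\mathrm{Var}\,\bar X = \frac{\mathrm{Var}\,X}{n} = \frac{\alpha}{n}.$$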

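And
$$I_{\mathbf{X}}(\alpha) = n I_{X_1}(\alpha) = n\,\frac{\Gamma''(\alpha)\Gamma(\alpha) - \{\Gamma'(\alpha)\}^2}{\{\Gamma(\alpha)\}^2}$$
by a routine calculation. Hence
$$\text{CRLB} = \frac{1}{I_{\mathbf{X}}(\alpha)} = \frac{1}{n I_{X_1}(\alpha)}.$$
Thus
$$\frac{\mathrm{Var}\,\bar X}{\text{CRLB}} = \alpha I_{X_1}(\alpha),$$
which does not depend on $n$. Since $\bar X$ does not achieve the CRLB for any $n$, we know $\alpha I_{X_1}(\alpha) > 1$. Thus
$$\lim_{n\to\infty} \frac{\mathrm{Var}\,\bar X}{\text{CRLB}} = \alpha I_{X_1}(\alpha) > 1,$$
so that $\bar X$ is not asymptotically efficient.

The function $\alpha I_{X_1}(\alpha)$ is a non-negative decreasing function with
$$\lim_{\alpha\to 0} \alpha I_{X_1}(\alpha) = \infty, \qquad \lim_{\alpha\to\infty} \alpha I_{X_1}(\alpha) = 1.$$

Figure 1: Plot of $\alpha I_{X_1}(\alpha)$, where $I_{X_1}(\alpha)$ is called the trigamma function (the derivative of the digamma function $\Gamma'(\alpha)/\Gamma(\alpha)$).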
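The behaviour shown in Figure 1 can be reproduced numerically. The following is a minimal sketch, not part of the original notes. It assumes a hand-written `trigamma` helper, built from the recurrence $\psi'(x) = \psi'(x+1) + 1/x^2$ and a standard asymptotic series, so that only the Python standard library is needed. A library routine for the trigamma function could be used instead.

```python
import math

def trigamma(x):
    """Trigamma function psi'(x) for x > 0 (illustrative helper, not a library call)."""
    total = 0.0
    # Recurrence psi'(x) = psi'(x + 1) + 1/x^2: shift x upward until the
    # asymptotic series below is accurate.
    while x < 10.0:
        total += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # Asymptotic series: 1/x + 1/(2x^2) + 1/(6x^3) - 1/(30x^5) + 1/(42x^7)
    total += inv + inv2 / 2 + inv * inv2 * (1 / 6 - inv2 * (1 / 30 - inv2 / 42))
    return total

# Sanity check against the known value psi'(1) = pi^2 / 6.
print(trigamma(1.0), math.pi ** 2 / 6)

# alpha * I_{X_1}(alpha) = alpha * psi'(alpha) is the limiting ratio Var(Xbar)/CRLB.
for alpha in (0.1, 0.5, 1.0, 5.0, 50.0):
    print(alpha, alpha * trigamma(alpha))
```

The printed ratios are large for small $\alpha$ and decrease toward 1 as $\alpha$ grows, consistent with the two limits stated above.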

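When $\alpha$ is small, $\bar X$ is horrible. When $\alpha$ is large, $\bar X$ is pretty good.

General Comment: For regular families, the MLE is asymptotically efficient. Thus MOM is inefficient in general. The limit $\lim_{n\to\infty} \mathrm{Var}\,W_n/\text{CRLB}$ essentially compares the variance of $W_n$ with that of the MLE in large samples.

5 Fisher Information, CRLB, Asymptotic distribution of MLE's in the multi-parameter case

Notation: $\mathbf{X} \sim f(\mathbf{x} \mid \theta)$, $\theta = (\theta_1, \theta_2, \ldots, \theta_p)^T$, and
$$\frac{\partial}{\partial\theta} = \begin{pmatrix} \dfrac{\partial}{\partial\theta_1} \\ \vdots \\ \dfrac{\partial}{\partial\theta_p} \end{pmatrix},$$
and $S$ ($p \times 1$) is the vector of scores
$$S = \frac{\partial}{\partial\theta}\log f(\mathbf{X} \mid \theta) = \begin{pmatrix} \dfrac{\partial}{\partial\theta_1}\log f(\mathbf{X} \mid \theta) \\ \vdots \\ \dfrac{\partial}{\partial\theta_p}\log f(\mathbf{X} \mid \theta) \end{pmatrix}.$$
Define the $p \times p$ matrix
$$I_{\mathbf{X}}(\theta) = E\left(S S^T\right).$$
Note that $S$ is evaluated at $\theta$ and the expectation is taken under the distribution indexed by the same parameter $\theta$. For a vector or matrix, we define the expected values in this way:
$$E\begin{pmatrix} Y \\ Z \end{pmatrix} = \begin{pmatrix} EY \\ EZ \end{pmatrix}, \qquad E\begin{pmatrix} W & X \\ Y & Z \end{pmatrix} = \begin{pmatrix} EW & EX \\ EY & EZ \end{pmatrix}.$$

5.1 Properties

1. $E_\theta S = 0_{p \times 1}$.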

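2. $I_{\mathbf{X}}(\theta) = \mathrm{Cov}(S)$, the variance-covariance matrix of $S$.

3. If $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ has independent components, then $I_{\mathbf{X}}(\theta) = I_{X_1}(\theta) + I_{X_2}(\theta) + \cdots + I_{X_n}(\theta)$.

4. If $X_1, X_2, \ldots, X_n$ are iid, then $I_{\mathbf{X}}(\theta) = n I_{X_1}(\theta)$.

5. $I_{\mathbf{X}}(\theta) = -E\left[\dfrac{\partial^2}{\partial\theta^2}\log f(\mathbf{X} \mid \theta)\right]$, where we define
$$\frac{\partial^2}{\partial\theta^2}\log f(\mathbf{X} \mid \theta) = \left(\frac{\partial^2}{\partial\theta_i\,\partial\theta_j}\log f(\mathbf{X} \mid \theta)\right),$$
which is the $p \times p$ matrix whose $(i, j)$ entry is $\dfrac{\partial^2}{\partial\theta_i\,\partial\theta_j}\log f(\mathbf{X} \mid \theta)$.

5.2 Asymptotic distribution of MLE of θ

If $\hat\theta_n = \hat\theta_n(X_1, X_2, \ldots, X_n)$ is the sequence of MLE's based on progressively larger samples, then
$$\hat\theta_n \sim AN\left(\theta,\ I_{\mathbf{X}}^{-1}(\theta)\right),$$
where AN now stands for asymptotically multivariate normal. This means $\hat\theta_n \approx N\left(\theta, I_{\mathbf{X}}^{-1}(\theta)\right)$ for large $n$.

Recall: In the iid case $I_{\mathbf{X}}(\theta) = n I_{X_1}(\theta)$. Estimate $I_{\mathbf{X}}(\theta)$ by $I_{\mathbf{X}}(\hat\theta)$ or by
$$-\left.\left(\frac{\partial^2}{\partial\theta_i\,\partial\theta_j}\log f(\mathbf{X} \mid \theta)\right)\right|_{\theta = \hat\theta}.$$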
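The two estimates of the information matrix can be compared on simulated data. The following is a minimal sketch, not part of the original notes. It assumes an iid $N(\mu, \xi)$ sample (with $\xi$ the variance), approximates minus the Hessian of the log-likelihood at the MLE by central finite differences, and compares it with the plug-in matrix $n\,\mathrm{diag}(1/\hat\xi,\ 1/(2\hat\xi^2))$, which is the closed form for this model. The function names, the step size `h`, the seed, and the simulated data are all illustrative assumptions.

```python
import math
import random

def log_lik(mu, xi, data):
    """Log-likelihood of an iid N(mu, xi) sample, where xi is the variance."""
    n = len(data)
    ss = sum((x - mu) ** 2 for x in data)
    return -0.5 * n * math.log(2 * math.pi * xi) - ss / (2 * xi)

def observed_information(mu, xi, data, h=1e-4):
    """Minus the Hessian of log_lik at (mu, xi), by central finite differences."""
    def f(a, b):
        return log_lik(a, b, data)
    d_mm = (f(mu + h, xi) - 2 * f(mu, xi) + f(mu - h, xi)) / h ** 2
    d_xx = (f(mu, xi + h) - 2 * f(mu, xi) + f(mu, xi - h)) / h ** 2
    d_mx = (f(mu + h, xi + h) - f(mu + h, xi - h)
            - f(mu - h, xi + h) + f(mu - h, xi - h)) / (4 * h ** 2)
    return [[-d_mm, -d_mx], [-d_mx, -d_xx]]

random.seed(0)                                   # assumed seed, for repeatability
data = [random.gauss(1.0, 2.0) for _ in range(200)]
n = len(data)
mu_hat = sum(data) / n                           # MLE of mu
xi_hat = sum((x - mu_hat) ** 2 for x in data) / n  # MLE of the variance

obs = observed_information(mu_hat, xi_hat, data)   # observed information
plug_in = [[n / xi_hat, 0.0], [0.0, n / (2 * xi_hat ** 2)]]  # expected, at the MLE
print(obs)
print(plug_in)
```

For this particular model the two matrices coincide at the MLE up to finite-difference error. In general they differ, and the observed version is the one available when no closed form for the expected information is known.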

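5.3 Multi-parameter CRLB

$\mathbf{X}$ has joint pdf (pmf) $f(\mathbf{x} \mid \theta)$, which is a regular family, with $\theta = (\theta_1, \theta_2, \ldots, \theta_p)$. If $E W(\mathbf{X}) = \tau(\theta)$, where $\tau(\theta) \in \mathbb{R}$ is a differentiable function of $\theta_i$, $i = 1, \ldots, p$, then
$$\mathrm{Var}\,W(\mathbf{X}) \ge g^T I^{-1} g,$$
where $g = \dfrac{\partial\tau(\theta)}{\partial\theta}$ is $p \times 1$ and $I = I_{\mathbf{X}}(\theta)$ is $p \times p$.

Special Case: $W(\mathbf{X}) = \hat\theta_i$ with $\tau(\theta) = \theta_i$. (That is, $\hat\theta_i$ is an unbiased estimate of $\theta_i$.) Now the vector $g$ has $g_i = 1$ and $g_j = 0$ for $j \ne i$, and the CRLB gives
$$\mathrm{Var}\,\hat\theta_i \ge (I^{-1})_{ii},$$
where the right hand side is the $i$th diagonal element of $I^{-1}$.

Weaker result: Suppose we knew $\theta_j$ for all $j \ne i$. By fixing $\theta_j$ for $j \ne i$ at the known values, we get a one-parameter family, and the CRLB for the one-parameter case gives
$$\mathrm{Var}\,\hat\theta_i \ge \frac{1}{I_{ii}}, \qquad I_{ii} = E\left[\left(\frac{\partial}{\partial\theta_i}\log f(\mathbf{X} \mid \theta)\right)^2\right].$$
But, since $(I^{-1})_{ii} \ge 1/I_{ii}$,
$$\mathrm{Var}\,\hat\theta_i \ge (I^{-1})_{ii} \ge \frac{1}{I_{ii}},$$
where the upper lower bound is the best you can do if you are estimating $\theta_i$ and all the other parameters are unknown, and the lower lower bound is the best you can do when all the other parameters are known.

Example: $N(\mu, \sigma^2 = \xi)$ distribution. Note that
$$f(x \mid \mu, \xi) = \frac{1}{\sqrt{2\pi\xi}}\, e^{-(x-\mu)^2/(2\xi)}, \qquad l = \log f = -\frac{1}{2}\log(2\pi\xi) - \frac{(x-\mu)^2}{2\xi},$$
and
$$\frac{\partial}{\partial\theta}\log f = \begin{pmatrix} \dfrac{\partial}{\partial\mu}\log f \\ \dfrac{\partial}{\partial\xi}\log f \end{pmatrix} = \begin{pmatrix} \dfrac{x-\mu}{\xi} \\ -\dfrac{1}{2\xi} + \dfrac{(x-\mu)^2}{2\xi^2} \end{pmatrix},$$
and
$$I(\theta) = -E\left[\frac{\partial^2}{\partial\theta^2}\log f(X \mid \theta)\right] = -E\begin{pmatrix} \dfrac{\partial^2 l}{\partial\mu^2} & \dfrac{\partial^2 l}{\partial\mu\,\partial\xi} \\ \dfrac{\partial^2 l}{\partial\xi\,\partial\mu} & \dfrac{\partial^2 l}{\partial\xi^2} \end{pmatrix} = -E\begin{pmatrix} -\dfrac{1}{\xi} & -\dfrac{X-\mu}{\xi^2} \\ -\dfrac{X-\mu}{\xi^2} & \dfrac{1}{2\xi^2} - \dfrac{(X-\mu)^2}{\xi^3} \end{pmatrix} = \begin{pmatrix} \dfrac{1}{\xi} & 0 \\ 0 & \dfrac{1}{2\xi^2} \end{pmatrix}.$$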

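Hence, for a single observation,
$$I^{-1}(\theta) = \begin{pmatrix} \xi & 0 \\ 0 & 2\xi^2 \end{pmatrix} = \begin{pmatrix} \sigma^2 & 0 \\ 0 & 2\sigma^4 \end{pmatrix},$$
and for a sample of $n$ iid observations the information is $n I(\theta)$, so its inverse is $I^{-1}(\theta)/n$.

For an unbiased estimate of $\mu$ (that is, $E_{\mu,\sigma^2} W = \mu$),
$$\mathrm{Var}\,W \ge \frac{\sigma^2}{n} \quad (\text{achieved by } W = \bar X).$$
For an unbiased estimate of $\sigma^2$,
$$\mathrm{Var}\,W \ge \frac{2\sigma^4}{n} \quad (\text{not achieved exactly}).$$
$S^2$ is best unbiased, and $S^2 \sim \dfrac{\sigma^2}{n-1}\chi^2_{n-1}$, so that $\mathrm{Var}\,S^2 = \dfrac{2\sigma^4}{n-1}$.

Note:
$$\mathrm{Var}\left(\frac{\sum_{i=1}^n (X_i - \mu)^2}{n}\right) = \frac{2\sigma^4}{n}, \qquad E\left(\frac{\sum_{i=1}^n (X_i - \mu)^2}{n}\right) = \sigma^2.$$
This achieves the CR-bound, but it is not a legitimate estimator if $\mu$ is unknown.

The limiting distribution of the MLE is given by
$$\begin{pmatrix} \bar X \\ \hat\sigma^2 \end{pmatrix} \sim AN\left(\begin{pmatrix} \mu \\ \sigma^2 \end{pmatrix},\ \frac{1}{n}\begin{pmatrix} \sigma^2 & 0 \\ 0 & 2\sigma^4 \end{pmatrix}\right).$$

Example: Gamma$(\alpha, \beta)$. Recall the digamma function $\psi(\alpha) = \Gamma'(\alpha)/\Gamma(\alpha)$. Note that
$$f(x \mid \alpha, \beta) = \frac{1}{\Gamma(\alpha)\beta^\alpha}\, x^{\alpha-1} e^{-x/\beta}, \qquad l = \log f = -\log\Gamma(\alpha) - \alpha\log\beta + (\alpha-1)\log x - x/\beta.$$
Then
$$\frac{\partial}{\partial\theta}\log f(X \mid \theta) = \begin{pmatrix} \dfrac{\partial}{\partial\alpha}\log f \\ \dfrac{\partial}{\partial\beta}\log f \end{pmatrix} = \begin{pmatrix} -\psi(\alpha) - \log\beta + \log X \\ -\dfrac{\alpha}{\beta} + \dfrac{X}{\beta^2} \end{pmatrix}$$
and
$$I(\theta) = -E\begin{pmatrix} \dfrac{\partial^2 l}{\partial\alpha^2} & \dfrac{\partial^2 l}{\partial\alpha\,\partial\beta} \\ \dfrac{\partial^2 l}{\partial\beta\,\partial\alpha} & \dfrac{\partial^2 l}{\partial\beta^2} \end{pmatrix} = -E\begin{pmatrix} -\psi'(\alpha) & -\dfrac{1}{\beta} \\ -\dfrac{1}{\beta} & \dfrac{\alpha}{\beta^2} - \dfrac{2X}{\beta^3} \end{pmatrix} = \begin{pmatrix} \psi'(\alpha) & \dfrac{1}{\beta} \\ \dfrac{1}{\beta} & \dfrac{\alpha}{\beta^2} \end{pmatrix}.$$
Hence
$$I^{-1}(\theta) = \frac{\beta^2}{\alpha\psi'(\alpha) - 1}\begin{pmatrix} \dfrac{\alpha}{\beta^2} & -\dfrac{1}{\beta} \\ -\dfrac{1}{\beta} & \psi'(\alpha) \end{pmatrix}.$$
The CRLB for an unbiased estimator of $\beta$ is given by
$$\mathrm{Var}\,\hat\beta \ge \left(I^{-1}(\theta)\right)_{22} \ge \left\{I(\theta)_{22}\right\}^{-1}.$$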

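Note that
$$\left(I^{-1}(\theta)\right)_{22} = \frac{\beta^2}{\alpha}\cdot\frac{\psi'(\alpha)}{\psi'(\alpha) - 1/\alpha}, \qquad \left\{I(\theta)_{22}\right\}^{-1} = \frac{\beta^2}{\alpha}.$$
If $\alpha$ is known, the lower lower bound is achieved:
$$E\left(\frac{X}{\alpha}\right) = \beta, \qquad \mathrm{Var}\left(\frac{X}{\alpha}\right) = \frac{\mathrm{Var}\,X}{\alpha^2} = \frac{\alpha\beta^2}{\alpha^2} = \frac{\beta^2}{\alpha}.$$
If $\alpha$ must be estimated, there is a variance penalty which does not vanish asymptotically.

Figure 2: Plot of $\psi'(\alpha)/\left(\psi'(\alpha) - 1/\alpha\right)$, showing that it does not become asymptotically 1.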
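The variance penalty for estimating $\beta$ when $\alpha$ is unknown can be computed directly from the $2 \times 2$ information matrix. The following is a minimal sketch, not part of the original notes. It assumes a hand-written `trigamma` helper (recurrence plus asymptotic series) and an illustrative function `beta_bounds`, inverts $I(\theta)$ for a single Gamma$(\alpha, \beta)$ observation, and compares the ratio of the two lower bounds with the penalty factor $\psi'(\alpha)/(\psi'(\alpha) - 1/\alpha)$.

```python
def trigamma(x):
    """Trigamma function psi'(x) for x > 0 (illustrative helper, not a library call)."""
    total = 0.0
    # Recurrence psi'(x) = psi'(x + 1) + 1/x^2, then an asymptotic series.
    while x < 10.0:
        total += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    total += inv + inv2 / 2 + inv * inv2 * (1 / 6 - inv2 * (1 / 30 - inv2 / 42))
    return total

def beta_bounds(alpha, beta):
    """Return ((I^{-1})_22, 1/I_22) for one Gamma(alpha, beta) observation."""
    i11 = trigamma(alpha)
    i12 = 1.0 / beta
    i22 = alpha / beta ** 2
    det = i11 * i22 - i12 * i12
    inv22 = i11 / det            # (2, 2) entry of the inverse of a 2x2 matrix
    return inv22, 1.0 / i22

for alpha in (0.5, 2.0, 20.0):
    unknown_alpha, known_alpha = beta_bounds(alpha, beta=1.0)
    psi1 = trigamma(alpha)
    # Columns: alpha, ratio of the two bounds, closed-form penalty factor
    print(alpha, unknown_alpha / known_alpha, psi1 / (psi1 - 1 / alpha))
```

The two printed columns agree, and the factor exceeds 1 for every $\alpha$ tried, so the bound with $\alpha$ unknown is strictly larger than the bound with $\alpha$ known.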