Introduction to Statistics (Math 502). WH 100E, MWF 8:30am-9:30am.
Office: WH 3. Office hours: M, T 3-4pm.
Textbook: Statistical Inference (2nd ed.) by George Casella and Roger L. Berger, Chapter 6 - Chapter 10.
Homework due: W in class. Quiz: every Friday. Midterm: March 10 (M). Final: May 7 (W) 6:30 pm in WH 100E. Notice the time changes! Each is allowed to bring a piece of paper with anything on it.
Homework assigned during the last week is due each Wednesday. It is on my website: http://www.math.binghamton.edu/qyu Remind me if you do not see it by Saturday morning! Homework due this Friday is on my website!!! The solution is on my website. Grade yourself carefully and hand it in.
Grading policy: 50% hw and quizzes + 50% exams. B = 70 ±.

Chapter 0. Introduction
Question: What is Statistics? One can use the following example to explain in short.
Example (capture-recapture problem). In a pond, there are N fish. Catch m, say m = 10, tag them and put them back. Re-catch n fish, say n = 10; X of them are tagged, say X = 3.
P(X = x) = ? -- a probability problem. Question: N = ? -- a statistics problem.
Answer:
1. f(x;N) = P(X = x) = C(m,x)C(N-m,n-x)/C(N,n), x in {0,1,...,m}, m+n <= N.
2. Many estimates of N: MLE, MME, LSE, etc. E.g. MLE: N-check = argmax_N f(3;N); MME: solve X-bar = E(X) = nm/N => N-hat = nm/X = 100/3, about 33.
3. Properties of these estimators? What is the best estimator?
Typically, statistics deals with such problems: Given a random sample, say X_1,...,X_n, i.i.d. from X, assuming they are from a model with cdf F(x;theta), where theta is unknown, theta in Theta, find out: theta = ? or P(X <= x) = ? (This is called point estimation.) What is theta in the capture-recapture problem?
We shall study:
1. how to summarize X_1,...,X_n,
2. how to find a formula to guess theta based on the summary,
3. whether the guess with such a formula is good.

Chapter 6. Principles of Data Reduction
Denote X = (X_1,...,X_n), where X_1,...,X_n are i.i.d. from X ~ F(x;theta). We call X a data set or observations from X. One can use R to generate a data set in simulation:
> x=rnorm(3,0,1)
[1] 0.363466 0.4865695 -0.63855
> x=rexp(30,3)   # 3 = E(X) or 1/E(X)? (f(x) = e^{-x/mu}/mu = rho e^{-rho x}, x > 0.)
> mean(x)
[1] 0.3559676
Definition. Given data X, a statistic T(X) is a function of X, where T(.) does not depend on theta.
A data set is often quite large; for estimation purposes, it is desirable to simplify it to a statistic. However, we do not want to lose information during the simplification. This is called data reduction. Several principles for data reduction: (1) sufficiency principle, (2) likelihood principle (maybe ignored in the lecture), (3) invariance principle (maybe ignored in the lecture).
6.2. Sufficiency
Let X be a random vector (continuous or discrete), with density function (d.f.) f_X(x;theta).
Definition. If T(X) is a statistic and the conditional distribution of X given T, say (X|T), is independent of theta, then T is a sufficient statistic for theta (or we say that T is sufficient for theta).
Sufficiency principle: Reduce the data to a sufficient statistic.
Theorem 1 (Factorization theorem). Let f be the d.f. of X, and T(X) a statistic. T is sufficient for theta iff
f(x;theta) = g(T(x);theta) h(x) for all (x,theta), where h does not depend on theta.   (1)
Recall that a family of distributions, say {f(.;theta) : theta in Theta}, is said to belong to an exponential family if f(x;theta) = h(x)c(theta)exp(sum_{j=1}^k w_j(theta)t_j(x)), where h and the t_j's are independent of theta and c and the w_j's are independent of x.
Theorem 2. If X_1,...,X_n are i.i.d. from an exponential family, and if T(X) = sum_{i=1}^n (t_1(X_i),...,t_k(X_i)), then T is sufficient for theta.
Remark. 3 methods for determining a sufficient statistic:
(1) Definition: f_{X|T}(x) is independent of theta.
(2) Factorization Theorem: f(x;theta) = g(T(x);theta)h(x).
(3) Exponential family: T(X) = sum_{i=1}^n t(X_i), t = (t_1,...,t_k).
Method (3) is most convenient, but does not always work. Why? Method (1) is not convenient, but always works. Method (2) is convenient most of the time and always works.
Example 1. Suppose that X_1,...,X_n (n >= 3) is a random sample from bin(1,theta). Is T(X) sufficient for theta in the following cases? (a) T = X, (b) T = sum_{i=1}^n X_i + 1, (c) T = X-bar, (d) T = X_1 + X_2, (e) T = X_1 + theta.
Sol. (a) Yes: T(X) = X is sufficient for theta.
By (1): P(X = x | T(X) = y) = 1(x = y) is independent of theta.
By (2): f_X(x;theta) = prod_{i=1}^n f_{X_1}(x_i;theta) = g(T(x);theta) h(x), with g(T(x);theta) = prod_{i=1}^n f_{X_1}(x_i;theta) and h(x) = 1.
Why (3) is not applicable though bin(1,theta) belongs to an exponential family: sum_{i=1}^n t(X_i) (= sum_i X_i) is 1-dimensional, but T = X is n-dimensional.
(b) Yes: T(X) = sum_i X_i + 1 is sufficient for theta.
By (1): f_{X|T} is independent of theta. P(X = x | T = t) = P(A|B) = P(AB)/P(B), and sum_i X_i ~ bin(n,theta).
P(B) = P(sum_i X_i + 1 = t) = P(sum_i X_i = t-1).
P(AB) = P(X = x, sum_i X_i + 1 = t) = 1(sum_i x_i = t-1) P(X = x) = 1(sum_i x_i = t-1) theta^{t-1}(1-theta)^{n-t+1}. Why?
Thus P(X = x | T = t) = 1(sum_i x_i = t-1)/C(n,t-1) is independent of theta.
By (2): f_X(x;theta) = theta^{sum_i x_i}(1-theta)^{n - sum_i x_i} = theta^{T(x)-1}(1-theta)^{n-T(x)+1} = g(T(x);theta) h(x), h(x) = 1.
By (3): T = sum_{i=1}^n t_1(X_i) = sum_i X_i + 1, where f_{X_1}(x) = h(x)c(theta)exp(w_1(theta)t_1(x)):
f_{X_1}(x;theta) = theta^x (1-theta)^{1-x} = (theta/(1-theta))^x (1-theta) = (theta/(1-theta))^{x+1/n} (theta/(1-theta))^{-1/n} (1-theta) = (theta/(1-theta))^{-1/n} (1-theta) exp((x + 1/n) ln(theta/(1-theta))).
Thus k = 1, w_1(theta) = ln(theta/(1-theta)) and t_1(x) = x + 1/n. It yields T = sum_{i=1}^n t_1(X_i) = sum_i X_i + 1.
(c) Yes: X-bar is sufficient for theta. X-bar = (1/n) sum_i X_i (= ((sum_i X_i + 1) - 1)/n, a linear combination of the T in (b)). The proof is similar to that in (b).
(d) No: X_1 + X_2 is not sufficient for theta. Choose a counterexample:
P(X = (0,...,0,1) | T = 0) = P(X_1 = ... = X_{n-1} = 0, X_n = 1)/P(X_1 = X_2 = 0) = theta(1-theta)^{n-3}. It depends on theta.
Q: Can we use methods (2) and (3)? Why?
(e) No: X_1 + theta is not a sufficient statistic for theta. Reason: T depends on theta, thus it is not even a statistic, let alone a sufficient statistic. Proof? Let n = 3 and X_1 = 1; then T(X) = 1 + theta takes the value 1 if theta = 0 and 1.5 if theta = 0.5, so T is not determined by the data alone.
Remark. Sufficient statistics are not unique nor equivalent: X, X-bar, sum_{i=1}^n X_i + 1 are all sufficient for theta. Which is preferred?
Example 2. Does the family of distributions belong to an exponential family in the following cases? (a) N(mu,sigma^2), (b) bin(m,p), (c) Poisson with mean mu (P(mu)), (d) Exp(theta) with mean 1/theta, (e) the double exponential distribution f = (1/(2 lambda)) exp(-|x-mu|/lambda), (f) U(a,b).
Sol. Yes for (a) through (d), explained in 501. No for (e) and (f). The reason for (e) is explained in 501.
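The conditional-distribution computation in Example 1(b) can be checked by enumeration. A minimal sketch in Python (the notes themselves use R), taking n = 4 as an arbitrary sample size: it verifies that P(X = x | sum(X_i) + 1 = t) is the same for two different values of theta and equals 1/C(n, t-1).

```python
from itertools import product
from math import comb, isclose

def cond_dist(theta, n=4):
    """P(X = x | sum(X)+1 = t) for every outcome x of n Bernoulli(theta) trials."""
    out = {}
    for x in product((0, 1), repeat=n):
        s = sum(x)
        t = s + 1                       # the statistic T = sum(X_i) + 1
        px = theta**s * (1 - theta)**(n - s)   # P(X = x)
        pt = comb(n, s) * px                   # P(T = t) = P(sum X_i = t-1)
        out[(x, t)] = px / pt                  # conditional probability
    return out

a, b = cond_dist(0.3), cond_dist(0.7)
# The conditional distribution given T does not involve theta:
assert all(isclose(a[k], b[k]) for k in a)
# and it equals the counting formula 1/C(n, t-1):
assert all(isclose(v, 1 / comb(4, t - 1)) for (x, t), v in a.items())
```

The same enumeration with T = X_1 + X_2 instead would produce conditional probabilities that change with theta, matching part (d).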
Reason for U(a,b): Note that theta = (a,b) is the parameter. If it belongs to an exponential family, say
f_X(x;theta) = h(x)c(theta)exp{sum_{j=1}^k w_j(theta)t_j(x)} = 1(x in (a,b))/(b-a),   (2)
then it is impossible that f_X(x;theta) = 1(x in (a,b))/(b-a) = 0 for all x not in [a,b], as h and the t_j's are independent of (a,b). Done?? Give a counterexample: If Eq. (2) holds, then for (a,b) = (0,2), f_X(x;theta) = 1(x in (a,b))/(b-a) = 1/2 for x in (0,2) yields h(x) > 0 for x in (0,2). Why?? But for (a,b) = (0,1), f_X(x;theta) = 1(x in (a,b))/(b-a) = 0 for x > 1 yields h(x) = 0 for x in (1,2). Why?? A contradiction. Thus it does not belong to an exponential family.
Example 3. Let X_1,...,X_n be i.i.d. from N(mu,sigma^2). Find a sufficient statistic for theta in the following cases: (a) mu known, (b) sigma known, (c) both unknown.
Sol. f(x;theta) = (1/(sqrt(2 pi) sigma)) exp(-(x-mu)^2/(2 sigma^2)) = (1/(sqrt(2 pi) sigma)) exp(-x^2/(2 sigma^2) + x mu/sigma^2 - mu^2/(2 sigma^2));
f_X(x;theta) = h(x)c(theta)exp{sum_{j=1}^k w_j(theta)t_j(x)}.
(a) theta = sigma^2, T = sum_i (X_i - mu)^2 or (1/n) sum_i (X_i - mu)^2, etc. Why?
(b) theta = mu, T = sum_i X_i or X-bar.
(c) theta = (mu,sigma^2), T = (sum_i X_i, sum_i X_i^2), etc.
Example 4. Find a (non-trivial) sufficient statistic for U(0,b) if X_1,...,X_n is a random sample from U(0,b).
Sol. Question: What is a trivial sufficient statistic for U(0,b)?
Question: Can we use Method (3)? No! U(0,b) does not belong to an exponential family.
Question: Can we use Method (1)? Not convenient, as we have no idea what T is. Method (2) is a good tool for finding a suitable sufficient statistic for a non-exponential family.
Correct approach:
f_X(x;theta) = prod_{i=1}^n f_{X_1}(x_i;theta) = 1/b^n, x_1,...,x_n in (0,b)??
f_X(x;theta) = prod_{i=1}^n 1(x_i in (0,b))/b = (1/b^n) 1(x_1,...,x_n in (0,b))   [g(T(x);b) h(x), h(x) = 1]
= (1/b^n) 1(x_(1), x_(n) in (0,b)) = (1/b^n) 1(x_(n) < b) . 1(x_(1) > 0)   [g(T(x);b) h(x)]
(x_(1) <= x_(2) <= ... <= x_(n) are the order statistics).
Sufficient statistics: (a) T_1 = X (the trivial one), (b) T_2 = (X_(1), X_(n)),
(c) T_3 = X_(n). Which one do you prefer?
To find a sufficient statistic, it is not convenient to use the definition. As seen from the examples, the dimension of a sufficient statistic can be n or smaller. It is desirable to find a sufficient statistic that has the smallest dimension.
Definition. A sufficient statistic T is called a minimal sufficient statistic (MSS) if, for any other sufficient statistic T', T is a function of T'.
An MSS = a sufficient statistic with the least dimension? Consider the case that n = 2, X_1 and X_2 are i.i.d. from U(theta,theta+1), theta unknown. T_1 = (X_1,X_2) and T_2 = (X_(1),X_(2)). Which is likely an MSS?
Theorem 3. Suppose that (1) f(x;theta) is the density function of X; (2) T(X) is a statistic; (3) f(x;theta)/f(y;theta) is independent of theta iff T(x) = T(y), for all (x,y). Then T is an MSS.
Question: How to get the density function f_X for a random sample from f_{X_1}?
Remark. Two ways to determine an MSS: 1. Definition. 2. Theorem 3.
Example 5. Let X_1,...,X_n be a random sample from N(mu,sigma^2), where theta = (mu,sigma^2). Find an MSS for theta.
Sol. A sufficient statistic is T(X) = (X-bar, \overline{X^2}), the sample means of the X_i and of the X_i^2. To show that it is an MSS, the definition is not convenient; we use Theorem 3. Since f(x;theta) is proportional to exp(-sum_i x_i^2/(2 sigma^2) + mu sum_i x_i/sigma^2 - n mu^2/(2 sigma^2)),
f(x;theta)/f(y;theta) = exp(-(sum_i x_i^2 - sum_i y_i^2)/(2 sigma^2) + (sum_i x_i - sum_i y_i) mu/sigma^2)
is independent of theta iff T(x) = T(y), for all (x,y). Thus T is an MSS.
Example 6. Suppose that X_1,...,X_n are i.i.d. U(theta,theta+1). Find an MSS for U(theta,theta+1).
Sol. (1) Find a suitable sufficient statistic; (2) show that it is an MSS.
f_X(x;theta) = prod_{i=1}^n 1(x_i in (theta,theta+1)) = 1(theta < x_(1)) . 1(x_(n) < theta+1), theta in (-inf,inf)   [g(x_(1),x_(n);theta) h(x), h(x) = 1].
T = (X_(1),X_(n)) is sufficient for theta by the Factorization theorem.
Claim: T is an MSS. That is,
f_X(x;theta)/f_X(y;theta) = 1(theta < x_(1), x_(n) < theta+1)/1(theta < y_(1), y_(n) < theta+1) = { 1 if T(x) = T(y); depends on theta if T(x) != T(y) },
where 0/0 := 1. It suffices to show that f_X(x;theta)/f_X(y;theta) is not constant in theta if T(x) != T(y).
T(x) != T(y) implies either (1) x_(1) < y_(1) (or x_(1) > y_(1)), or (2) x_(n) < y_(n) (or x_(n) > y_(n)). By symmetry between x and y, we need only consider either (1) x_(1) < y_(1) or (2) x_(n) < y_(n). By symmetry between (x_(1),y_(1)) and (x_(n),y_(n)), we need only prove case (1) x_(1) < y_(1).
In case (1),
f_X(x;theta)/f_X(y;theta) = { 0/0 if theta > y_(n) - ... ; 0/1 if x_(1) < theta < y_(1) and y_(n) < theta+1 }. Done? We need to exhibit such theta's.
Since f(x_i;theta) = 1(x_i in (theta,theta+1)), we have x_(n) - x_(1) in (0,1) and y_(n) - y_(1) in (0,1). Why? Hence y_(n) - 1 < y_(1), and since x_(1) < y_(1) we can pick a theta with max{x_(1), y_(n) - 1} < theta < y_(1); for such theta the numerator is 0 (as theta > x_(1)) while the denominator is 1 (as theta < y_(1) and y_(n) < theta + 1), so the ratio is 0. On the other hand, for theta = y_(n) + 1 both densities are 0 and the ratio is 0/0 := 1. Why?
=> the ratio depends on theta if x_(1) < y_(1). Thus f_X(x;theta)/f_X(y;theta) is independent of theta iff T(x) = T(y), for all (x,y), and T is an MSS.
Definition. Suppose that {f(x;theta) : theta in Theta} is a family of density functions, X_1,...,X_n are i.i.d. from f, and T = T(X) is a statistic. T is said to be ancillary if f_T does not depend on theta. T is said to be a complete statistic, or complete for theta (or for the distribution family), if for every function g such that g(T) is a statistic,
E(g(T)) = 0 for all theta => P(g(T) = 0) = 1 for all theta.
Theorem 4. If an MSS exists, then each complete sufficient statistic is an MSS.
Theorem 5. Suppose (1) X_1,...,X_n are i.i.d. from f(x;theta), theta in Theta, a subset of R^k; (2) f = h(x)c(theta)exp(sum_{j=1}^k w_j(theta)t_j(x)); write w = (w_1,...,w_k); (3) {w(theta) : theta in Theta} contains a non-empty open set of R^k. Then T = sum_{i=1}^n t(X_i) is complete, where t = (t_1,...,t_k).
Q: Are w and t uniquely determined?
Remark. Two ways to determine whether T is complete: 1. Definition; 2. Exponential family, by Theorem 5.
Example 7. Let X_1,...,X_n be i.i.d. from X. Is T(X) complete for theta?
(a) T = (X-bar, \overline{X^2}), where X ~ N(mu,sigma^2) and theta = (mu,sigma^2).
(b) T = (X-bar, \overline{X^2}), where X ~ N(theta,theta^2).
(c) T = X_(n), where X ~ U(0,theta).
Sol. (a) Exponential family. {w(theta) : theta in Theta} = ? Notice f is proportional to exp(-(1/sigma^2)(x^2/2) + (mu/sigma^2)x - mu^2/(2 sigma^2)), so (w_1,w_2) = (-1/(2 sigma^2), mu/sigma^2), mu in (-inf,inf), sigma^2 in (0,inf). Why this factorization? (Check Th. 5.) {w(theta) : theta in Theta} = (-inf,0) x (-inf,inf). Question: Why? It follows that {w(theta) : theta in Theta} contains a non-empty open set in R^2. Thus by Theorem 5, T is complete.
Remark. Notice that T is also an MSS, by Example 5.
(b) Q: {w(theta) : theta in Theta} = (-inf,0) x (-inf,inf)? Here (w_1,w_2) = (-1/(2 theta^2), 1/theta), theta > 0. {w(theta) : theta in Theta} = {(w_1,w_2) : w_1 = -(w_2)^2/2, w_2 > 0} is a curve in R^2. It does not contain an open set in R^2. Condition (3) in Theorem 5 does not hold. We cannot use Theorem 5, as it only gives a sufficient condition for completeness.
Claim: (X-bar, \overline{X^2}) is not complete for theta. Use the definition.
Need to construct a g such that E(g(T)) = 0 for all theta but P(g(T) = 0) < 1. How? Notice that
(1) E(X-bar) = theta, (2) E((X-bar)^2) = mu_{X-bar}^2 + sigma_{X-bar}^2 = theta^2 + theta^2/n = (1 + 1/n)theta^2, (3) E(\overline{X^2}) = E(X_1^2) = sigma^2 + theta^2 = 2 theta^2.
Now from (2) and (3), setting g(T) = (X-bar)^2/(1 + 1/n) - \overline{X^2}/2, we get E(g(T)) = 0 for all theta. Verify it; but P(g(T) = 0) = 0 < 1, as g(T) is continuous. Thus T is not complete.
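The failure of completeness in part (b) can be checked by simulation. A sketch in Python (the notes use R); the values n = 5 and theta = 2 are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n, theta, reps = 5, 2.0, 200_000

x = rng.normal(theta, theta, size=(reps, n))   # N(theta, theta^2) samples
xbar = x.mean(axis=1)                          # component X-bar of T
x2bar = (x**2).mean(axis=1)                    # component mean of X_i^2
g = xbar**2 / (1 + 1/n) - x2bar / 2            # E g(T) = 0 for every theta

assert abs(g.mean()) < 0.05    # mean is 0, up to Monte Carlo error
assert (g != 0).all()          # yet g(T) is almost never 0: T is not complete
```

Repeating the run with other theta values gives the same picture: the mean of g stays near 0 while g itself is essentially never 0.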
Question: Is T an MSS? Yes, by Example 5. Remark: this is an example where T is an MSS but is not complete.
(c) Claim: T is complete. U(0,theta) does not belong to an exponential family, thus use the definition. Need to compute E(g(T)) = int g(t) f_T(t) dt. f_T = ?
Formula: f_{X_(n)}(t) = [n!/((n-1)! 1!)] (F_X(t))^{n-1} f_X(t), so f_T(t) = n t^{n-1}/theta^n, t in (0,theta).
(Or derive it directly as follows: f_T(t) = F_T'(t). F_T(t) = P(T <= t) = P(X_(n) <= t) = P(X_1 <= t,...,X_n <= t) = P(X_1 <= t) ... P(X_n <= t) = (F(t))^n = t^n/theta^n, t in (0,theta); hence f_T(t) = n t^{n-1}/theta^n, t in (0,theta).)
E(g(T)) = int_0^theta g(t) n t^{n-1}/theta^n dt = 0 for all theta > 0. Does it imply P(g(T) = 0) = 1?
Answer: Yes. Setting h(t) = g(t) n t^{n-1}, we have int_0^y h(t) dt = 0 for all y > 0, hence h(t) = 0 a.e., i.e., g(t) = 0 a.e., by the following lemma.
Lemma 1. If int_0^y h(t) dt = 0 for all y > 0, then h(t) = 0 a.e.
(Note int 1(t in {1,2}) dt = 0, but it is not true that 1(t in {1,2}) = 0.) The proof of Lemma 1 is an exercise in Real Analysis and is quite long. We consider one that is easy to prove (though not quite as precise).
Lemma 2. If h is continuous and int_0^x h(t) dt = 0 for all x > 0, then h(t) = 0.
Proof. (int_0^x h(t) dt)' = h(x) = 0 for all x > 0.
Note that g(t) n t^{n-1} may not be continuous, e.g., g(t) = 1(t in {1,2}).
Recall that if f_T does not depend on theta, the statistic T is called ancillary.
Basu's Theorem. If T(X) is a complete and minimal sufficient statistic, then T(X) is independent of U(X) for every ancillary statistic U(X).
Example 8. Suppose that X_1,...,X_n is a random sample from U(theta,theta+1), T = X_(n) - X_(1). Show that T is ancillary.
Sol. Two ways to check: (1) direct, (2) pivotal method.
Direct way (1): Derive f_T(t) by the cdf or the Jacobian method.
(a) By cdf:
P(X_(n) - X_(1) <= t) = int int_{y-x<=t} f_{X_(1),X_(n)}(x,y) dx dy
= int int_{y-x<=t} [n!/(1!(n-2)!1!)] f(x)f(y)(F(y)-F(x))^{n-2} 1(x<y) dx dy   (f(x) = 1(x in (theta,theta+1)); similar to a trinomial distribution)
= int int_{y-x<=t} n(n-1) 1(theta<x<y<theta+1)(y-x)^{n-2} dx dy
= int int_{v-u<=t} n(n-1) 1(0<u<v<1)(v-u)^{n-2} du dv   (u = x-theta, v = y-theta)
= int int_{v-u<=t, 0<u<v<1} n(n-1)(v-u)^{n-2} du dv
= { 0 if t <= 0; int_0^1 int_{max(v-t,0)}^v n(n-1)(v-u)^{n-2} du dv if t in (0,1]; 1 if t > 1 }. Why?
Note that theta disappears, thus f_T(t) = F_T'(t) is independent of theta. T is ancillary.
(b) By Jacobian: (T,W) = (X_(n) - X_(1), X_(n)), f_{T,W}(t,w) = f_{X_(1),X_(n)}(w-t,w)|J|. J = ? f_T(t) = int f_{T,W}(t,w) dw = int f_{X_(1),X_(n)}(w-t,w) dw = ..., where
f_{X_(1),X_(n)}(x,y) = [n!/(1!(n-2)!1!)] f(x)f(y)(F(y)-F(x))^{n-2} 1(x<y), f(x) = 1(x in (theta,theta+1)),
= n(n-1) 1(theta < x < y < theta+1)(y-x)^{n-2} = ...
(2) Pivotal method: That is, given f_X(.;theta), find a pivotal Z = g(X,theta) such that the density f_Z is independent of theta. Typical pivotals are related to the location-scale family:
Z = X - theta if f_X(x;theta) = f(x-theta); Z = X/theta if f_X(x;theta) = f(x/theta)/theta; Z = (X-mu)/lambda if f_X(x;theta) = f((x-mu)/lambda)/lambda. Then f_Z(t) = f(t).
f_{X_1}(x) = 1(x in (theta,theta+1)) = 1(x-theta in (0,1)) = f_Z(x-theta), where Z = X_1 - theta is called a pivotal and f_Z(t) = 1(t in (0,1)).
To prove T is ancillary, we need to show: (i) f_Z is independent of theta; (ii) T = X_(n) - X_(1) = Z_(n) - Z_(1). Then
F_T(t) = P(X_(n) - X_(1) <= t) = P(Z_(n) - Z_(1) <= t) = int int_A f_{Z_(1),Z_(n)}(x,y) dx dy,
where A = {(x,y) : y - x <= t} and f_{Z_(1),Z_(n)}(x,y) = [n!/(1!(n-2)!1!)] f_Z(x)(F_Z(y)-F_Z(x))^{n-2} f_Z(y) 1(x<y).
(i): f_Z is independent of theta. There are two approaches to prove Z is a pivotal: (a) cdf and (b) d.f.
Approach (a). F_Z(t) = P(Z <= t) = P(X_1 - theta <= t) = P(X_1 <= t+theta) = { 0 if t+theta < theta; (t+theta)-theta if theta <= t+theta < theta+1; 1 if t+theta >= theta+1 } = { 0 if t < 0; t if t in [0,1); 1 if t >= 1 }.
Hence f_Z(t) = 1(t in (0,1)) and f_Z(z) = prod_{i=1}^n 1(z_i in (0,1)) is independent of theta, and T is ancillary.
Approach (b). f_Z(z) = f_{X_1}(g^{-1}(z)) |dg^{-1}/dz| where z = g(x) = x-theta. g^{-1}(z) = z+theta and dg^{-1}/dz = 1. Thus f_Z(z) = f_{X_1}(z+theta) = 1(z+theta in (theta,theta+1)) = 1(z in (0,1)). f_Z(z) = prod_{i=1}^n 1(z_i in (0,1)) is independent of theta, and T is ancillary.
Example 9. Let X_1,...,X_n be a random sample from X ~ f(x;theta), where f = (1/theta) f_o(x/theta), theta > 0 and f_o is a density function, i.e., int f_o(x) dx = 1 and f_o >= 0. Show that T is ancillary in the two cases: (1) T = X-bar/S, where S^2 = (1/(n-1)) sum_{i=1}^n (X_i - X-bar)^2; (2) T = X_1/X-bar.
Sol. (1) Two ways as in Ex. 8; use the simpler way. Since f(x) = f_o(x/theta)/theta, Z = X/theta is a pivotal: f_Z(z) = f_X(g^{-1}(z)) |dg^{-1}/dz| (g^{-1} = ?) = (1/theta) f_o(z) theta = f_o(z). Let Z_i = X_i/theta. Then T = X-bar/S = Z-bar/S_Z, where S_Z^2 = (1/(n-1)) sum_{i=1}^n (Z_i - Z-bar)^2.
F_T(t) = P(Z-bar/S_Z <= t) = int_A f_Z(z) dz, where A = {z : z-bar/s_z <= t}.
f_Z(z) = prod_{i=1}^n f_o(z_i) is independent of theta. Thus T is ancillary.
(2) T = X_1/X-bar = Z_1/Z-bar. Thus T is ancillary too.
Example 10. Let X_1,...,X_n be i.i.d. Exp(theta), where E(X_i) = theta. Let U(X) = (X_1/X-bar)^2. E(U) = ?
Sol. The usual way: E(U) = int x f_U(x) dx = int int (x/y)^2 f_{X_1,X-bar}(x,y) dx dy = ...
Another way: make use of Basu's Theorem (if T(X) is a complete and minimal sufficient statistic, then T(X) is independent of U(X) for every ancillary statistic U(X); T is ancillary if f_T does not depend on theta).
Recall Example 9: if X_1,...,X_n is a random sample from X ~ f(x;theta), where f = (1/theta) f_o(x/theta), theta > 0 and f_o is a density function, then X_1/X-bar is ancillary. Note that f_{X_1}(x) is proportional to e^{-x/theta}/theta, which belongs to this scale family, so U(X) = (X_1/X-bar)^2 is an ancillary statistic.
How to use it? Let T(X) = X-bar. Then T is a complete and minimal sufficient statistic. Why? E(X_1^2) = E(U T^2) = E(U)E(T^2) (by Basu's Theorem) => E(U) = E(X_1^2)/E(T^2):
2 theta^2 = E(U)(theta^2/n + theta^2) => E(U) = 2 theta^2/(theta^2/n + theta^2) = 2n/(n+1).
Remark. If the X_i are i.i.d. U(0,1), is E(U) = E(X_1^2/(X-bar)^2) = E(X_1^2)/E((X-bar)^2)??? (Here there is no completeness argument to justify factoring the expectation.)
Chapter 7. Point Estimation
Definition. A point estimator is a statistic. Its values are called estimates. We shall discuss methods of estimation and their optimal properties.
7.2. Methods of estimation
7.2.1. Method of moments estimator (MME)
Suppose that X_1,...,X_n are i.i.d. from X ~ f(x;theta), theta = (theta_1,...,theta_k) in Theta. An MME of theta is a solution of theta to the equations
\overline{X^{i_1}} = E(X^{i_1}), ..., \overline{X^{i_k}} = E(X^{i_k}), where i_1,...,i_k are distinct integers. Question: Where is theta in these equations? In particular, an MME is a solution to \overline{X^i} = E(X^i), i = 1,...,k.
Remark. The solution to the MME equations is not unique.
Example 1. Suppose that X ~ bin(2,p), theta = p. MME of theta?
Sol. We present two solutions, denoted by p-hat and p-tilde.
(1) X-bar = mu_X with k = 1: X-bar = 2p. Why? So p-hat = X-bar/2. Question: Why not say the MME is p-tilde = X-bar/2?
(2) \overline{X^2} = E(X^2) with k = 1: \overline{X^2} = sigma^2 + mu^2 = 2p(1-p) + (2p)^2 = 2p + 2p^2, that is, -\overline{X^2} + 2p + 2p^2 = 0, so p = (-1 +/- sqrt(1 + 2 \overline{X^2}))/2.
Question: Two solutions; are they both MMEs? Answer: p-tilde = (-1 + sqrt(1 + 2 \overline{X^2}))/2.
Example 2. Suppose that X_1,...,X_n are i.i.d. from bin(2,p), theta = p. MME of theta?
Sol. Two approaches: (1) standard, (2) via the MSS T = sum_{i=1}^n X_i.
(1) X-bar = 2p => p-hat = X-bar/2. (2) T = E(T) => T = 2np => p-hat = T/(2n) = X-bar/2.
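The two moment equations of Example 1 can be solved on a toy data set. A sketch in Python (the notes use R), assuming the bin(2,p) reading of the example; the data vector is hypothetical:

```python
from math import sqrt, isclose

def mme_first_moment(xs):
    """MME from X-bar = 2p (a bin(2,p) variable has mean 2p)."""
    return sum(xs) / len(xs) / 2

def mme_second_moment(xs):
    """MME from mean of X^2 = 2p + 2p^2; keep the root lying in [0,1]."""
    m2 = sum(x * x for x in xs) / len(xs)
    return (-1 + sqrt(1 + 2 * m2)) / 2

data = [0, 1, 2, 1]            # toy bin(2,p) sample
p1, p2 = mme_first_moment(data), mme_second_moment(data)
assert isclose(p1, 0.5) and isclose(p2, 0.5)
```

On this particular sample the two MMEs coincide at 0.5; in general they differ, which is the non-uniqueness noted in the Remark.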
Example 3. Suppose that X_1,...,X_n are i.i.d. from N(mu,sigma^2), theta = (mu,sigma^2). MME of theta?
Sol. { X-bar = mu; \overline{X^2} = mu^2 + sigma^2 } => { mu-hat = X-bar; sigma-hat^2 = \overline{X^2} - (X-bar)^2 }.
7.2.2. Maximum likelihood estimator (MLE)
Assume that f_X(x;theta) is the density function of X, where theta in Theta. Write L(theta) = f_X(x;theta) and call it the likelihood function of theta. The value of theta that maximizes L(.) over all possible theta in Theta is called the MLE of theta: theta-hat = argmax_{theta in Theta} L(theta).
Interpretation: Given x, the MLE chooses theta such that the probability that X is near x is the largest:
f_X(x) { = P(X = x) if X is discrete; approximates P(|X - x| < eps)/(2 eps) if X is continuous }.
Typical steps for the MLE with differentiable L:
Step 1. Solve for the critical points of ln L (i.e., all t such that (ln L)'(t) = 0 or L'(t) does not exist, or the boundary).
Step 2. Check whether t is the maximum point, by either the second derivative test if L'' exists everywhere, or by comparing the values L(t) over all t obtained in Step 1.
Example 1. Suppose that X_1,...,X_n are i.i.d. from N(theta,1). Find the MLE of theta in the following cases: (a) Theta = (-inf,inf), (b) Theta = [0,inf), (c) Theta = [1,2].
Sol. ln L(theta) = ln prod_{i=1}^n f_X(X_i;theta) = ln prod_{i=1}^n {(2 pi)^{-1/2} exp(-(X_i-theta)^2/2)} = ln (2 pi)^{-n/2} - sum_{i=1}^n (X_i-theta)^2/2.
Remark. It is much clearer by drawing the graph of y = ln L(x): a parabola concaving down.
(a) Theta = R. (ln L)'(theta) = sum_i (X_i - theta) = 0 => theta = X-bar. Check: (ln L)'' exists on R and (ln L)'' < 0. Thus theta-hat = X-bar is the MLE.
(b) Theta = [0,inf). Possible critical points: theta = X-bar, 0, inf. Check two cases: (1) X-bar > 0, (2) X-bar <= 0.
(1) Critical points 0, X-bar, inf; ln L(X-bar) = c - (n/2)(\overline{X^2} - (X-bar)^2) is the maximum ((ln L)' is +, 0, - around X-bar): MLE = X-bar.
(2) Critical points 0, inf; ln L is finite at 0 and decreasing on [0,inf): MLE = 0.
Thus the MLE is theta-hat = max{0, X-bar}.
(c) Theta = [1,2]. Possible critical points: theta = X-bar, 1, 2. Check 3 cases: (1) X-bar in (1,2), (2) X-bar <= 1, (3) X-bar >= 2.
(1) Critical points 1, X-bar, 2; the values ln L(.) are simple to compare once X-bar is given; (ln L)' is +, 0, - around X-bar: MLE = X-bar.
(2) Critical points 1, 2; (ln L)' < 0 on [1,2]: MLE = 1. Do we need both endpoints?
(3) Critical points 1, 2; (ln L)' > 0 on [1,2]: MLE = 2.
Thus the MLE is theta-hat = { X-bar if X-bar in [1,2]; 1 if X-bar < 1; 2 if X-bar > 2 }.
Example 2. Suppose that X_1,...,X_n are i.i.d. from bin(k,p) where p is known, p in (0,1), and k is unknown. MLE of k?
Solution. Question: What is Theta?
L = prod_{i=1}^n C(k,X_i) p^{X_i} q^{k-X_i} = (prod_{i=1}^n 1/X_i!)(p/q)^{sum_i X_i}(prod_{i=1}^n 1/(k-X_i)!)(k!)^n q^{nk}, where q = 1-p.
Remark. If X_(n) = 0, L = q^{nk} is maximized by k = 0. Thus k-hat = 0 if X_(n) = 0. WLOG, assume X_(n) >= 1.
Question: Should we use the typical method, i.e., d ln L(k)/dk = 0? (1) d ln L(k)/dk involves (k!)' = ? (2) theta = k is discrete, and the root for k may not be an integer. Notice that k >= X_1,...,X_n. Thus the MLE k-hat >= max{X_(n),1}.
One method: guess and try. The MLE is k-hat = argmax_{k >= X_(n)} L(k). An R program in the special case of (n,p,X_1) = (1,0.8,5):
X=5
p=0.8
N=20
K=X:N
f=choose(K,X)*p**X*(1-p)**(K-X)   # = C(k,X) p^X q^(k-X)
F=max(f)
round(F,3)
round(f,3)
[1] 0.393
[1] 0.328 0.393 0.275 0.147 0.066 0.026 0.010 0.003 0.001 0.000 0.000 0.000
[13] 0.000 0.000 0.000 0.000
(k-hat = ??) Then the MLE is k-hat = 6, according to f(5),...,f(20). Why?
K[f==F]
[1] 6
Remark. Drawback of this approach: it is not clear that k-hat is the MLE, as we only list k in {5,6,...,20}.
Second approach: Consider g(k) = L(k)/L(k-1). E.g., let n = 1; then g(k) = kq/(k-X_1) = q/(1 - X_1/k), which decreases to q (< 1) for k in [X_1 v 1, inf). At the MLE: (1) g(k-hat) >= 1 and (2) g(k-hat + 1) <= 1. Why?
(1) q >= 1 - X_1/k-hat => X_1/k-hat >= 1 - q = p => k-hat <= X_1/p.
(2) q <= 1 - X_1/(k-hat + 1) => X_1/(k-hat + 1) <= p => k-hat >= X_1/p - 1.
Thus X_1/p - 1 <= k-hat <= X_1/p. If (n,p,X_1) = (1,0.8,5), then 5.25 <= k-hat <= 6.25 => k-hat = 6. Why?
Now in general,
g(k) = L(k)/L(k-1) = q^n / prod_{i=1}^n (1 - X_i/k), k >= X_(n) v 1, where 1/0 := inf.   (1)
WLOG, assume X_(n) >= 1. Each factor 1 - X_i/k increases in k on [X_(n),inf). Then g(k) decreases from inf to q^n (< 1) on [X_(n),inf).   (2)
By Eq. (2), at the MLE k-hat, L(k-hat - 1) <= L(k-hat) and L(k-hat + 1) <= L(k-hat), i.e., g(k-hat) >= 1 and g(k-hat + 1) <= 1.
Statement (2) says y = g(x) is a decreasing curve that crosses y = 1. We should look for k-hat >= X_(n) such that g(k-hat) >= 1 >= g(k-hat + 1). Q: g(k-hat) = ??
The MLE can be written as: k-hat = max{k : g(k) >= 1, k >= X_(n)}? k-hat = min{k : g(k) <= 1, k >= X_(n)}? k-hat = min{k : g(k) <= 1, k >= X_(n)} - 1?
Given a data set, we can solve it easily: solve y = g(x) and y = 1, x in {X_(n), X_(n)+1, ...}; or solve y = prod_{i=1}^n (1 - X_i/k) and y = q^n, as g(k) = q^n / prod_{i=1}^n (1 - X_i/k):
(I) Draw the graphs of y = prod_{i=1}^n (1 - X_i/k) and y = q^n, x in {X_(n), X_(n)+1, ...}.
(II) Find their crossing point x-hat, and set k-hat = max{k : k <= x-hat}.
[Figure: the decreasing curve y = g(k) plotted against k for k between 20 and 60.]
The R program:
p=0.6
n=6
x=rbinom(n,20,p)   # simulation to get data x
m=max(x)
if (m==0) h=0
if (m>0) {
  j=4*m
  k=m:j
  g=rep(0,(j-m+1))
  q=(1-p)**n   # q^n
  for(i in m:j) g[i-m+1]=q/prod(1-(x/i))   # g(k)
  h=min(k[g<=1])-1   # or use H=max(k[g>=1])
}
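The ratio criterion g(k-hat) >= 1 >= g(k-hat + 1) can be cross-checked against brute-force maximization of the likelihood. A sketch in Python (the notes themselves use R), run on the same case (n,p,X_1) = (1,0.8,5); the search cap kmax is an arbitrary choice:

```python
from math import comb

def mle_k(xs, p):
    """MLE of k for bin(k,p) data with p known, via the ratio
    g(k) = L(k)/L(k-1) = q^n / prod(1 - x_i/k): return the largest k
    with g(k) >= 1, i.e. with prod(1 - x_i/k) <= q^n."""
    m, n, q = max(xs), len(xs), 1 - p
    if m == 0:
        return 0
    k = m
    while True:
        prod = 1.0
        for x in xs:
            prod *= 1 - x / (k + 1)
        if prod > q**n:        # g(k+1) < 1: the likelihood starts decreasing
            return k
        k += 1

def brute_force_k(xs, p, kmax=200):
    """Direct argmax of the likelihood over k = max(xs), ..., kmax."""
    def lik(k):
        out = 1.0
        for x in xs:
            out *= comb(k, x) * p**x * (1 - p)**(k - x)
        return out
    return max(range(max(xs), kmax + 1), key=lik)

assert mle_k([5], 0.8) == brute_force_k([5], 0.8) == 6
assert mle_k([5, 4], 0.8) == brute_force_k([5, 4], 0.8)
```

For the single observation X_1 = 5 with p = 0.8, both routes give k-hat = 6, matching the R output above.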
h
H
I ran the program 3 times and got 16, 21, 19. Why 3 values? The true k?
The revised R program:
p=0.6
n=6
x=rbinom(n,20,p)
m=max(x)
if (m==0) h=0
if (m>0) {
  j=4*m
  k=m:j
  g=rep(0,(j-m+1))
  for(i in m:j) g[i-m+1]=prod(1-(x/i))   # 1/g(k), up to the factor q^n
  q=(1-p)**n   # q^n
  plot(k,g,type="l")   # not necessary
  lines(c(m,j),c(q,q))   # not necessary
  h=min(k[g>=q])-1   # or use H=max(k[g<=q])
}
h
H
Theorem 1 (Invariance property of the MLE). If theta-hat is the MLE of theta and tau = g(theta) is a function of theta, then the MLE of tau is tau-hat = g(theta-hat).
Example 3. Let X_1,...,X_n be a random sample from N(mu,sigma^2), theta = (mu,sigma^2), theta in Theta = [0,inf) x (0,inf). Find the MLE of mu, sigma^2, sigma and E(X^2).
Sol. Let tau = sigma^2 and gamma = E(X^2). MLE of mu, tau, sigma and gamma? (sigma,gamma) are functions of theta. First get the MLE theta-hat; then (sigma-hat,gamma-hat) can be obtained by the invariance property of the MLE.
L = prod_{i=1}^n f(X_i;theta) = (2 pi sigma^2)^{-n/2} exp(-sum_i (X_i-mu)^2/(2 sigma^2)),
i.e., L = (2 pi tau)^{-n/2} exp(-sum_i (X_i-mu)^2/(2 tau)). Why??
ln L = c - (n/2) ln tau - sum_i (X_i-mu)^2/(2 tau),   (1)
d ln L/d mu = sum_i (X_i-mu)/tau = 0 => mu = X-bar,
d ln L/d tau = -n/(2 tau) + sum_i (X_i-mu)^2/(2 tau^2) = 0 => tau = (1/n) sum_i (X_i-mu)^2. Done?   (2)
Check: Two ways: (A) one-by-one, (B) two dimensions.
(A) Fix tau and maximize ln L(mu,tau) w.r.t. mu, say at mu = g(tau); then maximize ln L(g(tau),tau) w.r.t. tau. The MLE is (g(tau-hat),tau-hat). Now ln L is maximized by mu-hat = 0 v X-bar (see Example 1 in 7.2.2), regardless of tau; that is, g(tau) = mu-hat. Replacing mu by mu-hat in L(mu,tau) and Eq. (2), the critical points for tau are 0, tau-hat = (1/n) sum_i (X_i - mu-hat)^2 and inf; ln L(mu-hat,.) tends to -inf at 0 and inf and is finite at tau-hat (see Eq. (1)): the MLE is tau-hat.
Thus the MLE of (mu,tau,sigma,gamma) is mu-hat = 0 v X-bar, tau-hat = (1/n) sum_{i=1}^n (X_i - mu-hat)^2, sigma-hat = sqrt(tau-hat) (as tau = sigma^2), and gamma-hat = (mu-hat)^2 + sigma-hat^2 (as E(X^2) = mu^2 + sigma^2). Moreover,
gamma-hat = (mu-hat)^2 + sigma-hat^2 = (mu-hat)^2 + (1/n) sum_i (X_i - mu-hat)^2 = (mu-hat)^2 + (1/n) sum_i (X_i^2 - 2 X_i mu-hat + (mu-hat)^2) = (mu-hat)^2 + \overline{X^2} - 2 X-bar mu-hat + (mu-hat)^2 = \overline{X^2}. Why? (Whether mu-hat = X-bar or mu-hat = 0, the term 2 mu-hat(mu-hat - X-bar) vanishes.)
(B) (1) Critical points of L(mu,tau): for mu: X-bar (if X-bar > 0), 0, inf; for tau: (1/n) sum_{i=1}^n (X_i-mu)^2, 0, inf; (mu,tau) in [0,inf) x (0,inf).
(1a) Compare L(mu,tau) over the critical points if X-bar > 0. d ln L/d theta = 0: mu = X-bar and tau = (1/n) sum_{i=1}^n (X_i - X-bar)^2. Four boundary lines and a point: mu = 0, mu = inf, tau = 0, tau = inf and (X-bar, \overline{X^2} - (X-bar)^2). The line mu = 0 reduces to (0,0), (0,inf) and (0,\overline{X^2}), or only the latter??
Comparing ln L = c - (n/2) ln tau - sum_i (X_i-mu)^2/(2 tau): it is -inf on mu = inf, tau = 0 and tau = inf; finite at (0,\overline{X^2}); and maximal at (X-bar, \overline{X^2} - (X-bar)^2) (d ln L/d mu is +, 0, - across mu = X-bar): MLE.
(1b) Compare L(mu,tau) over the critical points if X-bar <= 0 (only the four boundary lines): ln L = -inf on mu = inf, tau = 0 and tau = inf, and it is finite at (0,\overline{X^2}) (<= mu = 0): MLE.
7.2.3. Bayes estimator
We have learned two estimators, the MME and the MLE, under the assumption that X_1,...,X_n are i.i.d. from f(x;theta), theta in Theta, where theta is an unknown constant (not random). In this section, we consider the Bayesian approach: conditional on theta, X_1,...,X_n are i.i.d. from f(x|theta); theta is a random variable with d.f. pi(theta); f(x|theta) is a conditional d.f. of X given theta. The Bayes estimator of theta is theta-hat = E(theta|X).
Recall the formula f_{X|Y}(x|y) = f(x,y)/f_Y(y).   (1)
Now f(x,theta) is the joint d.f. of (X,theta); f_X(x) is the marginal d.f. of X; pi(theta) is the marginal d.f. of theta, now called the prior d.f.; f(x|theta) is the conditional d.f. of X|theta; pi(theta|x) is the conditional d.f. of theta|X, now called the posterior d.f.;
f_X(x) = { int f(x,theta) d theta if theta is continuous; sum_theta f(x,theta) if theta is discrete }.
pi(theta) = { int f(x,theta) dx if X is continuous; sum_x f(x,theta) if X is discrete };
f(x|theta) = f(x,theta)/pi(theta), by Eq. (1); pi(theta|x) = f(x,theta)/f_X(x), by Eq. (1);
E(theta|X = x) = { int theta pi(theta|x) d theta if theta is continuous; sum_theta theta pi(theta|x) if theta is discrete }.
Recall the Bayes set-up: conditional on theta, X_1,...,X_n are i.i.d. from f(x|theta). Are the X_i's i.i.d. unconditionally? Homework. Answer it through the following assumption: let X_1,...,X_n be i.i.d. bin(1,p) given p, and p ~ U(0,1). Ans: No!
Remark. Two ways to compute the Bayes estimator: 1. E(theta|X); 2. E(theta|T(X)), where T is an MSS. They lead to the same estimator. The second method is often simpler in derivation.
Example 1. Let X_1,...,X_n be a random sample from bin(k,theta), theta ~ beta(alpha,beta) with pi(t) = t^{alpha-1}(1-t)^{beta-1}/B(alpha,beta), t in [0,1], where B(alpha,beta) = Gamma(alpha)Gamma(beta)/Gamma(alpha+beta), alpha, beta > 0 and (k,alpha,beta) is known. Bayes estimator of theta?
Sol. Recall T(X) = sum_{i=1}^n X_i is an MSS when theta is a parameter. (1) E(theta|X) = ? (2) E(theta|T(X)) = ?
Method 1. Based on X. f(x|theta) = prod_{i=1}^n C(k,x_i) theta^{x_i}(1-theta)^{k-x_i} = (prod_{i=1}^n C(k,x_i)) theta^{sum_i x_i}(1-theta)^{nk - sum_i x_i}.
pi(theta|x) = f(x,theta)/f_X(x) = f(x|theta)pi(theta)/f_X(x), which is proportional to theta^{sum_i x_i}(1-theta)^{nk - sum_i x_i} . theta^{alpha-1}(1-theta)^{beta-1} (main trick!!) = theta^{sum_i x_i + alpha - 1}(1-theta)^{nk - sum_i x_i + beta - 1}.   (1)
Thus theta|(X = x) ~ beta(sum_i x_i + alpha, nk - sum_i x_i + beta) (= beta(a,b)). The Bayes estimator is
theta-hat = E(theta|X) = a/(a+b) = (sum_i X_i + alpha)/(nk + alpha + beta)
= [nk/(nk + alpha + beta)] . (sum_{i=1}^n X_i)/(nk) + [(alpha+beta)/(nk + alpha + beta)] . alpha/(alpha+beta)
= r . (sum_{i=1}^n X_i)/(nk) + (1-r) . alpha/(alpha+beta) -> { the MLE if r -> 1; E(theta) if r -> 0 },
a weighted average of the MLE sum_{i=1}^n X_i/(nk) and the prior mean alpha/(alpha+beta).
Method 2. Based on the MSS T = sum_i X_i. T|theta ~ bin(nk,theta)? or T ~ bin(nk,theta)?
f_{T|theta}(t|theta) = C(nk,t) theta^t (1-theta)^{nk-t};
pi(theta|t) = C(nk,t) theta^t (1-theta)^{nk-t} theta^{alpha-1}(1-theta)^{beta-1}/B(alpha,beta)/f_T(t), proportional to theta^{t+alpha-1}(1-theta)^{nk-t+beta-1}, the same as (1). Why? ...
Example 2. Suppose that X_1,...,X_n is a random sample from N(theta,sigma^2), theta ~ N(mu,tau^2), where (sigma,mu,tau) is known. Bayes estimator of theta?
Sol. Note f_{X_1}(x) = (1/(sqrt(2 pi) sigma)) exp(-((x-mu)/sigma)^2/2), proportional to e^{a x^2 + b x} (the kernel of f). Two ways: (1) E(theta|X) and (2) E(theta|T(X)). Which to choose? The MSS for theta is T = X-bar, and T|theta ~ N(theta, sigma^2/n). E(theta|T(X) = t) = int theta pi(theta|t) d theta. pi(theta|t) = f(t|theta)pi(theta)/f_T(t) = ??
f(t|theta)pi(theta) is proportional to exp(-(t-theta)^2/(2 sigma^2/n) - (theta-mu)^2/(2 tau^2))
proportional to exp((2 t theta - theta^2)/(2 sigma^2/n) + (2 theta mu - theta^2)/(2 tau^2))
= exp(-(theta^2/2)[n/sigma^2 + 1/tau^2] + theta[t n/sigma^2 + mu/tau^2]),
which matches the kernel exp(-(theta - mu_1)^2/(2 sigma_1^2)), i.e., e^{a theta^2 + b theta} (main trick), with
1/sigma_1^2 = n/sigma^2 + 1/tau^2 and mu_1/sigma_1^2 = t n/sigma^2 + mu/tau^2,
i.e., sigma_1^2 = 1/[n/sigma^2 + 1/tau^2] and mu_1 = [t n/sigma^2 + mu/tau^2]/[n/sigma^2 + 1/tau^2].
Thus theta|(T = t) ~ N(mu_1, sigma_1^2), and the Bayes estimator is
theta-hat = E(theta|T) = mu_1 = [X-bar n/sigma^2 + mu/tau^2]/[n/sigma^2 + 1/tau^2] = (n/sigma^2)/(n/sigma^2 + 1/tau^2) X-bar + (1/tau^2)/(n/sigma^2 + 1/tau^2) mu.
Remark. It is interesting to notice the following fact again. As in Example 1, the Bayes estimator is
theta-hat = r X-bar + (1-r) mu, with r = (n/sigma^2)/(n/sigma^2 + 1/tau^2) -> { X-bar if n is large or r -> 1; E(theta) if r -> 0 },
a weighted average of the MLE X-bar and the prior mean mu.
7.3. Methods of evaluating estimators
Notice that the MME, MLE and Bayes estimators may not be the same. Question: How to compare estimators?
theta-hat - theta: error. Not good. Why?
|theta-hat - theta|: absolute error. Not good. Why?
E(theta-hat) - theta: bias, denoted by bias(theta-hat) or B(theta-hat);
E(|theta-hat - theta|): mean absolute error. Not ideal. Why?
E((theta-hat - theta)^2): mean-squared error (MSE) of theta-hat.
A naive approach: select the theta-hat that has the smaller MSE(theta-hat).
Formula: E((theta-hat - theta)^2) = Var(theta-hat) + (bias(theta-hat))^2.
Reason:
E((theta-hat - theta)^2) = E[(theta-hat - E(theta-hat) + E(theta-hat) - theta)^2]
= E((theta-hat - E(theta-hat))^2) + E((E(theta-hat) - theta)^2) + 2E[(theta-hat - E(theta-hat))(E(theta-hat) - theta)]
= E((theta-hat - E(theta-hat))^2) + (E(theta-hat) - theta)^2 + 2(E(theta-hat) - E(theta-hat))(E(theta-hat) - theta)
= Var(theta-hat) + (bias(theta-hat))^2.
Definition. If bias(theta-hat) = 0, theta-hat is called an unbiased estimator of theta.
Example 1. Suppose X_1,...,X_n are i.i.d. with mean mu and variance sigma^2. A common estimator of mu is mu-hat = X-bar, and two common estimators of sigma^2 are sigma-hat^2 = (1/n) sum_i (X_i - X-bar)^2 and S^2 = (1/(n-1)) sum_i (X_i - X-bar)^2.
(a) Are they unbiased? (b) Compute the MSE of X-bar, S^2 and sigma-hat^2 under N(mu,sigma^2). (c) Compare sigma-hat^2 to S^2 under N(mu,sigma^2).
Sol. (a) Recall E(X-bar) = mu_X: X-bar is an unbiased estimator of mu_X. Var(X) = E((X-mu)^2) = E(X^2) - mu^2, sigma-hat^2 = (1/n) sum_i (X_i - X-bar)^2 = \overline{X^2} - (X-bar)^2, and S^2 = (n/(n-1)) sigma-hat^2.
E(sigma-hat^2) = E(\overline{X^2}) - E((X-bar)^2) = E(X^2) - ((E(X-bar))^2 + sigma_{X-bar}^2) (Why?) = sigma^2 - sigma^2/n (Why?) = ((n-1)/n) sigma^2.
E(S^2) = (n/(n-1)) E(sigma-hat^2) = sigma^2. Thus S^2 is unbiased but not sigma-hat^2.
(b) MSE(mu-hat) = Var(X-bar) + (bias(mu-hat))^2 = sigma^2/n + 0. MSE(S^2) = Var(S^2) + (bias(S^2))^2 = Var(S^2);
MSE(sigma-hat^2) = ((n-1)/n)^2 Var(S^2) + (sigma^2/n)^2.   (1)
Recall a theorem: Under the i.i.d. normal assumption, 1. X-bar ~ N(mu, sigma^2/n); 2. (n-1)S^2/sigma^2 ~ chi^2(n-1); 3. X-bar is independent of S^2.   (2)
Moreover, recall E(chi^2(m)) = m and Var(chi^2(m)) = 2m.
MSE(S^2) = (sigma^2/(n-1))^2 . 2(n-1) = 2 sigma^4/(n-1).
MSE(sigma-hat^2) = ((n-1)/n)^2 . 2 sigma^4/(n-1) + (sigma^2/n)^2 = ((2n-1)/n^2) sigma^4.
(c) MSE(sigma-hat^2)/MSE(S^2) = (2n-1)(n-1)/(2n^2) < 1. Thus sigma-hat^2 is better in terms of the MSE (though S^2 is better than sigma-hat^2 in terms of unbiasedness).
Question. Is sigma-hat^2 the best in terms of the MSE? MSE(sigma-hat^2) = ((2n-1)/n^2) sigma^4. Let sigma-tilde^2 = 1 (a constant); then MSE(sigma-tilde^2) = (sigma^2 - 1)^2.
{ MSE(sigma-hat^2) > 0 = MSE(sigma-tilde^2) if sigma^2 = 1; MSE(sigma-hat^2) = ((2n-1)/n^2) . 4 <= 15/16 < 1 = MSE(sigma-tilde^2) if sigma^2 = 2 and n >= 8 }.
Question: How to compare estimators?
1. Select theta-hat with the smaller MSE(theta-hat).
2. Select theta-hat with the smallest MSE(theta-hat) (= E((theta-hat - theta)^2)) (impossible!).
3. Select theta-hat with the smaller bias.
4. Select an unbiased theta-hat with the smallest Var(theta-hat).
Definition. An estimator tau-hat is called the best unbiased estimator or uniformly minimum variance unbiased estimator (UMVUE) of tau(theta) if
(a) E(tau-hat) = tau(theta) for all theta in Theta; (b) Var(tau-hat) <= Var(tau-tilde) for all theta in Theta and every unbiased tau-tilde.
In many situations, the UMVUE exists. Question: How can we determine that tau-hat is a UMVUE? To answer the question, we need several theorems.
Theorem 1 (Cramér-Rao Inequality (CR-Ineq.)). Let X_1,...,X_n be i.i.d. from X ~ f(x;theta) and let W(X) be a statistic. Suppose that
(1) (d/d theta) E(W) = { int W(x) (d/d theta) f_X(x;theta) dx if X is continuous; sum_x W(x) (d/d theta) f_X(x;theta) if X is discrete };
(2) Var(W) < inf.
Let tau(theta) = E(W). Then
Var(W) >= ((d/d theta) E(W))^2 / E(((d/d theta) ln f_X(X;theta))^2) (= ((d/d theta) E(W))^2 / [n E(((d/d theta) ln f(X_1;theta))^2)]). Why the "="?
The latter is called the Cramér-Rao Lower Bound (CRLB) of tau-hat(theta).
Remark. The CR-inequality gives a tool for determining a UMVUE. If (1) the assumptions in the CR-inequality hold, (2) E(W) = tau(theta), and (3) Var(W) = ((d/d theta) tau(theta))^2 / [n E(((d/d theta) ln f(X_1;theta))^2)], then W is a UMVUE of tau(theta).
Results: The assumptions in the CR-inequality 1. hold if f(x;theta) belongs to an exponential family; 2. often fail if the domain of f depends on theta, such as U(0,theta).
Example 1. Let X_1,...,X_n be i.i.d. from N(mu,4). UMVUE of mu?
Sol. N(mu,sigma^2) belongs to the exponential family. Thus Condition (1) in the CR-inequality holds. theta = ?? tau(theta) = ??
Candidate for a UMVUE of mu: W = X-bar. E(W) = mu = theta (Condition (2) in the Remark); Var(W) = Var(X-bar) = sigma^2/n < inf (Condition (2) in the CR-Ineq.);
CRLB = ((d/d theta) tau(theta))^2 / [n E(((d/d theta) ln f(X_1;theta))^2)]; ((d/d theta) tau(theta))^2 = 1;
(d/d theta) ln f(X_1;theta) = (d/d theta)[ln c - (X_1-theta)^2/(2 sigma^2)] = (X_1-theta)/sigma^2, and E(((X_1-theta)/sigma^2)^2) = sigma^2/sigma^4 = 1/sigma^2.
CRLB = 1/(n/sigma^2) = sigma^2/n = Var(X-bar). Thus X-bar is a UMVUE of mu.
One of 6.8 or 6.9 will be in the midterm.
Definition (recalled). An estimator tau-hat is the best unbiased estimator or UMVUE of tau(theta) if (a) E(tau-hat) = tau(theta) for all theta in Theta; (b) Var(tau-hat) <= Var(tau-tilde) for all theta in Theta and every unbiased tau-tilde.
The Cramér-Rao Lower Bound (CRLB) gives a tool for determining a UMVUE: theta-hat is a UMVUE of tau(theta) if
(1) (d/d theta) E(theta-hat) = { int theta-hat(x) (d/d theta) f(x;theta) dx if X is continuous; sum_x theta-hat(x) (d/d theta) f(x;theta) if X is discrete };
(2) E(theta-hat) = tau(theta); (3) Var(theta-hat) = ((d/d theta) tau(theta))^2 / [n E(((d/d theta) ln f(X_1;theta))^2)].
Results: the assumptions in the CR theorem { hold if f(x;theta) belongs to an exponential family; often fail if the domain of f depends on theta, e.g. U(0,theta) }.
Example 2. Let X_1,...,X_n be i.i.d. from X ~ f(x;theta) = (1/theta) 1(x in (0,theta)).
a. Find the MLE theta-hat of theta.
b. Find an unbiased estimator of theta based on theta-hat.
c. Show that the CR-inequality fails. d. Why does it fail?
Sol. 1. Solve for the MLE: L(theta) = prod_{i=1}^n f(X_i;theta) = prod_{i=1}^n (1/theta) 1(X_i in (0,theta)) = theta^{-n} 1(X_(n) in (0,theta)).
Typical way: d ln L(theta)/d theta = -n/theta = 0??? Notice that d ln L(theta)/d theta { < 0 if theta > X_(n); does not exist if theta = X_(n); and L(theta) = 0 if theta < X_(n) }.
Check: critical points 0, X_(n), inf. L(theta) = 0 for theta <= X_(n) and L(theta) = theta^{-n} is decreasing for theta > X_(n); thus, with this likelihood, the maximum value does not exist!
However, the density function of U(0,theta) is unique only in the sense that E(|f(X;theta) - f_1(X;theta)|) = 0 if f and f_1 are two density functions of U(0,theta). Here f(x;theta) = 1(x in (0,theta))/theta and f_1(x;theta) = 1(x in [0,theta])/theta. The latter leads to the likelihood L_1(theta) = theta^{-n} 1(X_(n) in [0,theta]) 1(X_(1) >= 0). Then the maximum value does exist!! The MLE is theta-hat = X_(n).
2. To find an unbiased estimator, consider E(theta-tilde) = E(c theta-hat) = theta.
E(theta-hat) = int t f_{X_(n)}(t) dt = int_0^theta t n (t/theta)^{n-1} (1/theta) dt (as f_{X_(n)}(t) = n (F(t))^{n-1} f(t)) = int_0^theta n t^n/theta^n dt = (n/(n+1)) theta.
An unbiased estimator related to the MLE is theta-tilde = ((n+1)/n) theta-hat.
3. To show the CR-inequality fails, one needs to show Var(theta-tilde) < CRLB. Now Var(theta-tilde) = E(theta-tilde^2) - theta^2.
E(theta-tilde^2) = ((n+1)/n)^2 int t^2 f_{X_(n)}(t) dt (as E(g(Y)) = int g(x) f_Y(x) dx)
= ((n+1)/n)^2 int_0^theta t^2 n t^{n-1}/theta^n dt = ((n+1)/n)^2 (n/(n+2)) theta^2 = ((n+1)^2/(n(n+2))) theta^2.   (1)
Var(theta-tilde) = ((n+1)^2/(n(n+2))) theta^2 - theta^2 = theta^2/(n(n+2)).
CRLB = ((d/d theta) theta)^2 / [n E(((d/d theta) ln f(X_1;theta))^2)]. ln f(x;theta) = -ln theta, x in (0,theta); (d/d theta) ln f(x;theta) = -1/theta, x in (0,theta).
E((∂/∂θ ln f(X;θ))²) = E((1/θ²)·1(X ∈ (0,θ))) = P(X ∈ (0,θ))/θ² = 1/θ², by accident!
(Which is correct: E((∂/∂θ ln f(X;θ))²) = 1/θ², or = E((1/θ²)·1(X ∈ (0,θ)))? X ~ U(0,θ) => they agree here.)
CRLB = θ²/n > θ²/[n(n+2)] = Var(θ̃). Thus the CR-inequality fails. In fact, we shall show θ̃ is the UMVUE of θ.
d. Reason that the CRLB fails (condition (1) in the theorem fails):
d/dθ E(W) ≠ ∫ W(x) ∂/∂θ f(x;θ) dx, where W = θ̃ = [(n+1)/n]·X_{(n)} and E(W) = θ. LHS = d/dθ E(W) = 1. But RHS ≠ 1, as
RHS = ∫ y ∂/∂θ f_W(y;θ) dy = [(n+1)/n] ∫_0^θ w·∂/∂θ [n w^{n−1} θ^{−n}] dw
= [(n+1)/n] ∫_0^θ w·(−n² w^{n−1} θ^{−n−1}) dw
= [(n+1)/n]·(−n²)·θ^{−n−1}·θ^{n+1}/(n+1) = −n,
or RHS = ∫ W(x) ∂/∂θ f(x;θ) dx with the joint density f(x;θ) = θ^{−n} on (0,θ)^n, ∂/∂θ θ^{−n} = −n θ^{−n−1}:
RHS = −(n+1)·θ^{−n−1} ∫_{(0,θ)^n} x_{(n)} dx = −(n+1)·θ^{−n−1}·[n/(n+1)]·θ^{n+1} = −n
(as ∫_{(0,θ)^n} x_{(n)} dx = θ^n·E(X_{(n)}) = θ^n·[n/(n+1)]·θ).
Thus d/dθ E(W) = 1 ≠ −n = ∫ W(x) ∂/∂θ f(x;θ) dx (condition (1) in the theorem fails).
Theorem 2. If (1) T is a sufficient and complete statistic for θ, and (2) φ(T) is a statistic that only depends on T, then φ(T) is the unique UMVUE of E(φ(T)).
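The failure of the CR-inequality for U(0,θ) can also be seen numerically. Below is a minimal Monte Carlo sketch (the values of θ, n, and the replication count are illustrative assumptions, not from the notes): it estimates E(θ̃) and Var(θ̃) for θ̃ = [(n+1)/n]·X_{(n)} and compares the variance with the would-be CRLB θ²/n.

```python
import random

# Sketch: check that theta_tilde = (n+1)/n * X_(n) is unbiased for U(0, theta)
# and that its variance falls BELOW the would-be CRLB theta^2/n,
# so the Cramer-Rao inequality cannot hold for this family.
random.seed(0)
theta, n, reps = 2.0, 5, 200_000          # illustrative choices

est = [(n + 1) / n * max(random.uniform(0, theta) for _ in range(n))
       for _ in range(reps)]
mean = sum(est) / reps
var = sum((e - mean) ** 2 for e in est) / reps

crlb = theta**2 / n                        # would-be bound (tau'(theta) = 1)
exact_var = theta**2 / (n * (n + 2))       # theoretical Var(theta_tilde)

print(mean)   # close to theta = 2
print(var)    # close to theta^2/(n(n+2)) = 4/35, far below CRLB = 0.8
```

The simulated variance sits near θ²/(n(n+2)), well under θ²/n, matching part (c) above.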
Corollary 1. θ̃ = [(n+1)/n]·X_{(n)} is the UMVUE of θ if the X_i's are i.i.d. U(0,θ). Why?
Remark. Two more ways for finding a UMVUE of τ(θ), based on Theorem 2:
2. Find a sufficient and complete statistic T and a φ(T) that is unbiased for τ(θ); then φ(T) is the UMVUE of τ(θ).
3. Find a sufficient and complete statistic T and an unbiased estimator W of τ(θ); then τ̂ = E(W|T) is the UMVUE of τ(θ).
Example 1. Let the X_i's be i.i.d. from N(µ,σ²). UMVUE of µ² and σ²?
Sol. Use Method 2. T = (Σ_i X_i, Σ_i X_i²) is sufficient and complete (known due to the exponential family). A function φ(T) such that E(φ(T)) = θ?
E(S²) = σ², and S² = [1/(n−1)]·(Σ_i X_i² − n(X̄)²) is a function of T.
E(X̄²) = (E(X̄))² + Var(X̄) = µ² + σ²/n, and X̄² is a function of T;
E(X̄² − S²/n) = µ² + σ²/n − σ²/n = µ², and X̄² − S²/n is a function of T.
Thus X̄² − S²/n and S² are the UMVUEs of µ² and σ², respectively.
Example 2. Let X_1, ..., X_n be i.i.d. from Poisson(λ). UMVUE of λ?
Sol. Recall E(X_1) = λ = Var(X_1) for Poisson(λ). T = Σ_{i=1}^n X_i is sufficient and complete. Two unbiased estimators: λ̂ = X̄, λ̌ = S².
Method 1. Check: does the Cramér–Rao Lower Bound equal V(λ̂) or V(λ̌)?
Method 2. λ̂ = X̄, as E(X̄) = µ = λ.
Method 3. λ̃ = E(W|T), where W = S² or X̄.
Question: (1) Which method is better here? (2) E(X̄|T) = λ̂? (3) E(S²|T) = λ̂?
Consider the case n = 2. Let T = Σ_i X_i. E(S²|X̄) = E(S²|T/2) = E(S²|T) = 2[E(X_1²|T) − (X̄)²].
f_{X_1|X_1+X_2}(x|t) = P(X_1 = x, X_2 = t−x)/P(T = t) = C(t,x)·0.5^x·0.5^{t−x} (bin(t, 0.5)).
E(X_1²|T) = Var(X_1|T) + (E(X_1|T))² = T·p·q + (T·p)² = T/4 + T²/4, so
E(S²|X̄) = E(S²|T) = 2[E(X_1²|T) − (X̄)²] = 2(T/4 + T²/4 − T²/4) = T/2 = X̄.
In general, X_i | Σ_j X_j = t ~ bin(t, 1/n), i = 1, ..., n.
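The n = 2 computation of E(S²|T) above can be verified by direct enumeration: given T = X_1 + X_2 = t, X_1|T = t ~ bin(t, 1/2), and for n = 2 one has S² = (X_1 − X_2)²/2. A small sketch (the range of t values is an arbitrary choice for illustration):

```python
from math import comb

# Given T = t, X1 ~ bin(t, 1/2) and X2 = t - X1, so S^2 = (2*X1 - t)^2 / 2.
# E(S^2 | T = t) should come out to X_bar = t/2.
def cond_mean_S2(t):
    pmf = [comb(t, x) * 0.5**t for x in range(t + 1)]  # bin(t, 1/2) pmf
    return sum(p * (2 * x - t) ** 2 / 2 for x, p in zip(range(t + 1), pmf))

for t in range(1, 8):                 # illustrative range of t values
    assert abs(cond_mean_S2(t) - t / 2) < 1e-12
print(cond_mean_S2(7))                # -> 3.5, i.e. t/2
```

So conditioning S² on the sufficient statistic T collapses it to X̄, as claimed.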
Example 3. Let X_1, ..., X_n be a random sample from X ~ bin(5,θ). τ = P(X ≤ 1). UMVUE of τ?
Sol. τ = ? (= P(X_1 ≤ 1)). τ = (1−θ)^5 + 5θ(1−θ)^4.
Three methods for the UMVUE:
1. Find an unbiased τ̂, compare σ²_τ̂ to the CRLB.
2. Find a complete sufficient T and a g(T) so that E(g(T)) = τ.
3. Find a complete sufficient T and an unbiased τ̂, compute E(τ̂|T).
Method 3. τ̂ = E(W|T). W = ? T = ? E(W|T) = ?
W = 1(X_1 ≤ 1). Then E(W) = P(X_1 ≤ 1) = τ. Why not W = 1(X̄ ≤ 1)?
T = Σ_{i=1}^n X_i is sufficient and complete (due to the exponential family). Why T, not X̄? Either is fine, but T ~ bin(5n,θ), T = nX̄, f_{X̄}(y) = ?
E(W|T) = ? Ans.: (1) g(t) = E(W|T = t), t = 0, 1, ..., 5n. (2) E(W|T) = g(T).
E(W|T = t) = ∫ w dF_{W|T}(w|t) (meaning?) = 0·f_{W|T}(0|t) + 1·f_{W|T}(1|t), so E(W|T = t) = f_{W|T}(1|t).
f_{W|T}(1|t) = P(W = 1, T = t)/P(T = t), t ∈ {0, 1, ..., 5n}. P(T = t) = ? P(W = 1, T = t) = ?
If t = 0, then {T = 0} = {Σ_{i=1}^n X_i = 0} = {X_1 = ··· = X_n = 0} ⊂ {X_1 ∈ {0,1}} = {W = 1}, so
E(W|T = 0) = f_{W|T}(1|0) = P(T = 0)/P(T = 0) = 1. (P(X_1 = 0)? P(T = 0)?)
If t ≥ 1 (how to proceed? is it P(X_1 ∈ {0,1})·P(T = t)?),
P(W = 1, T = t) = P(X_1 ∈ {0,1}, Σ_{i=1}^n X_i = t)
= P(X_1 = 0, Σ_{i=2}^n X_i = t) + P(X_1 = 1, Σ_{i=2}^n X_i = t−1)
= P(X_1 = 0)·P(Σ_{i=2}^n X_i = t) + P(X_1 = 1)·P(Σ_{i=2}^n X_i = t−1)
= (1−θ)^5·C(5(n−1), t)·θ^t (1−θ)^{5(n−1)−t} + 5θ(1−θ)^4·C(5(n−1), t−1)·θ^{t−1}(1−θ)^{5(n−1)−t+1}
= [C(5(n−1), t) + 5·C(5(n−1), t−1)]·θ^t (1−θ)^{5n−t}.
Since P(T = t) = C(5n, t)·θ^t (1−θ)^{5n−t},
τ̂ = 1 if T = 0, and τ̂ = [C(5(n−1), T) + 5·C(5(n−1), T−1)] / C(5n, T) if T ≥ 1, where T = Σ_{i=1}^n X_i.
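As a sanity check, the estimator τ̂ just derived can be verified to be unbiased by summing τ̂(t)·P(T = t) over T ~ bin(5n,θ). A sketch (the choice n = 4 and the θ grid are illustrative assumptions):

```python
from math import comb

# tau_hat from Example 3: 1 if T = 0, else a ratio of binomial coefficients.
# math.comb(m, k) returns 0 when k > m, which handles the tail cases cleanly.
def tau_hat(t, n):
    if t == 0:
        return 1.0
    m = 5 * (n - 1)
    return (comb(m, t) + 5 * comb(m, t - 1)) / comb(5 * n, t)

# E(tau_hat) computed exactly over the bin(5n, theta) distribution of T.
def mean_tau_hat(n, theta):
    N = 5 * n
    return sum(tau_hat(t, n) * comb(N, t) * theta**t * (1 - theta)**(N - t)
               for t in range(N + 1))

for theta in (0.1, 0.3, 0.7):          # illustrative grid
    tau = (1 - theta)**5 + 5 * theta * (1 - theta)**4
    assert abs(mean_tau_hat(4, theta) - tau) < 1e-9
print(round(mean_tau_hat(4, 0.3), 6))  # -> 0.52822, i.e. tau at theta = 0.3
```

The exact sum matches τ = (1−θ)^5 + 5θ(1−θ)^4 for every θ tried, as Theorem 2 guarantees.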
Theorem. If () T is a sufficiet ad complete statistic for θ; () φ(t) is a statistic that oly depeds o T, The φ(t) is the uique UMVUE of E(φ(T)). Theorem 3 (Rao-Blackwell). Suppose that () W is a ubiased estimator of τ(θ), () T is sufficiet for θ ad (3) ˆτ = E(W T). The Var(ˆτ) Var(W) ad E(ˆτ) = τ(θ). What is the differece betwee Th ad 3? Remark. The R-B Theorem does ot say that ˆτ is the UMVUE. Proof of R-B Th. E(ˆτ) = E(E(W T)) = E(W) = τ(θ). Var(W) = Var(E(W T))+E(Var(W T)) Thus Var(W) Var(E(W T)) = Var(ˆτ). Proof of Theorem. Step () Claim: φ(t) is a UMVUE of τ(θ) = E(φ(T)). If φ(t) is ot a UMVUE of τ(θ), the there exists a ubiased estimator W such that We shall show that it leads to a cotradictio. Now ˆτ = E(W T) is a ubiased estimator ad Var(W) < Var(φ(T)) for a θ = θ o (or for all θ?). Var(ˆτ) Var(W) < Var(φ(T)) for θ = θ o by R-B theorem. () Let g(t) = E(W T) φ(t), the E(g(T)) = τ(θ) τ(θ) = 0 θ. It follows that P(g(T) = 0) = θ, Why? that is, φ(t) = E(W T) w.p., a cotradictio to Iequality () Why?? The cotradictio implies that φ(t) is a UMVUE of τ. Step () : Cov(W,φ(T)) = σ W σ φ(t) where W is a arbitrary UMVUE of τ(θ). W = (W +φ(t))/ is also ubiased, ad Var(W) Var(W ) (as W is a UMVUE) = Var(φ(T))+ Var(W)+ Cov(φ(T),W) Var(W)+ Var(W)+ Var(W) Why?? =Var(W) (Cov(X,Y) σ X σ Y ) Cov(φ(T),W) = Var(W) = Var(φ(T))Var(W) Why? Step (3) Claim: φ(t) is the uique (w.p.) UMVUE of τ(θ). Recall that Cov(X,Y) σ X σ Y with equality iff P(Y = a+bx) = for some costats a ad b. Let W be a arbitrary UMVUE of τ(θ). Thus Step () => P(φ(T) = a+bw) = for some costats a ad b. The E(φ(T)) = a+be(w) ad thus τ(θ) = a+bτ(θ) θ. It follows that a = 0 ad b =, ad thus P(W = φ(t)) =. Theorem (Cramér-Rao Iequality (CR- Ieq.)) Let X,..., X be i.i.d. from X f(x;θ) ad let W(X) be a statistic. Suppose that { () d dθ E(W) = θw(x)f(x;θ)dx if X is cotiuous W(x)f(x;θ) if X is discrete; x θ 3
(2) Var(W) < ∞. Let τ = E(W). Then
Var(W) ≥ (d/dθ E(W))² / [n E((∂/∂θ ln f(X_1;θ))²)] (= (d/dθ E(W))² / E((∂/∂θ ln f(X;θ))²)). Why =?
Remark. In general, the CRLB = (τ′(θ))²/I_n(θ), where I_n(θ) = E((∂/∂θ ln f(X;θ))²), f the joint density of X = (X_1,...,X_n); I_n(θ) is called the Fisher information number. Here X_1, ..., X_n do not need to be i.i.d. If they are, then I_n(θ) = n·I_1(θ), where I_1(θ) = E((∂/∂θ ln f(X_i;θ))²). Moreover, if
∂/∂θ E(∂/∂θ ln f(X_1;θ)) = ∫ ∂/∂θ [(∂/∂θ ln f(x;θ))·f(x;θ)] dx if X is continuous, or Σ_x ∂/∂θ [(∂/∂θ ln f(x;θ))·f(x;θ)] if X is discrete,
then I_1(θ) = −E(∂²/∂θ² ln f(X_1;θ)). (2)
Proof of (2) under the assumption that X is continuous. Let Y_i = ∂/∂θ ln f(X_i;θ). Then
E(Y_i) = E(∂/∂θ ln f(X_i;θ)) = E([∂/∂θ f(X_i;θ)]/f(X_i;θ)) = ∫ [∂/∂θ f(x;θ)]·[f(x;θ)/f(x;θ)] dx = ∫ ∂/∂θ f(x;θ) dx = d/dθ ∫ f(x;θ) dx = d/dθ 1 = 0 (by (1) in the theorem).
E(Y_i²) = Var(Y_i) = V(Y_i). E(Σ_i Y_i) = 0.
I_n(θ) = E((Σ_i ∂/∂θ ln f(X_i;θ))²) = E((Σ_i Y_i)²) = V(Σ_i Y_i) = Σ_i V(Y_i) = n·I_1(θ).
0 = E(Y_i) = ∫ (∂/∂θ ln f(x;θ))·f(x;θ) dx =>
0 = ∂/∂θ ∫ (∂/∂θ ln f(x;θ))·f(x;θ) dx
= ∫ ∂/∂θ [(∂/∂θ ln f(x;θ))·f(x;θ)] dx
= ∫ { [∂²/∂θ² ln f(x;θ)]·f(x;θ) + (∂/∂θ ln f(x;θ))·∂/∂θ f(x;θ) } dx
= ∫ [∂²/∂θ² ln f(x;θ)]·f(x;θ) dx + ∫ (∂/∂θ ln f(x;θ))·(∂/∂θ ln f(x;θ))·f(x;θ) dx
= E(∂²/∂θ² ln f(X_i;θ)) + E((∂/∂θ ln f(X_i;θ))²) (by assumption).
Thus E(∂²/∂θ² ln f(X_i;θ)) = −E((∂/∂θ ln f(X_i;θ))²).
7.3.4. More about the Bayes estimator.
Interpretation of various estimation methods: the MLE θ̂ maximizes L(θ) = f_X(x;θ), maximizing the chance for the given
X = x. The MME θ̃ solves for θ through E_θ(X_1^k) = (1/n)Σ_i X_i^k. An unbiased estimator θ̌ sets E(θ̌) = θ. The UMVUE θ̄ is the best unbiased estimator in terms of variance. Why the Bayes estimator E(θ|X)?
Definitions: A decision problem consists of X — the sample space, A — the action space, Θ — the parameter space, and L(θ,a) — the loss function, that is, L: Θ×A → R. A decision rule δ is a (measurable) function from X to A, that is, δ: X → A.
R(θ,δ) = E(L(θ,δ(X))) is the risk function of δ, or more precisely, R(θ,δ) = E(L(θ,δ(X))|θ) (a function of (θ,δ), not of X).
r(π,δ) = E_π(R(θ,δ)) is the Bayes risk of δ. It is not a function of (X,θ)!!
δ_B = arginf_δ r(π,δ) is called the Bayes rule of θ w.r.t. the prior π and loss L.
Remark. If L = (θ−a)² (called the quadratic loss function or the squared error loss), then E_π(θ|X) is the Bayes rule w.r.t. π and L (or the Bayes estimator). The Bayes estimator is the best in terms of E(E((θ̂(X)−θ)²|θ)), the average error over (X,θ).
Example 1. Let X ~ bin(n,θ), π(θ) ~ beta(α,β) where α = β = √n/2. Then the MLE is θ̂ = X/n, and the Bayes estimator under the squared error loss is θ̃ = E(θ|X) = (X+α)/(n+α+β) = (X+√n/2)/(n+√n). Why?
Can we write θ̃ = E(θ|x) = (x+√n/2)/(n+√n)? Can we write θ̃ = E(θ|X) = (x+√n/2)/(n+√n)?
This is an estimation problem and is also called a decision problem. In this decision problem,
X = {0, 1, ..., n} (the set of possible observations), Θ = [0,1] = A (the set of possible estimates), L = (a−θ)² (the error).
A decision rule δ is an estimator. θ̂ and θ̃ are both decision rules. Then
R(θ,θ̂) = E((θ̂−θ)²) = E((X/n − θ)²) = V(X/n) = σ²_X/n² = θ(1−θ)/n.
r(π,θ̂) = E(θ(1−θ)/n) = B(α+1,β+1)/[n·B(α,β)]. Why??
Recall Γ(α+1) = αΓ(α) and B(α,β) = Γ(α)Γ(β)/Γ(α+β).
r(π,θ̂) = [α/(α+β+1)]·[β/(α+β)]·(1/n) = 1/[4√n(√n+1)] = 1/[4(n+√n)].
R(θ,θ̃) = E((θ̃−θ)²) = MSE(θ̃) = Var((X+√n/2)/(n+√n)) + (bias(θ̃))² = [nθ(1−θ) + n(1/2−θ)²]/(n+√n)² = n/[4(n+√n)²].
r(π,θ̃) = E(R(θ,θ̃)) = n/[4(n+√n)²] = 1/[4(√n+1)²] < 1/[4(n+√n)] = r(π,θ̂).
It can be checked that r(π,θ̃) = inf_δ r(π,δ) (see a Remark later). Thus θ̃ minimizes the average error w.r.t. L and π.
Example 2. Other loss functions: L_1(θ,a) = |a−θ|, L_2(θ,a) = (a−θ)²/[θ(1−θ)], where Θ = [0,1] and 0/0 def= 0.
Is E(θ|X) still the Bayes rule w.r.t. L_i and π?
Example 3.
Suppose that X ~ bin(n,p), π(p) ~ beta(α,β) with α = β = √n/2, and L(p,a) = (a−p)²/[p(1−p)]. Let p̂_1 = X/n and p̂_2 = (X+α)/(n+α+β). R(p,p̂_i) = ? r(π,p̂_i) = ?
Sol. R(p,p̂_1) = E((X/n−p)²/[p(1−p)]) = [p(1−p)/n]/[p(1−p)] = 1/n. r(π,p̂_1) = E(1/n) = 1/n.
R(p,p̂_2) = E(((X+α)/(n+α+β) − p)²/[p(1−p)]) = E(((X+α)/(n+α+β) − p)²)/[p(1−p)] = n/[4(n+√n)²]·1/[p(1−p)].
r(π,p̂_2) = c·E(1/[p(1−p)]) = c·B(α−1,β−1)/B(α,β) (Why?), where c = n/[4(n+√n)²],
= c·(α+β−1)(α+β−2)/[(α−1)(β−1)] = (√n−1)/[(√n+1)²(√n−2)]
{ > 1/n = r(π,p̂_1) if n = 4 or 9; < 1/n = r(π,p̂_1) if n = 100 } (by Example 1).
Can we tell whether p̂_1 or p̂_2 is the Bayes rule (w.r.t. π and L)?
Remark. Under certain regularity conditions (in the Fubini Theorem),
(1) If E(L(θ,δ)|X) is finite, then the Bayes rule is δ_B(x) = argmin_a E(L(θ,a)|X = x). Or, if T is sufficient and E(L(θ,a)|T) is finite, then the Bayes rule is δ_B(t) = argmin_a E(L(θ,a)|T = t).
(2) If L = (a−θ)², then δ_B = E(θ|X).
Proof: Note that both X and θ are random.
r(π,δ) = E(E(L(θ,δ(X))|θ)) = E(E(L(θ,δ(X))|X)) (by the Fubini Theorem)
is minimized by minimizing E(L(θ,δ(X))|X = x) for each x, or minimizing E(L(θ,a)|X = x) over all a ∈ A for each x. Why?
(2) If L = (a−θ)², then E(L(θ,a)|X = x) = E((a−θ)²|X = x). If E((a−θ)²|X = x) is finite, then
∂/∂a E(L(θ,a)|X = x) = ∂/∂a E((a−θ)²|X = x) (Why?? Is it right?) = ∂/∂a [a² − 2a·E(θ|X = x) + E(θ²|X = x)] = 2a − 2E(θ|X = x).
∂²/∂a² E(L(θ,a)|X = x) = 2 > 0. Thus a = E(θ|X = x) is the minimum point. That is, δ_B = E(θ|X) is the Bayes estimator w.r.t. L and π.
Remark. Hereafter, if we do not mention L in the problem, the Bayes estimator is E(θ|X); otherwise, the Bayes estimator is the Bayes rule w.r.t. the loss L and the prior π.
Example 4. If one observes X, where X ~ bin(n,p), L = (a−p)²/[p(1−p)], π(p) ~ U(0,1), the Bayes estimator p̂ = ?
Sol. p̂ = δ_B(x) = argmin_a E(L(p,a)|X = x), where E(L(p,a)|X = x) = ∫ L(p,a)·π(p|x) dp and π(p|x) = f(x,p)/f_X(x) = ?
The joint distribution of (X,p) is
f(x,p) = f(x|p)·π(p) = C(n,x)·p^x(1−p)^{n−x}·1(p ∈ [0,1]),
so π(p|x) ∝ p^x(1−p)^{n−x}, p ∈ (0,1).
Thus π(p|x) ~ beta(x+1, n−x+1).
g(a) = E(L(p,a)|X = x) = c ∫_0^1 [(a−p)²/(p(1−p))]·p^x(1−p)^{n−x} dp = c ∫_0^1 (a−p)²·p^{x−1}(1−p)^{n−x−1} dp.
Step (1) g′(a) = 2c ∫_0^1 (a−p)·p^{x−1}(1−p)^{n−x−1} dp. g′(a) = 0 =>
a = ∫_0^1 p·p^{x−1}(1−p)^{n−x−1} dp / ∫_0^1 p^{x−1}(1−p)^{n−x−1} dp
= ∫_0^1 p·[p^{x−1}(1−p)^{n−x−1}/B(x, n−x)] dp / ∫_0^1 [p^{x−1}(1−p)^{n−x−1}/B(x, n−x)] dp
= mean of a beta(x, n−x) distribution = x/n.
Step (2) g″(a) = 2c ∫_0^1 p^{x−1}(1−p)^{n−x−1} dp > 0 => a = p̂(x) = x/n is the Bayes estimator of p.
Are we done???
1. g(a) = E(L(p,a)|X = x) = c ∫_0^1 (a² − 2ap + p²)·p^{x−1}(1−p)^{n−x−1} dp = c[a²·B(x, n−x) − 2a·B(x+1, n−x) + B(x+2, n−x)].
Notice that if x ≠ 0 or n, g(a) is finite for all a ∈ [0,1]. g′(a) = 2c[a·B(x, n−x) − B(x+1, n−x)] = 0 and g″(a) = 2c·B(x, n−x) > 0. Thus g(a) is minimized by
a = δ_B(x) = B(x+1, n−x)/B(x, n−x) = Γ(x+1)Γ(n−x)Γ(n)/[Γ(x)Γ(n−x)Γ(n+1)] = x/n.
(1) B(α,β) < ∞ iff α > 0 and β > 0.
2. Notice that if x = 0, g(a) is finite only when a = 0, as g(0) = c ∫_0^1 p^{2−1}(1−p)^{n−1} dp = c·B(2,n) < ∞. Otherwise (if unaware of (1)), for a ≠ 0,
g(a) = c{a² ∫_0^1 p^{−1}(1−p)^{n−1} dp − 2a·B(1,n) + B(2,n)}
≥ c·a² ∫_0^{1/2} p^{−1}(0.5)^{n−1} dp + c{−2a·B(1,n) + B(2,n)}
= lim_{y↓0} c·a²·(0.5)^{n−1}·(ln p |_y^{0.5}) + c{−2a·B(1,n) + B(2,n)} = ∞.
Thus g(a) is minimized by a = δ_B(0) = 0 = 0/n. Can we say a·B(x, n−x) = 0 if x = 0 = a?
3. Notice that if x = n, g(a) is finite only when a = 1, by symmetry. Thus g(a) is minimized by a = δ_B(n) = 1 = n/n.
Answer: The Bayes estimator w.r.t. π and L is δ_B(X) = X/n.
Question about differentiating under the integral sign:
∂/∂y ∫_{a(y)}^{b(y)} g(x,y) dx = g(b(y),y)·b′(y) − g(a(y),y)·a′(y) + ∫_{a(y)}^{b(y)} ∂/∂y g(x,y) dx.
Exercise: ∂/∂y ∫_0^∞ sin(xy)·1(x < y) dx = ?
Chapter 8. Hypothesis Testing
8.1. Two types of inferences:
1. Estimation problem: θ = ?
2. Testing problem: θ = θ_o? Here θ_o is given.
Example 1. A slot machine is claimed to have winning rate 40%. To test the claim, 5 runs are made. Observe X times of winning. Let p be the winning rate of the machine. Possible questions:
H_0 (null hypothesis): p = 40%?
H_1 (alternative hypothesis): p ≠ 40%? (made by the manufacturer); p > 40%? (casino owner); p < 40%? (player).
If H_0 is correct, then X ~ bin(5, 2/5) and one expects 2 winnings.
The maker rejects H_0 if X = 0, 4, 5, but not 1, 2, 3.
The owner rejects H_0 if X = 4, 5, but not 0, 1, 2, 3.
A player rejects H_0 if X = 0, but not 1, 2, 3, 4, 5.
→ rejection region (RR).
A test statistic or test function is φ = 1(X ∈ RR), which has two interpretations:
1. If X ∈ RR, then φ = 1 and H_1 is accepted (often said: rejecting H_0); if X ∉ RR, then φ = 0 and H_0 is accepted (often said: not rejecting H_0).
2. The probability of rejecting H_0 is 1 if X ∈ RR and 0 otherwise.
A testing hypothesis for θ ∈ Θ consists of 5 elements:
1. H_0: θ ∈ Θ_o (Θ_o = {0.4} in Example 1).
2. H_1: θ ∈ Θ_o^c = Θ\Θ_o (in Example 1):
H_1: p ≠ 0.4, Θ = [0,1], Θ_o^c = [0,0.4)∪(0.4,1];
H_1: p > 0.4, Θ = [0.4,1], Θ_o^c = (0.4,1];
H_1: p < 0.4, Θ = [0,0.4], Θ_o^c = [0,0.4).
3. Test statistic φ (= 1(X ∈ RR) in Example 1).
4. α — the size of the test, defined by α = sup_{θ∈Θ_o} E_θ(φ).
5. Conclusion: Reject or do not reject H_0.
Two types of errors:
1. Type I error: reject a correct H_0, denoted by H_0 => H_1.
2. Type II error: do not reject a wrong H_0, denoted by H_1 => H_0.
Definition. β(θ) — the power function of the test — is defined by β(θ) = E_θ(φ). For θ ∈ Θ_o^c, β(θ) is called the power (at θ) of the test.
If θ ∈ Θ_o, then β(θ) = P(H_0 => H_1), the probability of a type I error.
If Θ_o = {θ_o}, then β(θ_o) = α, the size of the test.
If θ ∈ Θ_o^c, then β(θ) = 1 − P_θ(H_1 => H_0), where P_θ(H_1 => H_0) is the probability of a type II error.
Example 1 (continued). Compute β(p) and α. β(p) = E(1(X ∈ RR)) = Σ_{x∈RR} C(5,x)·p^x(1−p)^{5−x}. α = P_p(X ∈ RR) when p = 0.4.
R: > x=0:5; round(dbinom(x,5,0.4),3)
[1] 0.078 0.259 0.346 0.230 0.077 0.010
1. H_1: p < 0.4. α = P(X ∈ {0}) = 0.078.
2. H_1: p > 0.4. α = P(X ∈ {4,5}) = 0.077 + 0.010 = 0.087.
3.
H_1: p ≠ 0.4. α = P(X ∈ {0,4,5}) = 0.078 + 0.077 + 0.010 = 0.165.
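The three sizes above (computed in R with dbinom) can be mirrored in Python; a minimal sketch using the bin(5, 0.4) null distribution:

```python
from math import comb

# Under H0, X ~ bin(5, 0.4); alpha = P(X in RR) for each rejection region.
def pmf(x, n=5, p=0.4):
    return comb(n, x) * p**x * (1 - p)**(n - x)

alpha_lower = pmf(0)                     # H1: p < 0.4, RR = {0}
alpha_upper = pmf(4) + pmf(5)            # H1: p > 0.4, RR = {4, 5}
alpha_two   = pmf(0) + pmf(4) + pmf(5)   # H1: p != 0.4, RR = {0, 4, 5}

print(round(alpha_lower, 3), round(alpha_upper, 3), round(alpha_two, 3))
# -> 0.078 0.087 0.165
```

These match the rounded dbinom values 0.078, 0.087 and 0.165 above.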
8.2. Question: How to construct a test?
Ans. Method 1. Likelihood ratio test (LRT): Let X_1, ..., X_n be a random sample from f(x;θ). For testing H_0: θ ∈ Θ_o v.s. H_1: θ ∈ Θ_o^c, the LRT is φ = 1(λ ≤ c), where
λ = sup_{θ∈Θ_o} L(θ|x) / sup_{θ∈Θ} L(θ|x) = L(θ̂_o|x)/L(θ̂|x), L(θ|x) = Π_{i=1}^n f(x_i;θ),
θ̂ is the MLE of θ under Θ, θ̂_o is the MLE of θ under Θ_o, and c is determined by α = sup_{θ∈Θ_o} P(λ ≤ c), or otherwise c = sup{t : α ≥ sup_{θ∈Θ_o} P(λ ≤ t)}.
Q: How to understand λ? Two extremes: Is λ = 1 (or λ >> c) likely under H_0 or H_1? Is λ ≈ 0 (or λ << c) likely under H_0 or H_1?
Example 1. A random sample from N(µ,1) results in X̄ = 1.1, where n = 100. Do you believe µ = 1?
Sol. Use the LRT. H_0: µ = 1 v.s. H_1: µ ≠ 1. α = 0.05. L(µ|x) = c·exp(−Σ_{i=1}^n (X_i−µ)²/2).
Θ_o = {1}: MLE µ̂_0 = 1 (= µ_0); Θ = (−∞,∞): MLE µ̂ = X̄.
λ = L(µ̂_0|x)/L(µ̂|x) = c·exp(−Σ_{i=1}^n (X_i−µ_0)²/2) / [c·exp(−Σ_{i=1}^n (X_i−X̄)²/2)]
= exp(−(1/2)·Σ_{i=1}^n [(X_i−µ_0)² − (X_i−X̄)²])
= exp(−(1/2)·Σ_{i=1}^n [(2X_i−X̄−µ_0)(X̄−µ_0)])
= exp(−(1/2)·(2nX̄−nX̄−nµ_0)(X̄−µ_0))
= exp(−(n/2)·(X̄−µ_0)²).
φ = 1(λ ≤ c) = 1(|X̄−µ_0| ≥ c_1).
Since α = E_{µ_0}(φ) = 0.05, X̄ ~ N(µ_0, 1/n), √n(X̄−µ_0) ~ N(0,1), and P(√n|X̄−µ_0| > 1.96) ≈ 0.05, we get c_1 = 1.96/√n. Or c = exp(−n·c_1²/2). (It is important to find c and c_1.)
That is, φ = 1(|X̄−1| ≥ 1.96/√n). Here √n·|X̄−1| = 10×0.1 = 1 < 1.96; thus do not reject H_0. It is likely that µ = 1.
Where are the 5 elements of a test? A testing hypothesis for θ ∈ Θ consists of 5 elements:
1. H_0
2. H_1
3. Test statistic φ (= 1(λ ≤ c) for the LRT)
4. α — the size of the test, defined by α = sup_{θ∈Θ_o} E_θ(φ).
5. Conclusion: Reject or do not reject H_0, and answer the related question.
Example 3. Suppose that X_1, ..., X_4 are i.i.d. N(µ,σ²), θ = (µ,σ²) is unknown, X̄ = 3 and S² = 4. H_0: µ ≤ 0 (= µ_0) v.s. H_1: µ > 0. LRT?
Sol. Remark. A natural estimator of µ = ? If X̄ = 100, H_0 or H_1? If X̄ = 0.001, H_0 or H_1? If X̄ = 3, H_0 or H_1? (need to find out now). A natural test is φ^+ = 1(µ̂ > b). Why?
5 elements of a test: (1)? (2)? (3) Choose size α = 0.05. (4) Test statistic: 1(λ ≤ c) = ? λ = L(θ̂_o|x)/L(θ̂|x), L = (1/(2πσ²))^{n/2}·exp(−Σ_i (X_i−µ)²/(2σ²)) = ? c = ? The main work!! (5) Conclusion. Don't forget!
Θ_o = {(µ,σ²) : µ ≤ µ_0, σ² > 0}. Θ = {(µ,σ²) : µ ∈ R, σ² > 0}.
MLE under Θ: θ̂ = (X̄, σ̂²), where σ̂² = (1/n)Σ_{i=1}^n (X_i−X̄)².
MLE under Θ_o: θ̂_o = (µ̂_0, σ̂_0²) = (min(X̄,µ_0), (1/n)Σ_{i=1}^n (X_i−min(X̄,µ_0))²) (see Example 3 in the MLE section), or the derivation as follows. If X̄ ≤ µ_0, then θ̂ ∈ Θ_o and thus it is the maximum point of the likelihood L(θ|X). If X̄ > µ_0, then, since θ̂ is the unique stationary point in Θ, the maximum point of L(θ|X) over Θ_o must be on the boundary:
boundaries: µ = −∞, µ = µ_0, σ² = 0, σ² = ∞; L(θ|X): 0, finite, 0, 0.
It is easy to show that on the boundary µ = µ_0 the maximum of the likelihood is achieved at θ̂_o = (µ̂_0, σ̂_0²) = (µ_0, (1/n)Σ_{i=1}^n (X_i−µ_0)²).
L(θ̂) = (1/(2πσ̂²))^{n/2}·exp(−Σ_i (X_i−µ̂)²/(2σ̂²)) = (1/(2πσ̂²))^{n/2}·e^{−n/2}.
L(θ̂_o) = (1/(2πσ̂_0²))^{n/2}·exp(−Σ_i (X_i−µ̂_0)²/(2σ̂_0²)) = { (1/(2πσ̂²))^{n/2}·e^{−n/2} if X̄ ≤ µ_0; (1/(2πσ̂_0²))^{n/2}·e^{−n/2} if X̄ > µ_0 }.
λ = { 1 if X̄ ≤ µ_0; [Σ_i (X_i−X̄)²/Σ_i (X_i−µ_0)²]^{n/2} if X̄ > µ_0 }.
φ = 1(λ ≤ c). c ≥ 1? or c < 1?
= 1(Σ_i (X_i−X̄)²/Σ_i (X_i−µ_0)² ≤ c^{2/n}). Why? Is it correct?
φ^+ = 1(Σ_i (X_i−X̄)²/Σ_i (X_i−µ_0)² ≤ c^{2/n})·1(X̄ > µ_0). Is it correct?
φ = 1(Σ_i (X_i−X̄)²/[Σ_i (X_i−X̄)² + n(X̄−µ_0)²] ≤ c^{2/n}) (as Σ_i (X_i−µ_0)² = Σ_i (X_i−X̄+X̄−µ_0)² = Σ_i (X_i−X̄)² + n(X̄−µ_0)²)
= 1([Σ_i (X_i−X̄)² + n(X̄−µ_0)²]/Σ_i (X_i−X̄)² ≥ c^{−2/n})
= 1(n(X̄−µ_0)²/Σ_i (X_i−X̄)² ≥ c^{−2/n} − 1) = 1(n(X̄−µ_0)²/Σ_i (X_i−X̄)² ≥ c_2), c_2 = ?
= 1(|X̄−µ_0|/(S/√n) ≥ c_3), c_3 = ?
Recall that (X̄−µ)/(S/√n) ~ t_{n−1} if the X_i's are i.i.d. N(µ,σ²).
α = sup_{µ≤µ_0} E(φ) = sup_{µ≤µ_0} P(|X̄−µ_0|/(S/√n) ≥ c_3) = sup_{µ≤µ_0} P(|X̄−µ+(µ−µ_0)|/(S/√n) ≥ c_3).
Since the t density function is bell-shaped and symmetric about 0, the sup is attained at µ = µ_0 (with µ_0 = 0 here). That is,
α = sup_{µ≤µ_0} P(|X̄−µ+(µ−µ_0)|/(S/√n) ≥ c_3) = P(|X̄−µ_0|/(S/√n) ≥ c_3).
Q: Two tests:
φ^+ = 1((X̄−µ_0)/(S/√n) ≥ t_{0.05,n−1}) => 1(3 > 2.353) = 1 (reject H_0),
φ = 1(|X̄−µ_0|/(S/√n) ≥ t_{0.025,n−1}) => 1(3 > 3.182) = 0 (do not reject H_0).
Which makes more sense? Recall H_0: µ ≤ 0 (= µ_0) v.s. H_1: µ > 0.
Question: Something goes wrong?
1(λ ≤ c) = { 1(1 ≤ c) if X̄ ≤ µ_0; 1([Σ_i (X_i−X̄)²/Σ_i (X_i−µ_0)²]^{n/2} ≤ c) if X̄ > µ_0 }
= 1(|X̄−µ_0|/(S/√n) ≥ c_3)·1(X̄ > µ_0) = 1((X̄−µ_0)/(S/√n) ≥ c_3) = φ^+ (as c < 1, the case X̄ ≤ µ_0 never rejects).
(4) The test statistic is φ^+. (5) Reject H_0. The data do not support the claim that µ ≤ 0.
Example 2. A random sample X_1, ..., X_4 from f = exp(−(x−θ)), x ≥ θ. H_0: θ ≤ θ_o v.s. H_1: θ > θ_o. LRT of size α = 0.01 given the observed X_{(1)}?
Sol. 1(λ ≤ c) = ? λ = L(θ̂_o|x)/L(θ̂|x) = ? c = ? The main task!!
What will you do if θ̂ = 0.1? θ̂ = 100?
Remark: A natural test is 1(θ̂ > b). Why?
Step (1) MLE under Θ = (−∞,∞):
L = Π_{i=1}^n f(X_i;θ) = Π_{i=1}^n [exp(−(X_i−θ))·1(X_i ≥ θ)] = exp(−Σ_{i=1}^n X_i + nθ)·1(θ ≤ X_{(1)})