Parameter Estimation Fitting Probability Distributions Bayesian Approach

MIT 18.443, Dr. Kempthorne, Spring 2015

Outline
1. Bayesian Approach to Parameter Estimation

Bayesian Framework: Extension of Maximum Likelihood
General Model
- Data model: $X = (X_1, X_2, \ldots, X_n)$ is a vector-valued random variable with joint density given by $f(x_1, \ldots, x_n \mid \theta)$.
- Data realization: $X = x = (x_1, \ldots, x_n)$.
- Likelihood of $\theta$ (given $x$): $\mathrm{lik}(\theta) = f(x_1, \ldots, x_n \mid \theta)$. (The MLE $\hat\theta$ maximizes $\mathrm{lik}(\theta)$ for the fixed realization.)
- Prior distribution: the true $\theta \in \Theta$ is modeled as a random variable $\theta \sim \Pi$, with density $\pi(\theta)$, $\theta \in \Theta$.
- Posterior distribution: the distribution of $\theta$ given $X = x$.
Joint density of $(X, \theta)$:
$$f_{X,\theta}(x, \theta) = f(x \mid \theta)\,\pi(\theta)$$
Density of the marginal distribution of $X$:
$$f_X(x) = \int_\Theta f_{X,\theta}(x, \theta)\,d\theta = \int_\Theta f(x \mid \theta)\,\pi(\theta)\,d\theta$$
Density of the posterior distribution of $\theta$ given $X = x$:
$$\pi(\theta \mid x) = \frac{f_{X,\theta}(x, \theta)}{f_X(x)}$$

Bayesian Framework
Posterior distribution: the conditional distribution of $\theta$ given $X = x$:
$$\pi(\theta \mid x) = \frac{f_{X,\theta}(x, \theta)}{f_X(x)} = \frac{f(x \mid \theta)\,\pi(\theta)}{\int_\Theta f(x \mid \theta)\,\pi(\theta)\,d\theta} \propto f(x \mid \theta)\,\pi(\theta)$$
Posterior density $\propto$ Likelihood($\theta$) $\times$ Prior density
Bayesian Principles
- The prior distribution models uncertainty about $\theta$ a priori (before observing any data).
- Justified by axioms of statistical decision theory (utility theory and the optimality of maximizing expected utility).
- All information about $\theta$ is contained in $\pi(\theta \mid x)$.
- The posterior mean minimizes expected squared error: $E[(\theta - a)^2 \mid x]$ is minimized by $a = E[\theta \mid x]$.
- The posterior median minimizes expected absolute error: $E[\,|\theta - a| \mid x]$ is minimized by $a = \mathrm{median}(\theta \mid x)$.
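The relation "posterior density is proportional to likelihood times prior density" can be checked numerically on a grid. The following is a minimal sketch, not part of the original slides: the data values (n = 20, s = 13), the flat prior, and the grid size are illustrative assumptions, and all variable names are hypothetical.

```r
# Illustrative data (assumed): s successes in n Bernoulli trials.
n <- 20; s <- 13

# Grid over the parameter space Theta = [0, 1].
theta <- seq(0, 1, length.out = 2001)
step  <- theta[2] - theta[1]

prior      <- rep(1, length(theta))               # flat prior density pi(theta)
likelihood <- dbinom(s, size = n, prob = theta)   # lik(theta) for the observed s

# Posterior density = likelihood * prior / marginal density of the data.
unnormalized <- likelihood * prior
marginal     <- sum(unnormalized) * step          # Riemann approximation of f_X(x)
posterior    <- unnormalized / marginal

# Point summaries of the posterior.
post_mean   <- sum(theta * posterior) * step
post_cdf    <- cumsum(posterior) * step
post_median <- theta[which(post_cdf >= 0.5)[1]]

# Posterior expected squared error of a point estimate a.
sq_loss <- function(a) sum((theta - a)^2 * posterior) * step
```

Evaluating `sq_loss` at `post_mean` and at any other candidate value illustrates the principle that the posterior mean minimizes expected squared error.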

Bayesian Framework
Bayesian Principles (continued):
- Posterior mode: the modal value of $\pi(\theta \mid x)$ is most probable.
- Analogue to a 90% confidence interval: the $\theta$ values between the 0.05 and 0.95 quantiles of $\pi(\theta \mid x)$.
- Highest posterior density (HPD) interval (region): for $\alpha$ with $0 < \alpha < 1$, the $(1 - \alpha)$ HPD region for $\theta$ is
$$R_{d^*} = \{\theta : \pi(\theta \mid x) > d^*\},$$
where $d^*$ is the value such that $\pi(R_{d^*} \mid x) = 1 - \alpha$.
Note: if the posterior density is unimodal but not symmetric, then the tail probabilities outside the region will be unequal.
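As a rough illustration of the difference between the quantile-based interval and the HPD interval, the sketch below computes both for a Beta(14, 8) posterior. The choice of posterior is an assumption made for the example, and the approach relies on the fact that, for a unimodal density, the HPD interval is the shortest interval with the required posterior probability.

```r
a <- 14; b <- 8          # assumed posterior: theta | x ~ Beta(14, 8)
alpha <- 0.10            # 1 - alpha = 90% posterior probability

# Equal-tail interval: 0.05 and 0.95 posterior quantiles.
equal_tail <- qbeta(c(alpha / 2, 1 - alpha / 2), a, b)

# Width of the interval that leaves probability p_lo in the lower tail
# and alpha - p_lo in the upper tail.
width <- function(p_lo) qbeta(p_lo + 1 - alpha, a, b) - qbeta(p_lo, a, b)

# For a unimodal density the HPD interval is the shortest such interval.
opt <- optimize(width, interval = c(0, alpha), tol = 1e-8)
hpd <- qbeta(c(opt$minimum, opt$minimum + 1 - alpha), a, b)

lower_tail <- opt$minimum           # posterior probability below the HPD interval
upper_tail <- alpha - opt$minimum   # posterior probability above the HPD interval
```

Comparing `lower_tail` with `upper_tail` shows the unequal tail probabilities mentioned in the note, and the posterior density takes (nearly) the same value $d^*$ at both HPD endpoints.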

Bayesian Inference: Bernoulli Trials
Bernoulli trials: $X_1, X_2, \ldots, X_n$ i.i.d. Bernoulli($\theta$)
Sample space: $\mathcal{X} = \{1, 0\}$ ("success" or "failure")
Probability mass function:
$$f(x \mid \theta) = \begin{cases} \theta, & \text{if } x = 1 \\ 1 - \theta, & \text{if } x = 0 \end{cases}$$
Examples:
- Flipping a coin and observing a Head versus a Tail.
- Random sample from a population and measuring a dichotomous attribute (e.g., preference for a given political candidate, testing positive for a given disease).
Summary statistic: $S = X_1 + X_2 + \cdots + X_n$, with $S \sim \mathrm{Binomial}(n, p)$ where $p = \theta$:
$$P(S = k \mid \theta) = \binom{n}{k}\,\theta^k (1 - \theta)^{n - k}, \quad k = 0, 1, \ldots, n.$$
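A short sketch of the summary statistic and its distribution: the values of n and θ below are arbitrary choices for illustration, and the code simply compares the Binomial formula above with R's built-in `dbinom`.

```r
n <- 20; theta <- 0.3      # illustrative values (assumed)
k <- 0:n

# Binomial pmf written out as on the slide, and the built-in version.
pmf_formula <- choose(n, k) * theta^k * (1 - theta)^(n - k)
pmf_builtin <- dbinom(k, size = n, prob = theta)

# Simulate n Bernoulli(theta) trials and form the summary statistic S.
set.seed(1)
x <- rbinom(n, size = 1, prob = theta)
s <- sum(x)
```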

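Bayesian Inference: Bernoulli Trials
Case 1: Uniform prior for $\theta \in \Theta = \{\theta : 0 \le \theta \le 1\} = [0, 1]$
Prior density for $\theta$:
$$\pi(\theta) = 1, \quad 0 \le \theta \le 1$$
Joint density/pmf for $(S, \theta)$:
$$f_{S,\theta}(s, \theta) = f_{S \mid \theta}(s \mid \theta)\,\pi(\theta) = \binom{n}{s}\,\theta^s (1 - \theta)^{n - s} \times 1$$
Marginal density of $S$:
$$f_S(s) = \int_0^1 \binom{n}{s}\,\theta^s (1 - \theta)^{n - s}\,d\theta = \binom{n}{s} \int_0^1 \theta^s (1 - \theta)^{n - s}\,d\theta = \binom{n}{s}\,\mathrm{Beta}(s + 1, (n - s) + 1) \equiv \frac{1}{n + 1}$$
Posterior density of $\theta$ given $S$:
$$\pi(\theta \mid s) = f_{S,\theta}(s, \theta) / f_S(s)$$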
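The claim that the marginal density of S equals 1/(n + 1) for every s can be checked numerically. The sketch below uses n = 20 as an assumed sample size and compares numerical integration with the closed form involving the Beta function.

```r
n <- 20   # illustrative sample size (assumed)

# Numerically integrate the joint density over theta for each s = 0, ..., n.
marginal_s <- sapply(0:n, function(s) {
  integrate(function(theta) dbinom(s, size = n, prob = theta), 0, 1)$value
})

# Closed form: choose(n, s) * Beta(s + 1, (n - s) + 1).
closed_form <- choose(n, 0:n) * beta(0:n + 1, n - 0:n + 1)
```

Both vectors are constant in s, which says that under a uniform prior every value of S is equally likely before the data are seen.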

Bayesian Inference: Bernoulli Trials
Case 1: Uniform prior (continued)
Posterior density of $\theta$ given $S$:
$$\pi(\theta \mid s) = f_{S,\theta}(s, \theta) / f_S(s) = \frac{\theta^s (1 - \theta)^{n - s}}{\mathrm{Beta}(s + 1, (n - s) + 1)}$$
Recall that a random variable $U \sim \mathrm{Beta}(a, b)$ has density
$$g(u \mid a, b) = \frac{u^{a - 1} (1 - u)^{b - 1}}{\mathrm{Beta}(a, b)}, \quad 0 < u < 1,$$
where
$$\mathrm{Beta}(a, b) = \frac{\Gamma(a)\,\Gamma(b)}{\Gamma(a + b)}, \quad \text{with} \quad \Gamma(a) = \int_0^\infty y^{a - 1} e^{-y}\,dy \quad \text{(see the Gamma($a$) density)},$$
$$\Gamma(a + 1) = a\,\Gamma(a) = a! \quad \text{for integral } a.$$
Also (Appendix A3 of Rice, 2007):
$$E[U \mid a, b] = \frac{a}{a + b}, \qquad \mathrm{Var}[U \mid a, b] = \frac{ab}{(a + b)^2 (a + b + 1)}$$
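The Beta density, its normalizing constant, and the two moment formulas can be verified by numerical integration. This is a sketch under assumed shape parameters (a = 14, b = 8); the function name `g` mirrors the slide's notation and is otherwise arbitrary.

```r
a <- 14; b <- 8   # illustrative shape parameters (assumed)

# Beta(a, b) normalizing constant written via Gamma functions.
beta_const <- gamma(a) * gamma(b) / gamma(a + b)

# Beta density g(u | a, b) as defined on the slide.
g <- function(u) u^(a - 1) * (1 - u)^(b - 1) / beta_const

total    <- integrate(g, 0, 1)$value                                  # should be 1
mean_num <- integrate(function(u) u * g(u), 0, 1)$value               # E[U]
var_num  <- integrate(function(u) (u - mean_num)^2 * g(u), 0, 1)$value  # Var[U]

mean_formula <- a / (a + b)
var_formula  <- a * b / ((a + b)^2 * (a + b + 1))
```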

Bayesian Inference: Bernoulli Trials
Case 1: Uniform prior (continued)
Prior: $\theta \sim \mathrm{Beta}(a = 1, b = 1)$, a priori
Sample data: $n = 20$ and $S = \sum_{i=1}^n X_i = 13$ (Example 3.5.E)
Posterior: $[\theta \mid S = s] \sim \mathrm{Beta}(a, b)$ with $a = s + 1 = 14$ and $b = (n - s) + 1 = 8$
Use R to compute:
- Posterior mean: $a / (a + b)$
- Posterior standard deviation: $\sqrt{ab / [(a + b)^2 (a + b + 1)]}$
- Posterior probability: $\pi(\{\theta \le 0.5\} \mid s)$
    > a=14; b=8
    > a/(a+b)
    [1] 0.6363636
    > sqrt(a*b/(((a+b)**2)*(a+b +1)))
    [1] 0.100305
    > pbeta(.5,shape1=14, shape2=8)
    [1] 0.09462357

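Bayesian Inference: Bernoulli Trials
Case 2: Beta prior for $\theta \in \Theta = \{\theta : 0 \le \theta \le 1\} = [0, 1]$
Prior density for $\theta$:
$$\pi(\theta) = \frac{\theta^{a - 1} (1 - \theta)^{b - 1}}{\mathrm{Beta}(a, b)}, \quad 0 \le \theta \le 1$$
Joint density/pmf for $(S, \theta)$:
$$f_{S,\theta}(s, \theta) = f_{S \mid \theta}(s \mid \theta)\,\pi(\theta) = \binom{n}{s}\,\theta^s (1 - \theta)^{n - s} \times \frac{\theta^{a - 1} (1 - \theta)^{b - 1}}{\mathrm{Beta}(a, b)} \propto \theta^{s + a - 1} (1 - \theta)^{(n - s) + b - 1}$$
Posterior density of $\theta$ given $S$:
$$\pi(\theta \mid s) = f_{S,\theta}(s, \theta) / f_S(s) = \frac{\theta^{s + a - 1} (1 - \theta)^{(n - s) + b - 1}}{\int_0^1 (\theta')^{s + a - 1} (1 - \theta')^{(n - s) + b - 1}\,d\theta'} = \frac{\theta^{s + a - 1} (1 - \theta)^{(n - s) + b - 1}}{\mathrm{Beta}(s + a, (n - s) + b)}$$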

Bayesian Inference: Bernoulli Trials
Case 2: Beta prior (continued)
Note: the posterior density of $\theta$ given $S$,
$$\pi(\theta \mid s) = \frac{\theta^{s + a - 1} (1 - \theta)^{(n - s) + b - 1}}{\mathrm{Beta}(s + a, (n - s) + b)},$$
is a $\mathrm{Beta}(a^*, b^*)$ distribution with $a^* = s + a$ and $b^* = (n - s) + b$.
A prior distribution $\mathrm{Beta}(a, b)$ corresponds to a prior belief consistent with hypothetical prior data consisting of $a - 1$ successes and $b - 1$ failures together with a uniform "pre-hypothetical" prior (equivalently, $a$ successes and $b$ failures starting from the improper $\mathrm{Beta}(0, 0)$ prior).
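The Beta-Binomial update amounts to adding the observed successes and failures to the prior parameters. The helper below, `beta_update`, is a hypothetical name introduced for illustration; the first call uses the n = 20, S = 13 data with a uniform Beta(1, 1) prior, and the split of those data into two batches is an assumption made to show sequential updating.

```r
# Conjugate update: Beta(a, b) prior + s successes in n trials
# gives a Beta(a + s, b + (n - s)) posterior.
beta_update <- function(a, b, s, n) {
  stopifnot(a > 0, b > 0, s >= 0, s <= n)
  c(a = a + s, b = b + (n - s))
}

# Uniform prior Beta(1, 1) with 13 successes in 20 trials.
post <- beta_update(a = 1, b = 1, s = 13, n = 20)

# Same data processed in two batches (assumed split: 5 of 8, then 8 of 12):
# the first posterior serves as the prior for the second batch.
step1 <- beta_update(1, 1, s = 5, n = 8)
step2 <- beta_update(step1[["a"]], step1[["b"]], s = 8, n = 12)
```

Here `post` and `step2` coincide, which reflects the reading of the Beta prior as a summary of earlier (hypothetical) data.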

Bayesian Inference: Normal Sample
Normal sample: $X_1, X_2, \ldots, X_n$ i.i.d. $N(\mu, \sigma^2)$.
Sample space: $\mathcal{X} = (-\infty, +\infty)$ (for each $X_i$)
Probability density function:
$$f(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-\frac{1}{2}\frac{(x - \mu)^2}{\sigma^2}}$$
Consider the re-parametrization $\xi = 1/\sigma^2$ (the precision) and $\theta = \mu$:
$$f(x \mid \theta, \xi) = \left(\frac{\xi}{2\pi}\right)^{1/2} e^{-\frac{1}{2}\xi (x - \theta)^2}$$
Three cases:
- Unknown $\theta$ ($\xi = \xi_0$, known)
- Unknown $\xi$ ($\theta = \theta_0$, known)
- Both $\theta$ and $\xi$ unknown

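Bayesian Inference: Normal Sample
Case 1: Unknown mean $\theta$ and known precision $\xi_0$
Likelihood of sample $x = (x_1, \ldots, x_n)$:
$$\mathrm{lik}(\theta) = f(x_1, \ldots, x_n \mid \theta, \xi_0) = \prod_{i=1}^n f(x_i \mid \theta, \xi_0) = \prod_{i=1}^n \left(\frac{\xi_0}{2\pi}\right)^{1/2} e^{-\frac{1}{2}\xi_0 (x_i - \theta)^2} = \left(\frac{\xi_0}{2\pi}\right)^{n/2} e^{-\frac{1}{2}\xi_0 \sum_{i=1}^n (x_i - \theta)^2}$$
Prior distribution: $\theta \sim N(\theta_0, \xi_{prior}^{-1})$
$$\pi(\theta) = \left(\frac{\xi_{prior}}{2\pi}\right)^{1/2} e^{-\frac{1}{2}\xi_{prior} (\theta - \theta_0)^2}$$
Posterior distribution:
$$\pi(\theta \mid x) \propto \mathrm{lik}(\theta) \times \pi(\theta) = \left(\frac{\xi_0}{2\pi}\right)^{n/2} e^{-\frac{1}{2}\xi_0 \sum_{i=1}^n (x_i - \theta)^2} \times \left(\frac{\xi_{prior}}{2\pi}\right)^{1/2} e^{-\frac{1}{2}\xi_{prior} (\theta - \theta_0)^2}$$
$$\propto e^{-\frac{1}{2}\left[\xi_0 \sum_{i=1}^n (x_i - \theta)^2 + \xi_{prior} (\theta - \theta_0)^2\right]} \propto e^{-\frac{1}{2}\left[n \xi_0 (\theta - \bar{x})^2 + \xi_{prior} (\theta - \theta_0)^2\right]}$$
(all constant factor terms dropped)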
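The last proportionality replaces the sum of squares about $\theta$ by a single squared term in the sample mean $\bar{x}$. The step is the standard sum-of-squares decomposition, written out here as a worked derivation that is not on the original slide:

```latex
\begin{aligned}
\sum_{i=1}^{n} (x_i - \theta)^2
  &= \sum_{i=1}^{n} \bigl[(x_i - \bar{x}) + (\bar{x} - \theta)\bigr]^2 \\
  &= \sum_{i=1}^{n} (x_i - \bar{x})^2
     + 2(\bar{x} - \theta)\sum_{i=1}^{n} (x_i - \bar{x})
     + n(\bar{x} - \theta)^2 \\
  &= \sum_{i=1}^{n} (x_i - \bar{x})^2 + n(\theta - \bar{x})^2 ,
  \qquad \text{since } \sum_{i=1}^{n} (x_i - \bar{x}) = 0 .
\end{aligned}
```

The first term does not depend on $\theta$, so it is absorbed into the constant of proportionality, leaving $n\,\xi_0 (\theta - \bar{x})^2$ in the exponent.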

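Bayesian Inference: Normal Sample
Case 1: Unknown mean $\theta$ and known precision $\xi_0$
Claim: the posterior distribution is Normal(!)
Proof:
$$\pi(\theta \mid x) \propto \mathrm{lik}(\theta) \times \pi(\theta) \propto e^{-\frac{1}{2}\left[n \xi_0 (\theta - \bar{x})^2 + \xi_{prior} (\theta - \theta_0)^2\right]} \propto e^{-\frac{1}{2} Q(\theta)}$$
where $Q(\theta) = \xi_{post} (\theta - \theta_{post})^2$ with
$$\xi_{post} = \xi_{prior} + n \xi_0$$
$$\theta_{post} = \frac{(\xi_{prior})\,\theta_0 + (n \xi_0)\,\bar{x}}{(\xi_{prior}) + (n \xi_0)} = \alpha\,\theta_0 + (1 - \alpha)\,\bar{x}, \quad \text{where } \alpha = \xi_{prior} / \xi_{post}$$
By examination: $\theta \mid x \sim N(\theta_{post}, \xi_{post}^{-1})$
Note: as $\xi_{prior} \to 0$,
$$\theta_{post} \to \bar{x} = \hat\theta_{MLE}, \qquad \xi_{post} \to n \xi_0 \quad (\sigma^2_{post} \to \sigma_0^2 / n)$$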
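The posterior formulas for the known-precision case translate directly into a small function. This is a minimal sketch: the function name `normal_posterior`, the simulated data, and the prior settings are all assumptions chosen for illustration.

```r
# Posterior for the mean of a Normal sample with known precision xi0,
# under a N(theta0, 1 / xi_prior) prior.
normal_posterior <- function(x, xi0, theta0, xi_prior) {
  n          <- length(x)
  xi_post    <- xi_prior + n * xi0
  alpha      <- xi_prior / xi_post                 # weight on the prior mean
  theta_post <- alpha * theta0 + (1 - alpha) * mean(x)
  list(mean = theta_post, precision = xi_post,
       sd = 1 / sqrt(xi_post), weight_on_prior = alpha)
}

# Simulated data (assumed): 25 draws with true mean 2 and sd 1.5.
set.seed(42)
sigma0 <- 1.5
x <- rnorm(25, mean = 2, sd = sigma0)

fit   <- normal_posterior(x, xi0 = 1 / sigma0^2, theta0 = 0, xi_prior = 1)
vague <- normal_posterior(x, xi0 = 1 / sigma0^2, theta0 = 0, xi_prior = 1e-10)
```

With a very small prior precision, `vague$mean` is essentially the sample mean and `vague$sd` is essentially `sigma0 / sqrt(25)`, matching the limiting case in the note.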

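Bayesian Inference: Normal Sample
Case 2: Unknown precision $\xi$ and known mean $\theta_0$
Likelihood of sample $x = (x_1, \ldots, x_n)$:
$$\mathrm{lik}(\xi) = f(x_1, \ldots, x_n \mid \theta_0, \xi) = \prod_{i=1}^n f(x_i \mid \theta_0, \xi) = \prod_{i=1}^n \left(\frac{\xi}{2\pi}\right)^{1/2} e^{-\frac{1}{2}\xi (x_i - \theta_0)^2} = \left(\frac{\xi}{2\pi}\right)^{n/2} e^{-\frac{1}{2}\xi \sum_{i=1}^n (x_i - \theta_0)^2}$$
Prior distribution: $\xi \sim \mathrm{Gamma}(\alpha, \lambda)$
$$\pi(\xi) = \frac{\lambda^\alpha \xi^{\alpha - 1}}{\Gamma(\alpha)}\,e^{-\lambda \xi}, \quad \xi > 0 \quad \text{("conjugate" prior)}$$
Posterior distribution:
$$\pi(\xi \mid x) \propto \mathrm{lik}(\xi) \times \pi(\xi) = \left[\left(\frac{\xi}{2\pi}\right)^{n/2} e^{-\frac{1}{2}\xi \sum_{i=1}^n (x_i - \theta_0)^2}\right] \times \left[\frac{\lambda^\alpha \xi^{\alpha - 1}}{\Gamma(\alpha)}\,e^{-\lambda \xi}\right]$$
$$\propto \xi^{\frac{n}{2} + \alpha - 1}\,e^{-\left(\lambda + \frac{1}{2}\sum_{i=1}^n (x_i - \theta_0)^2\right)\xi} = \xi^{\alpha^* - 1} e^{-\lambda^* \xi}$$
This is the $\mathrm{Gamma}(\alpha^*, \lambda^*)$ distribution density with
$$\alpha^* = \alpha + \frac{n}{2} \quad \text{and} \quad \lambda^* = \lambda + \frac{1}{2}\sum_{i=1}^n (x_i - \theta_0)^2.$$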

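Bayesian Inference: Normal Sample
Case 2: Unknown precision $\xi$ and known mean $\theta_0$ (continued)
Posterior distribution:
$$\pi(\xi \mid x) \propto \mathrm{lik}(\xi) \times \pi(\xi) \propto \xi^{\frac{n}{2} + \alpha - 1}\,e^{-\left(\lambda + \frac{1}{2}\sum_{i=1}^n (x_i - \theta_0)^2\right)\xi} = \xi^{\alpha^* - 1} e^{-\lambda^* \xi}$$
This is the $\mathrm{Gamma}(\alpha^*, \lambda^*)$ distribution density with $\alpha^* = \alpha + \frac{n}{2}$ and $\lambda^* = \lambda + \frac{1}{2}\sum_{i=1}^n (x_i - \theta_0)^2$.
Posterior mean: $E[\xi \mid x] = \dfrac{\alpha^*}{\lambda^*}$
Posterior mode: $\mathrm{mode}(\pi(\xi \mid x)) = \dfrac{\alpha^* - 1}{\lambda^*}$
For small $\alpha$ and $\lambda$:
$$E[\xi \mid x] \to \frac{n}{\sum_{i=1}^n (x_i - \theta_0)^2} = 1 / \hat\sigma^2_{MLE}$$
$$\mathrm{mode}(\pi(\xi \mid x)) \to \frac{n - 2}{\sum_{i=1}^n (x_i - \theta_0)^2} = \left(1 - \frac{2}{n}\right) / \hat\sigma^2_{MLE}$$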
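The Gamma update for the precision and its small-α, small-λ limits can be sketched as follows. The function name `precision_posterior`, the simulated sample, and the near-zero prior parameters are assumptions made for the example.

```r
# Gamma(alpha, lambda) prior on the precision xi, Normal data with known mean theta0.
precision_posterior <- function(x, theta0, alpha, lambda) {
  n           <- length(x)
  ss          <- sum((x - theta0)^2)
  alpha_star  <- alpha + n / 2
  lambda_star <- lambda + ss / 2
  list(shape = alpha_star, rate = lambda_star,
       mean = alpha_star / lambda_star,
       mode = (alpha_star - 1) / lambda_star)
}

# Simulated data (assumed): 30 draws with known mean 0 and sd 2.
set.seed(7)
x <- rnorm(30, mean = 0, sd = 2)
n <- length(x)

# Nearly non-informative prior: alpha and lambda close to 0.
fit <- precision_posterior(x, theta0 = 0, alpha = 1e-8, lambda = 1e-8)

# MLE of the variance when the mean is known.
sigma2_mle <- mean((x - 0)^2)
```

In this limit `fit$mean` is close to `1 / sigma2_mle` and `fit$mode` is close to `(1 - 2 / n) / sigma2_mle`, as stated on the slide.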

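Bayesian Inference: Normal Sample
Case 3: Unknown mean $\theta$ and unknown precision $\xi$
Likelihood of sample $x = (x_1, \ldots, x_n)$:
$$\mathrm{lik}(\theta, \xi) = f(x_1, \ldots, x_n \mid \theta, \xi) = \prod_{i=1}^n f(x_i \mid \theta, \xi) = \prod_{i=1}^n \left(\frac{\xi}{2\pi}\right)^{1/2} e^{-\frac{1}{2}\xi (x_i - \theta)^2} = \left(\frac{\xi}{2\pi}\right)^{n/2} e^{-\frac{1}{2}\xi \sum_{i=1}^n (x_i - \theta)^2}$$
Prior distribution: $\theta$ and $\xi$ independent, a priori, with
$$\theta \sim N(\theta_0, \xi_{prior}^{-1}), \qquad \xi \sim \mathrm{Gamma}(\alpha, \lambda)$$
$$\pi(\theta, \xi) = \pi(\theta)\,\pi(\xi) = \left[\left(\frac{\xi_{prior}}{2\pi}\right)^{1/2} e^{-\frac{1}{2}\xi_{prior} (\theta - \theta_0)^2}\right] \times \left[\frac{\lambda^\alpha \xi^{\alpha - 1}}{\Gamma(\alpha)}\,e^{-\lambda \xi}\right]$$
Posterior distribution:
$$\pi(\theta, \xi \mid x) \propto \mathrm{lik}(\theta, \xi) \times \pi(\theta, \xi) \propto \left[\xi^{n/2}\,e^{-\frac{1}{2}\xi \sum_{i=1}^n (x_i - \theta)^2}\right] \times \left[e^{-\frac{1}{2}\xi_{prior} (\theta - \theta_0)^2}\right] \times \left[\xi^{\alpha - 1} e^{-\lambda \xi}\right]$$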

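Bayesian Inference: Normal Sample
Case 3 (continued)
Posterior distribution:
$$\pi(\theta, \xi \mid x) \propto \mathrm{lik}(\theta, \xi) \times \pi(\theta, \xi) \propto \left[\xi^{n/2}\,e^{-\frac{1}{2}\xi \sum_{i=1}^n (x_i - \theta)^2}\right] \times \left[e^{-\frac{1}{2}\xi_{prior} (\theta - \theta_0)^2}\right] \times \left[\xi^{\alpha - 1} e^{-\lambda \xi}\right]$$
Marginal posterior distribution of $\theta$:
$$\pi(\theta \mid x) = \int_{\{\xi\}} \pi(\theta, \xi \mid x)\,d\xi \propto e^{-\frac{1}{2}\xi_{prior} (\theta - \theta_0)^2} \times \int_{\{\xi\}} \xi^{\alpha^* - 1} e^{-\lambda^* \xi}\,d\xi = e^{-\frac{1}{2}\xi_{prior} (\theta - \theta_0)^2} \times \frac{\Gamma(\alpha^*)}{(\lambda^*)^{\alpha^*}}$$
where $\alpha^* = \alpha + \frac{n}{2}$ and $\lambda^* = \lambda + \frac{1}{2}\sum_{i=1}^n (x_i - \theta)^2$.
Limiting case as $\xi_{prior}$, $\alpha$ and $\lambda \to 0$:
$$\pi(\theta \mid x) \propto (\lambda^*)^{-\alpha^*} = \left[\tfrac{1}{2}\sum_{i=1}^n (x_i - \theta)^2\right]^{-n/2} \propto \left[(n - 1)s^2 + n(\theta - \bar{x})^2\right]^{-n/2} \propto \left[1 + \frac{1}{n - 1}\,\frac{n(\theta - \bar{x})^2}{s^2}\right]^{-n/2}$$
Note: a posteriori, $\sqrt{n}\,(\theta - \bar{x})/s \sim t_{n-1}$ (for small $\xi_{prior}$, $\alpha$, $\lambda$).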
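The limiting-case result can be checked numerically: normalize the kernel $[\sum_i (x_i - \theta)^2]^{-n/2}$ on a grid and compare it with the density implied by $\sqrt{n}(\theta - \bar{x})/s \sim t_{n-1}$. This is a sketch with simulated data; the sample, grid range, and grid size are assumptions.

```r
# Simulated sample (assumed): 12 draws with mean 5 and sd 2.
set.seed(3)
x <- rnorm(12, mean = 5, sd = 2)
n <- length(x); xbar <- mean(x); s <- sd(x)   # s^2 uses the n - 1 denominator

# Grid of theta values wide enough to capture essentially all posterior mass.
theta <- seq(xbar - 10 * s, xbar + 10 * s, length.out = 4001)
step  <- theta[2] - theta[1]

# Unnormalized limiting marginal posterior: (lambda*)^(-alpha*) with
# alpha* = n / 2 and lambda* = sum((x - theta)^2) / 2.
ss         <- sapply(theta, function(th) sum((x - th)^2))
log_unnorm <- -(n / 2) * log(ss / 2)
unnorm     <- exp(log_unnorm - max(log_unnorm))   # rescaled for numerical stability
post_grid  <- unnorm / (sum(unnorm) * step)

# Density of theta implied by sqrt(n) * (theta - xbar) / s ~ t with n - 1 df.
post_t <- dt(sqrt(n) * (theta - xbar) / s, df = n - 1) * sqrt(n) / s

max_rel_gap <- max(abs(post_grid - post_t)) / max(post_t)
```

A small value of `max_rel_gap` indicates that the grid-normalized kernel and the scaled t density agree.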

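Bayesian Inference: Poisson Distribution
Poisson sample: $X_1, X_2, \ldots, X_n$ i.i.d. Poisson($\lambda$)
Sample space: $\mathcal{X} = \{0, 1, 2, \ldots\}$ (for each $X_i$)
Probability mass function:
$$f(x \mid \lambda) = \frac{\lambda^x}{x!}\,e^{-\lambda}$$
Likelihood of sample $x = (x_1, \ldots, x_n)$:
$$\mathrm{lik}(\lambda) = f(x_1, \ldots, x_n \mid \lambda) = \prod_{i=1}^n f(x_i \mid \lambda) = \prod_{i=1}^n \frac{\lambda^{x_i}}{x_i!}\,e^{-\lambda} \propto \lambda^{\sum_{i=1}^n x_i}\,e^{-n\lambda}$$
Prior distribution: $\lambda \sim \mathrm{Gamma}(\alpha, \nu)$
$$\pi(\lambda) = \frac{\nu^\alpha \lambda^{\alpha - 1}}{\Gamma(\alpha)}\,e^{-\nu \lambda}, \quad \lambda > 0$$
Posterior distribution:
$$\pi(\lambda \mid x) \propto \mathrm{lik}(\lambda) \times \pi(\lambda) = \left[\lambda^{\sum_{i=1}^n x_i}\,e^{-n\lambda}\right] \times \left[\frac{\nu^\alpha \lambda^{\alpha - 1}}{\Gamma(\alpha)}\,e^{-\nu \lambda}\right] \propto \lambda^{\alpha^* - 1} e^{-\nu^* \lambda}$$
This is a $\mathrm{Gamma}(\alpha^*, \nu^*)$ distribution with $\alpha^* = \alpha + \sum_{i=1}^n x_i$ and $\nu^* = \nu + n$.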

Bayesian Inference: Poisson Distribution
Specifying the prior distribution: $\lambda \sim \mathrm{Gamma}(\alpha, \nu)$.
- Choose $\alpha$ and $\nu$ to match the prior mean and prior variance:
$$E[\lambda \mid \alpha, \nu] = \alpha / \nu \;(= \mu_1), \qquad \mathrm{Var}[\lambda \mid \alpha, \nu] = \alpha / \nu^2 \;(= \sigma^2 = \mu_2 - \mu_1^2)$$
  Set $\nu = \mu_1 / \sigma^2$ and $\alpha = \mu_1 \times \nu$.
- Alternatively, consider a uniform distribution on the interval $[0, \lambda_{MAX}] = \{\lambda : 0 < \lambda < \lambda_{MAX}\}$ (choose $\lambda_{MAX}$ to be very large).
Example 8.4.A: Counts of asbestos fibers on filters (Steel et al. 1980).
23 grid squares with mean count $\bar{x} = \frac{1}{23}\sum_{i=1}^{23} x_i = 24.9$.
$$\hat\lambda_{MOM} = \hat\lambda_{MLE} = 24.9$$
$$\mathrm{StError}(\hat\lambda) = \sqrt{\widehat{\mathrm{Var}}(\hat\lambda)} = \sqrt{\hat\lambda / n} = \sqrt{24.9 / 23} = 1.04$$
Compare with Bayesian inference ($\mu_1 = 15$ and $\sigma^2 = 5^2$).
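The comparison suggested at the end of the slide can be sketched in R. The helper name `gamma_from_moments` is hypothetical, and because the slides report only the rounded mean count 24.9, the total count is reconstructed as 23 × 24.9; that reconstruction is an assumption and only approximates the actual data. The 90% level of the interval is likewise an illustrative choice.

```r
# Match a Gamma(alpha, nu) prior to a given prior mean and variance.
gamma_from_moments <- function(mu1, sigma2) {
  nu <- mu1 / sigma2
  c(alpha = mu1 * nu, nu = nu)
}

prior <- gamma_from_moments(mu1 = 15, sigma2 = 5^2)   # alpha = 9, nu = 0.6

# Data summary from Example 8.4.A.
n     <- 23
xbar  <- 24.9
sum_x <- n * xbar   # ASSUMPTION: total count rebuilt from the rounded mean

# Conjugate update: Gamma(alpha + sum(x), nu + n).
alpha_star <- prior[["alpha"]] + sum_x
nu_star    <- prior[["nu"]] + n

post_mean  <- alpha_star / nu_star
post_sd    <- sqrt(alpha_star) / nu_star
interval90 <- qgamma(c(0.05, 0.95), shape = alpha_star, rate = nu_star)

# Frequentist standard error of the MLE, as on the slide.
se_mle <- sqrt(xbar / n)
```

Under these assumptions the posterior mean lies between the prior mean (15) and the MLE (24.9), much closer to the MLE, and the posterior standard deviation is slightly smaller than the standard error of the MLE.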

Bayesian Inference: Hardy-Weinberg Model
Example 8.5.1 A / 8.6 C: Multinomial sample
Data: counts of multinomial cells, $(X_1, X_2, X_3) = (342, 500, 187)$, for $n = 1029$ outcomes corresponding to genotypes AA, Aa and aa, which occur with probabilities $(1 - \theta)^2$, $2\theta(1 - \theta)$ and $\theta^2$.
Prior for $\theta$: Uniform distribution on $(0, 1) = \{\theta : 0 < \theta < 1\}$.
The Bayes predictive interval for $\theta$ agrees with the approximate confidence interval based on $\hat\theta = 0.4247$.
See the R script implementing the Bayesian computations.
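The following is a sketch of one way such a computation might look; it is not the course's R script. It uses the fact that, with a uniform prior, the multinomial likelihood is proportional to $\theta^{X_2 + 2X_3}(1 - \theta)^{2X_1 + X_2}$, so the posterior is a Beta distribution. The 95% level and the Wald-type confidence interval with variance $\hat\theta(1 - \hat\theta)/(2n)$ are assumptions made for the comparison.

```r
counts <- c(AA = 342, Aa = 500, aa = 187)
n <- sum(counts)

# Likelihood kernel: (1 - theta)^(2 * AA) * [theta * (1 - theta)]^Aa * theta^(2 * aa)
#                  = theta^(Aa + 2 * aa) * (1 - theta)^(2 * AA + Aa).
# With a uniform prior the posterior is Beta(a_post, b_post).
a_post <- counts[["Aa"]] + 2 * counts[["aa"]] + 1
b_post <- 2 * counts[["AA"]] + counts[["Aa"]] + 1

# Maximum likelihood estimate of theta.
theta_mle <- (counts[["Aa"]] + 2 * counts[["aa"]]) / (2 * n)

# Bayesian 95% interval from posterior quantiles (level is an assumption).
bayes_interval <- qbeta(c(0.025, 0.975), a_post, b_post)

# Approximate 95% confidence interval based on the MLE.
se_mle        <- sqrt(theta_mle * (1 - theta_mle) / (2 * n))
wald_interval <- theta_mle + c(-1, 1) * qnorm(0.975) * se_mle
```

With more than two thousand observed alleles, the two intervals are nearly identical, which is the agreement noted on the slide.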

Bayesian Inference: Prior Distributions
Important Concepts
Conjugate prior distribution: a prior distribution from a distribution family for which the posterior distribution is from the same distribution family.
- Beta distributions for Bernoulli/Binomial samples
- Gamma distributions for Poisson samples
- Normal distributions for Normal samples (unknown mean, known variance)
Non-informative prior distributions: prior distributions that let the data dominate the structure of the posterior distribution.
- Uniform/flat prior
- Complicated by the choice of scale/units for the parameter
- A non-informative prior density may not integrate to 1, i.e., the prior distribution is improper
- The posterior distribution for improper priors corresponds to the limiting case of a sequence of proper priors.

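Bayesian Inference: Normal Approximation to Posterior
Posterior Distribution With Large Samples
Conditional density/pmf of data: $X \sim f(x \mid \theta)$
Prior density of parameter: $\theta \sim \pi(\theta)$
Posterior density:
$$\pi(\theta \mid x) \propto \pi(\theta)\,f(x \mid \theta) = \exp[\log \pi(\theta)]\,\exp[\log f(x \mid \theta)] = \exp[\log \pi(\theta)]\,\exp[\ell(\theta)]$$
For a large sample, $\ell(\theta)$ can be expressed as a Taylor series about the MLE $\hat\theta$:
$$\ell(\theta) \approx \ell(\hat\theta) + (\theta - \hat\theta)\,\ell'(\hat\theta) + \tfrac{1}{2}(\theta - \hat\theta)^2\,\ell''(\hat\theta)$$
Since $\ell'(\hat\theta) = 0$ at the MLE, and dropping the constant $\ell(\hat\theta)$:
$$\ell(\theta) - \ell(\hat\theta) \approx (\theta - \hat\theta) \cdot 0 + \tfrac{1}{2}(\theta - \hat\theta)^2\,\ell''(\hat\theta) = \tfrac{1}{2}(\theta - \hat\theta)^2\,\ell''(\hat\theta)$$
(i.e., a Normal log-likelihood with mean $\hat\theta$ and variance $[-\ell''(\hat\theta)]^{-1}$)
For a large sample, $\pi(\theta)$ is relatively flat in the range near $\theta \approx \hat\theta$, and the likelihood concentrates in the same range.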
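To see the approximation at work, the sketch below compares exact posterior quantiles with Normal-approximation quantiles for Bernoulli data under a flat prior, at two sample sizes with the same success proportion. The function name `compare`, the data values, and the choice of quantiles are assumptions; for the Binomial log-likelihood, $\ell''(\theta) = -s/\theta^2 - (n - s)/(1 - \theta)^2$.

```r
# Exact Beta posterior (flat prior) versus the Normal approximation
# N(theta_hat, [-l''(theta_hat)]^(-1)) for s successes in n Bernoulli trials.
compare <- function(s, n, probs = c(0.05, 0.5, 0.95)) {
  theta_hat <- s / n
  d2        <- -s / theta_hat^2 - (n - s) / (1 - theta_hat)^2   # l''(theta_hat)
  approx_sd <- sqrt(-1 / d2)
  rbind(exact  = qbeta(probs, s + 1, n - s + 1),
        normal = qnorm(probs, mean = theta_hat, sd = approx_sd))
}

small <- compare(s = 13, n = 20)       # small sample
large <- compare(s = 1300, n = 2000)   # same proportion, much larger sample

# Largest discrepancy between exact and approximate quantiles.
gap <- function(m) max(abs(m["exact", ] - m["normal", ]))
```

The value of `gap(large)` is far smaller than `gap(small)`, in line with the statement that the approximation applies to large samples.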

MIT OpenCourseWare
http://ocw.mit.edu
18.443 Statistics for Applications, Spring 2015
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.