Introduction to Bayesian Statistics


Introduction to Bayesian Statistics
Lecture 9: Hierarchical Models

Rung-Ching Tsai
Department of Mathematics, National Taiwan Normal University
May 6, 2015

Example

Data: weekly weights of 30 young rats (Gelfand, Hills, Racine-Poon, & Smith, 1990).

Day          8    15    22    29    36
Rat 1      151   199   246   283   320
Rat 2      145   199   249   293   354
...
Rat 30     153   200   244   286   324

Model: $Y_{ij} = \alpha + \beta x_j + \epsilon_{ij}$, where $Y_{ij}$ is the weight of the $i$-th rat on day $x_j$ and $\epsilon_{ij} \sim \mathrm{Normal}(0, \sigma^2)$.

What is the assumption on the growth of the 30 rats in this model?
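To make the question concrete, the sketch below fits the single-line model by ordinary least squares to the three rats listed on the slide and compares it with a separate line per rat. Least squares is used here only as a convenient stand-in for estimating $(\alpha, \beta)$, and the helper name `least_squares` is illustrative.

```python
# Rows shown on the slide (only 3 of the 30 rats are listed there).
days = [8, 15, 22, 29, 36]
weights = {
    "Rat 1": [151, 199, 246, 283, 320],
    "Rat 2": [145, 199, 249, 293, 354],
    "Rat 30": [153, 200, 244, 286, 324],
}

def least_squares(x, y):
    """Return (intercept, slope) of the ordinary least-squares line."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Complete pooling: one (alpha, beta) shared by all rats, as in the model above.
all_x = days * len(weights)
all_y = [w for rat in weights.values() for w in rat]
pooled_alpha, pooled_beta = least_squares(all_x, all_y)

# No pooling: a separate (alpha_i, beta_i) for each rat.
separate = {name: least_squares(days, w) for name, w in weights.items()}

print(f"pooled slope: {pooled_beta:.2f}")
for name, (a, b) in separate.items():
    print(f"{name}: slope {b:.2f}")
```

The pooled slope sits between the individual slopes, which is what a common growth line for all rats implies.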

Example

Data: number of failures and length of operation time of 10 power plant pumps (George, Makov, & Smith, 1993).

Pump         1     2     3     4     5     6     7     8     9    10
Time      94.5  15.7  62.9   126  5.24  31.4  1.05  1.05   2.1  10.5
Failures     5     1     5    14     3    19     1     1     4    22

Model: $X_i \sim \mathrm{Poisson}(\lambda t_i)$, where $X_i$ is the number of failures of pump $i$, $\lambda$ is the failure rate, and $t_i$ is the length of operation time of pump $i$ (in 1000s of hours).

What is the assumption on the failure rates of the 10 power plant pumps in this model?
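As a rough illustration of what a common rate implies, the sketch below computes the maximum likelihood estimate of a single shared $\lambda$ (total failures divided by total time) and, for contrast, a separate rate for each pump. The variable names are illustrative.

```python
# Pump data from the slide: operation time (1000s of hours) and failure counts.
times = [94.5, 15.7, 62.9, 126, 5.24, 31.4, 1.05, 1.05, 2.1, 10.5]
failures = [5, 1, 5, 14, 3, 19, 1, 1, 4, 22]

# Complete pooling: one rate lambda for all pumps (Poisson maximum likelihood).
pooled_rate = sum(failures) / sum(times)

# No pooling: a separate rate lambda_i = x_i / t_i for each pump.
separate_rates = [x / t for x, t in zip(failures, times)]

print(f"pooled rate: {pooled_rate:.3f} failures per 1000 hours")
for pump, rate in enumerate(separate_rates, start=1):
    print(f"pump {pump}: {rate:.3f}")
```

The separate rates vary widely around the pooled value, which is the tension the hierarchical model is meant to resolve.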

Possible problems with the above approaches

- A single $(\alpha, \beta)$ may be inadequate to fit all the rats. Likewise, a common failure rate for all the power plant pumps may not be suitable.
- Separate, unrelated $(\alpha_i, \beta_i)$ for each rat, or $\lambda_i$ for each pump, are likely to overfit the data.
- Some information about the parameters of one rat or one pump can be obtained from the others' data.

Motivation for hierarchical models

- A natural idea is to assume that the $(\alpha_i, \beta_i)$'s or $\lambda_i$'s are samples from a common population distribution.
- A model in which the distribution of the observed outcomes is conditional on parameters that themselves have a probability specification is known as a hierarchical or multilevel model.
- The new parameters introduced to govern the population distribution of the parameters are called hyperparameters.
- Thus, we need to estimate the parameters governing the population distribution of $(\alpha_i, \beta_i)$ rather than each $(\alpha_i, \beta_i)$ separately.

Bayesian approach to hierarchical models

Model specification
- Specify the sampling distribution of the data: $p(y \mid \theta)$.
- Specify the population distribution of $\theta$: $p(\theta \mid \phi)$, where $\phi$ is the hyperparameter.

Bayesian estimation
- Specify the prior for the hyperparameter: $p(\phi)$. Many levels are possible. The hyperprior distribution at the highest level is often chosen to be non-informative.
- Consider the above model specification: $p(y \mid \theta)$ and $p(\theta \mid \phi)$.
- Find the joint posterior distribution of the parameter $\theta$ and the hyperparameter $\phi$:

$$p(\theta, \phi \mid y) \propto p(\theta, \phi)\, p(y \mid \theta, \phi) = p(\theta, \phi)\, p(y \mid \theta) = p(\phi)\, p(\theta \mid \phi)\, p(y \mid \theta)$$

- Point and credible interval estimates for $\phi$ and $\theta$.
- Predictive distribution for $\tilde{y}$.

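Analytical derivation of conditional/marginal distributions

- Write out the joint posterior distribution: $p(\theta, \phi \mid y) \propto p(\phi)\, p(\theta \mid \phi)\, p(y \mid \theta)$.
- Determine analytically the conditional posterior density of $\theta$ given $\phi$: $p(\theta \mid \phi, y)$.
- Obtain the marginal posterior distribution of $\phi$:

$$p(\phi \mid y) = \int p(\theta, \phi \mid y)\, d\theta \quad \text{or} \quad p(\phi \mid y) = \frac{p(\theta, \phi \mid y)}{p(\theta \mid \phi, y)}$$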

Simulations from the posterior distributions

1. Two steps to simulate a random draw from the joint posterior distribution of $\theta$ and $\phi$, $p(\theta, \phi \mid y)$:
   - Draw $\phi$ from its marginal posterior distribution, $p(\phi \mid y)$.
   - Draw the parameter $\theta$ from its conditional posterior, $p(\theta \mid \phi, y)$.
2. If desired, draw predictive values $\tilde{y}$ from the posterior predictive distribution given the drawn $\theta$.

Example: Rat tumors

Goal: estimating the risk of tumor in a group of rats.

Data (number of rats that developed some kind of tumor / number of rats in the experiment):

1. 70 historical experiments:

0/20  0/20  0/20  0/20  0/20  0/20  0/20  0/19  0/19  0/19
0/19  0/18  0/18  0/17  1/20  1/20  1/20  1/20  1/19  1/19
1/18  1/18  2/25  2/24  2/23  2/20  2/20  2/20  2/20  2/20
2/20  1/10  5/49  2/19  5/46  3/27  2/17  7/49  7/47  3/20
3/20  2/13  9/48  10/50 4/20  4/20  4/20  4/20  4/20  4/20
4/20  10/48 4/19  4/19  4/19  5/22  11/46 12/49 5/20  5/20
6/23  5/19  6/22  6/20  6/20  6/20  16/52 15/47 15/46 9/24

2. Current experiment: 4/14

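Bayesian approach to hierarchical models

Model specification
- Sampling distribution of the data: $y_j \sim \mathrm{Binomial}(n_j, \theta_j)$, $j = 1, 2, \ldots, J$, with $J = 71$.
- Population distribution of $\theta$: $\theta_j \sim \mathrm{Beta}(\alpha, \beta)$, where $\alpha$ and $\beta$ are the hyperparameters.

Bayesian estimation
- Non-informative prior for the hyperparameters: $p(\alpha, \beta)$.
- Consider the above model specification: $p(\theta \mid \alpha, \beta)$.
- Find the joint posterior distribution of the parameter $\theta$ and the hyperparameters $\alpha$ and $\beta$:

$$p(\theta, \alpha, \beta \mid y) \propto p(\alpha, \beta)\, p(\theta \mid \alpha, \beta)\, p(y \mid \theta, \alpha, \beta)$$

$$\propto p(\alpha, \beta) \prod_{j=1}^{J} \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\, \theta_j^{\alpha - 1} (1 - \theta_j)^{\beta - 1} \prod_{j=1}^{J} \theta_j^{y_j} (1 - \theta_j)^{n_j - y_j}$$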

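Analytical derivation of conditional/marginal distributions

- The joint posterior distribution:

$$p(\theta, \alpha, \beta \mid y) \propto p(\alpha, \beta) \prod_{j=1}^{J} \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\, \theta_j^{\alpha - 1} (1 - \theta_j)^{\beta - 1} \prod_{j=1}^{J} \theta_j^{y_j} (1 - \theta_j)^{n_j - y_j}$$

- The conditional posterior density of $\theta$ given $\alpha$ and $\beta$:

$$p(\theta \mid \alpha, \beta, y) = \prod_{j=1}^{J} \frac{\Gamma(\alpha + \beta + n_j)}{\Gamma(\alpha + y_j)\Gamma(\beta + n_j - y_j)}\, \theta_j^{\alpha + y_j - 1} (1 - \theta_j)^{\beta + n_j - y_j - 1}$$

- The marginal posterior distribution of $\alpha$ and $\beta$:

$$p(\alpha, \beta \mid y) = \frac{p(\theta, \alpha, \beta \mid y)}{p(\theta \mid \alpha, \beta, y)} \propto p(\alpha, \beta) \prod_{j=1}^{J} \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} \cdot \frac{\Gamma(\alpha + y_j)\Gamma(\beta + n_j - y_j)}{\Gamma(\alpha + \beta + n_j)}$$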
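The product of gamma-function ratios in the marginal posterior is best evaluated on the log scale. The sketch below computes only that product (the hyperprior $p(\alpha, \beta)$ is left out), writing $n_j$ for the number of rats in experiment $j$. The function name `log_marginal_likelihood` is illustrative, not from the slides.

```python
from math import lgamma, log

def log_marginal_likelihood(alpha, beta, y, n):
    """Log of prod_j [G(a+b) / (G(a) G(b))] * [G(a+y_j) G(b+n_j-y_j) / G(a+b+n_j)].

    This is the data-dependent factor of p(alpha, beta | y); the log
    hyperprior log p(alpha, beta) still has to be added to it.
    """
    total = 0.0
    for yj, nj in zip(y, n):
        total += (lgamma(alpha + beta) - lgamma(alpha) - lgamma(beta)
                  + lgamma(alpha + yj) + lgamma(beta + nj - yj)
                  - lgamma(alpha + beta + nj))
    return total

# With alpha = beta = 1 (uniform population distribution) and a single
# observation out of n = 1, the factor equals 1/2 whatever the outcome.
print(log_marginal_likelihood(1.0, 1.0, [0], [1]), log(0.5))
```

Working with `lgamma` avoids the overflow that the raw gamma function would hit for the larger experiments.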

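Choice of hyperprior distribution

Idea: set up a non-informative hyperprior distribution.

- $p\left(\mathrm{logit}\left(\frac{\alpha}{\alpha + \beta}\right) = \log\left(\frac{\alpha}{\beta}\right),\ \log(\alpha + \beta)\right) \propto 1$: NO GOOD, because it leads to an improper posterior.
- $p\left(\frac{\alpha}{\alpha + \beta},\ \alpha + \beta\right) \propto 1$ or $p(\alpha, \beta) \propto 1$: NO GOOD, because the posterior density is not integrable in the limit.
- $p\left(\frac{\alpha}{\alpha + \beta},\ (\alpha + \beta)^{-1/2}\right) \propto 1 \iff p(\alpha, \beta) \propto (\alpha + \beta)^{-5/2} \iff p\left(\log\left(\frac{\alpha}{\beta}\right),\ \log(\alpha + \beta)\right) \propto \alpha\beta(\alpha + \beta)^{-5/2}$: OK, because it leads to a proper posterior.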

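Computing the marginal posterior of the hyperparameters

- Compute the relative (unnormalized) posterior density on a grid of values that covers the effective range of $(\alpha, \beta)$:
  - $\left(\log\left(\frac{\alpha}{\beta}\right), \log(\alpha + \beta)\right) \in [-2.5, -1] \times [1.5, 3]$
  - $\left(\log\left(\frac{\alpha}{\beta}\right), \log(\alpha + \beta)\right) \in [-2.3, -1.3] \times [1, 5]$
- Draw a contour plot of the marginal density of $\left(\log\left(\frac{\alpha}{\beta}\right), \log(\alpha + \beta)\right)$; the contour lines are at 0.05, 0.15, ..., 0.95 times the density at the mode.
- Normalize by approximating the posterior distribution as a step function over the grid and setting the total probability in the grid to 1.
- Compute the posterior moments based on the grid of $\left(\log\left(\frac{\alpha}{\beta}\right), \log(\alpha + \beta)\right)$. For example, $E(\alpha \mid y)$ is estimated by

$$\hat{\alpha} = \sum_{\log(\alpha/\beta),\ \log(\alpha + \beta)} \alpha\, p\left(\log\left(\frac{\alpha}{\beta}\right), \log(\alpha + \beta) \,\Big|\, y\right)$$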
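The grid computation can be sketched in plain Python as follows. Assumptions made here and not stated on the slide: the grid range for $\log(\alpha/\beta)$ is read as the negative interval $[-2.3, -1.3]$, the grid is 40 by 40, $n_j$ denotes the number of rats in experiment $j$, and the hyperprior is the one marked OK above, $\alpha\beta(\alpha + \beta)^{-5/2}$ on the $(\log(\alpha/\beta), \log(\alpha + \beta))$ scale. Helper names are illustrative.

```python
from math import exp, lgamma, log

# Rat tumor data from the slide: 70 historical experiments, then the current one.
RAW = """
0/20 0/20 0/20 0/20 0/20 0/20 0/20 0/19 0/19 0/19
0/19 0/18 0/18 0/17 1/20 1/20 1/20 1/20 1/19 1/19
1/18 1/18 2/25 2/24 2/23 2/20 2/20 2/20 2/20 2/20
2/20 1/10 5/49 2/19 5/46 3/27 2/17 7/49 7/47 3/20
3/20 2/13 9/48 10/50 4/20 4/20 4/20 4/20 4/20 4/20
4/20 10/48 4/19 4/19 4/19 5/22 11/46 12/49 5/20 5/20
6/23 5/19 6/22 6/20 6/20 6/20 16/52 15/47 15/46 9/24
4/14
"""
pairs = [tuple(map(int, item.split("/"))) for item in RAW.split()]
y = [p[0] for p in pairs]
n = [p[1] for p in pairs]

def to_alpha_beta(u, v):
    """Map (u, v) = (log(alpha/beta), log(alpha+beta)) back to (alpha, beta)."""
    return exp(v) * exp(u) / (1 + exp(u)), exp(v) / (1 + exp(u))

def log_post(u, v):
    """Unnormalized log posterior density of (u, v) given y."""
    alpha, beta = to_alpha_beta(u, v)
    # Hyperprior on this scale: alpha * beta * (alpha + beta)^(-5/2).
    total = log(alpha) + log(beta) - 2.5 * log(alpha + beta)
    for yj, nj in zip(y, n):
        total += (lgamma(alpha + beta) - lgamma(alpha) - lgamma(beta)
                  + lgamma(alpha + yj) + lgamma(beta + nj - yj)
                  - lgamma(alpha + beta + nj))
    return total

def linspace(lo, hi, m):
    return [lo + (hi - lo) * k / (m - 1) for k in range(m)]

grid = [(u, v) for u in linspace(-2.3, -1.3, 40) for v in linspace(1.0, 5.0, 40)]
log_dens = [log_post(u, v) for u, v in grid]

# Subtract the maximum before exponentiating to avoid underflow, then
# normalize so that the total probability on the grid is 1.
peak = max(log_dens)
dens = [exp(ld - peak) for ld in log_dens]
probs = [d / sum(dens) for d in dens]

# Posterior moments as probability-weighted sums over the grid.
post_mean_alpha = sum(p * to_alpha_beta(u, v)[0] for p, (u, v) in zip(probs, grid))
post_mean_beta = sum(p * to_alpha_beta(u, v)[1] for p, (u, v) in zip(probs, grid))
print(post_mean_alpha, post_mean_beta)
```

A finer grid or a wider range changes the estimates slightly, so the grid should be checked against a contour plot before the moments are trusted.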

Sampling from the joint posterior

1. Simulate 1000 draws of $\left(\log\left(\frac{\alpha}{\beta}\right), \log(\alpha + \beta)\right)$ from their posterior distribution using the discrete-grid sampling procedure.
2. For $l = 1, \ldots, 1000$:
   - Transform the $l$-th draw of $\left(\log\left(\frac{\alpha}{\beta}\right), \log(\alpha + \beta)\right)$ to the scale of $(\alpha, \beta)$ to yield a draw of the hyperparameters from their marginal posterior distribution.
   - For each $j = 1, \ldots, J$, sample $\theta_j$ from its conditional posterior distribution $\theta_j \mid \alpha, \beta, y \sim \mathrm{Beta}(\alpha + y_j, \beta + n_j - y_j)$.
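The two sampling steps can be sketched as below. The function takes a list of grid points with their normalized posterior probabilities as input; the two-point grid and its weights in the usage example are made up purely to show the call and are not posterior values. Here $n_j$ is the number of rats in experiment $j$, and the names `to_alpha_beta` and `sample_joint` are illustrative.

```python
import random
from math import exp, log

def to_alpha_beta(u, v):
    """Map (u, v) = (log(alpha/beta), log(alpha+beta)) back to (alpha, beta)."""
    alpha = exp(v) * exp(u) / (1 + exp(u))
    beta = exp(v) / (1 + exp(u))
    return alpha, beta

def sample_joint(grid, probs, y, n, n_draws=1000, rng=None):
    """Draw (alpha, beta, theta) from the joint posterior via a discrete grid."""
    rng = rng or random.Random()
    draws = []
    # Step 1: pick grid points with probability equal to their posterior mass.
    for u, v in rng.choices(grid, weights=probs, k=n_draws):
        alpha, beta = to_alpha_beta(u, v)
        # Step 2: theta_j | alpha, beta, y ~ Beta(alpha + y_j, beta + n_j - y_j).
        theta = [rng.betavariate(alpha + yj, beta + nj - yj)
                 for yj, nj in zip(y, n)]
        draws.append((alpha, beta, theta))
    return draws

# Hypothetical two-point grid with made-up weights, only to show the call.
toy_grid = [(-1.8, 2.5), (-1.7, 3.0)]
toy_probs = [0.4, 0.6]
draws = sample_joint(toy_grid, toy_probs, y=[4, 0], n=[14, 20],
                     n_draws=1000, rng=random.Random(0))
mean_theta = [sum(d[2][j] for d in draws) / len(draws) for j in range(2)]
print(mean_theta)
```

Passing a seeded `random.Random` keeps the draws reproducible. This sketch returns grid points exactly and does not jitter the draws within grid cells.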

Displaying the results

- Plot the posterior means and 95% intervals for the $\theta_j$'s (Figure 5.4 on page 131).
- The rates $\theta_j$ are shrunk from their sample point estimates, $y_j / n_j$, towards the approximate mean of the population distribution.
- Experiments with fewer observations are shrunk more and have higher posterior variances.
- Note that posterior variability is higher in the full Bayesian analysis, reflecting posterior uncertainty in the hyperparameters.
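The shrinkage can be seen directly from the conditional posterior mean of $\theta_j$, which is a weighted average of the sample rate $y_j / n_j$ and the population mean $\alpha / (\alpha + \beta)$, where $n_j$ is the number of rats in experiment $j$. The sketch below holds the hyperparameters fixed at hypothetical values chosen only for illustration, so it understates the variability of the full Bayesian analysis.

```python
def posterior_mean_theta(alpha, beta, y_j, n_j):
    """Mean of the conditional posterior Beta(alpha + y_j, beta + n_j - y_j)."""
    return (alpha + y_j) / (alpha + beta + n_j)

def prior_weight(alpha, beta, n_j):
    """Weight placed on the population mean alpha / (alpha + beta)."""
    return (alpha + beta) / (alpha + beta + n_j)

alpha, beta = 2.0, 12.0  # hypothetical hyperparameter values, for illustration only

small = posterior_mean_theta(alpha, beta, 4, 14)   # current experiment, 4/14
large = posterior_mean_theta(alpha, beta, 16, 52)  # historical experiment, 16/52

print(f"4/14:  sample {4 / 14:.3f} -> posterior mean {small:.3f}, "
      f"weight on population mean {prior_weight(alpha, beta, 14):.2f}")
print(f"16/52: sample {16 / 52:.3f} -> posterior mean {large:.3f}, "
      f"weight on population mean {prior_weight(alpha, beta, 52):.2f}")
```

The experiment with 14 rats puts more weight on the population mean than the one with 52 rats, which is the "shrunk more" behaviour noted above.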

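Hierarchical normal models (I)

Model specification
- Sampling distribution of the data: $y_{ij} \mid \theta_j \sim \mathrm{Normal}(\theta_j, \sigma^2)$, $i = 1, \ldots, n_j$, $j = 1, 2, \ldots, J$, with $\sigma^2$ known.
- Population distribution of $\theta$: $\theta_j \sim \mathrm{Normal}(\mu, \tau^2)$, where $\mu$ and $\tau$ are the hyperparameters. That is,

$$p(\theta_1, \ldots, \theta_J \mid \mu, \tau) = \prod_{j=1}^{J} N(\theta_j \mid \mu, \tau^2)$$

$$p(\theta_1, \ldots, \theta_J) = \int \prod_{j=1}^{J} \left[ N(\theta_j \mid \mu, \tau^2) \right] p(\mu, \tau)\, d(\mu, \tau)$$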

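Hierarchical normal models (II)

Bayesian estimation
- Non-informative prior for the hyperparameters: $p(\mu, \tau) = p(\mu \mid \tau)\, p(\tau) \propto p(\tau)$.
- Consider the above model specification: $p(\theta \mid \mu, \tau)$.
- Find the joint posterior distribution of the parameter $\theta$ and the hyperparameters $\mu$ and $\tau$:

$$p(\theta, \mu, \tau \mid y) \propto p(\mu, \tau)\, p(\theta \mid \mu, \tau)\, p(y \mid \theta) \propto p(\mu, \tau) \prod_{j=1}^{J} N(\theta_j \mid \mu, \tau^2) \prod_{j=1}^{J} N(\bar{y}_{.j} \mid \theta_j, \sigma^2 / n_j)$$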

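Conditional posterior of $\theta$ given $(\mu, \tau)$, $p(\theta \mid \mu, \tau, y)$

With $\theta_j \mid \mu, \tau \sim \mathrm{Normal}(\mu, \tau^2)$,

$$\theta_j \mid \mu, \tau, y \sim \mathrm{Normal}(\hat{\theta}_j, V_j),$$

where

$$\hat{\theta}_j = \frac{\frac{n_j}{\sigma^2}\, \bar{y}_{.j} + \frac{1}{\tau^2}\, \mu}{\frac{n_j}{\sigma^2} + \frac{1}{\tau^2}}, \qquad V_j = \frac{1}{\frac{n_j}{\sigma^2} + \frac{1}{\tau^2}}$$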
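The conditional posterior mean is a precision-weighted average of the group mean and the population mean $\mu$, with data precision $n_j / \sigma^2$ (where $n_j$ is the number of observations in group $j$) and prior precision $1 / \tau^2$. A minimal sketch, with an illustrative function name and made-up input values:

```python
def conditional_posterior_theta(ybar_j, n_j, sigma2, mu, tau):
    """Return (theta_hat_j, V_j) of theta_j | mu, tau, y (requires tau > 0)."""
    data_precision = n_j / sigma2      # precision of the group mean ybar_j
    prior_precision = 1.0 / tau ** 2   # precision of the population distribution
    v_j = 1.0 / (data_precision + prior_precision)
    theta_hat = v_j * (data_precision * ybar_j + prior_precision * mu)
    return theta_hat, v_j

# Made-up numbers: both precisions equal 1, so theta_hat is the midpoint
# of ybar_j = 10 and mu = 0, and V_j is 1/2.
print(conditional_posterior_theta(ybar_j=10.0, n_j=4, sigma2=4.0, mu=0.0, tau=1.0))
```

As $\tau$ grows the prior precision vanishes and $\hat{\theta}_j$ approaches the group mean; as $\tau$ shrinks towards 0 it approaches $\mu$.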

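Marginal posterior of $\mu$ and $\tau$, $p(\mu, \tau \mid y)$

$$p(\mu, \tau \mid y) \propto p(\mu, \tau)\, p(y \mid \mu, \tau)$$

$$\bar{y}_{.j} \mid \mu, \tau \sim \mathrm{Normal}\left(\mu, \frac{\sigma^2}{n_j} + \tau^2\right)$$

Therefore,

$$p(\mu, \tau \mid y) \propto p(\mu, \tau) \prod_{j=1}^{J} N\left(\bar{y}_{.j} \,\Big|\, \mu, \frac{\sigma^2}{n_j} + \tau^2\right)$$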

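Posterior of $\mu$ given $\tau$, $p(\mu \mid \tau, y)$

$$p(\mu, \tau \mid y) = p(\mu \mid \tau, y)\, p(\tau \mid y) \quad \Longrightarrow \quad p(\mu \mid \tau, y) = \frac{p(\mu, \tau \mid y)}{p(\tau \mid y)}$$

Therefore,

$$\mu \mid \tau, y \sim \mathrm{Normal}(\hat{\mu}, V_\mu),$$

where

$$\hat{\mu} = \frac{\sum_{j=1}^{J} \frac{1}{\sigma^2 / n_j + \tau^2}\, \bar{y}_{.j}}{\sum_{j=1}^{J} \frac{1}{\sigma^2 / n_j + \tau^2}} \quad \text{and} \quad V_\mu^{-1} = \sum_{j=1}^{J} \frac{1}{\sigma^2 / n_j + \tau^2}$$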
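A small sketch of these two quantities: each group mean is weighted by the inverse of its marginal variance $\sigma^2 / n_j + \tau^2$, where $n_j$ is the number of observations in group $j$. The function name and the input values are illustrative.

```python
def mu_given_tau(ybars, ns, sigma2, tau):
    """Return (mu_hat, V_mu) of the normal distribution mu | tau, y."""
    # Precision of each group mean around mu: 1 / (sigma^2 / n_j + tau^2).
    precisions = [1.0 / (sigma2 / nj + tau ** 2) for nj in ns]
    v_mu = 1.0 / sum(precisions)
    mu_hat = v_mu * sum(w * yb for w, yb in zip(precisions, ybars))
    return mu_hat, v_mu

# Made-up numbers: two groups with equal precision 1/2 each, so mu_hat is
# the plain average of the group means and V_mu = 1.
print(mu_given_tau(ybars=[1.0, 3.0], ns=[1, 1], sigma2=1.0, tau=1.0))
```

A draw of $\mu$ given $\tau$ can then be obtained with `random.gauss(mu_hat, v_mu ** 0.5)`.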

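Posterior distribution of $\tau$, $p(\tau \mid y)$

$$p(\tau \mid y) = \frac{p(\mu, \tau \mid y)}{p(\mu \mid \tau, y)} \propto \frac{p(\tau) \prod_{j=1}^{J} N\left(\bar{y}_{.j} \mid \mu, \frac{\sigma^2}{n_j} + \tau^2\right)}{N(\mu \mid \hat{\mu}, V_\mu)}$$

The identity holds for any value of $\mu$; evaluating it at $\mu = \hat{\mu}$ gives

$$p(\tau \mid y) \propto \frac{p(\tau) \prod_{j=1}^{J} N\left(\bar{y}_{.j} \mid \hat{\mu}, \frac{\sigma^2}{n_j} + \tau^2\right)}{N(\hat{\mu} \mid \hat{\mu}, V_\mu)} \propto p(\tau)\, V_\mu^{1/2} \prod_{j=1}^{J} \left(\frac{\sigma^2}{n_j} + \tau^2\right)^{-1/2} \exp\left(-\frac{(\bar{y}_{.j} - \hat{\mu})^2}{2\left(\frac{\sigma^2}{n_j} + \tau^2\right)}\right)$$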
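The final expression is easy to evaluate on a grid of $\tau$ values. The sketch below works on the log scale and assumes a uniform prior $p(\tau) \propto 1$, which is a choice made here for illustration; $n_j$ is the number of observations in group $j$, and the function name and inputs are illustrative.

```python
from math import log

def log_posterior_tau(tau, ybars, ns, sigma2):
    """Unnormalized log p(tau | y), assuming a uniform prior on tau."""
    variances = [sigma2 / nj + tau ** 2 for nj in ns]
    precisions = [1.0 / v for v in variances]
    v_mu = 1.0 / sum(precisions)
    mu_hat = v_mu * sum(w * yb for w, yb in zip(precisions, ybars))
    # log of V_mu^(1/2) * prod_j variance_j^(-1/2) * exp(-(ybar_j - mu_hat)^2 / (2 variance_j))
    total = 0.5 * log(v_mu)
    for yb, v in zip(ybars, variances):
        total += -0.5 * log(v) - (yb - mu_hat) ** 2 / (2 * v)
    return total

# Made-up group means that are far apart: larger values of tau are favoured.
for tau in (0.1, 1.0, 5.0):
    print(tau, log_posterior_tau(tau, ybars=[0.0, 10.0], ns=[1, 1], sigma2=1.0))
```

Exponentiating these values on a fine grid and normalizing gives a discrete approximation from which $\tau$ can be drawn, after which $\mu$ and the $\theta_j$ follow from their conditional normal distributions.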

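Prior distribution of $\tau$, $p(\tau)$

The prior $p(\tau)$ enters the marginal posterior of $\tau$ as a multiplicative factor:

$$p(\tau \mid y) \propto p(\tau)\, V_\mu^{1/2} \prod_{j=1}^{J} \left(\frac{\sigma^2}{n_j} + \tau^2\right)^{-1/2} \exp\left(-\frac{(\bar{y}_{.j} - \hat{\mu})^2}{2\left(\frac{\sigma^2}{n_j} + \tau^2\right)}\right)$$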