Bayesian statistics DS GA 1002 Probability and Statistics for Data Science http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall17 Carlos Fernandez-Granda

Frequentist vs Bayesian statistics
In frequentist statistics the data are modeled as realizations from a distribution that depends on deterministic parameters. In Bayesian statistics the parameters are modeled as random variables. This allows us to quantify our prior uncertainty and incorporate additional information.

Learning Bayesian models Conjugate priors Bayesian estimators

Prior distribution and likelihood
The data $x \in \mathbb{R}^n$ are a realization of a random vector $X$, which depends on a vector of parameters $\Theta$. Modeling choices:
Prior distribution: distribution of $\Theta$ encoding our uncertainty about the model before seeing the data
Likelihood: conditional distribution of $X$ given $\Theta$

Posterior distribution
The posterior distribution is the conditional distribution of $\Theta$ given $X$. Evaluating the posterior at the data $x$ allows us to update our uncertainty about $\Theta$ using the data.

Bernoulli distribution
Goal: estimating a Bernoulli parameter from iid data. We consider two different Bayesian estimators $\Theta_1$ and $\Theta_2$:
1. $\Theta_1$ is a conservative estimator with a uniform prior pdf
$f_{\Theta_1}(\theta) = 1$ for $0 \le \theta \le 1$, and $0$ otherwise
2. $\Theta_2$ has a prior pdf skewed towards 1
$f_{\Theta_2}(\theta) = 2\theta$ for $0 \le \theta \le 1$, and $0$ otherwise

Prior distributions
[figure: the two prior pdfs $f_{\Theta_1}$ and $f_{\Theta_2}$ on the unit interval]

Bernoulli distribution: likelihood
The data are assumed to be iid, so the likelihood is
$p_{X \mid \Theta}(x \mid \theta) = \theta^{n_1} (1 - \theta)^{n_0}$
where $n_0$ is the number of zeros and $n_1$ the number of ones

Bernoulli distribution: posterior distribution
$f_{\Theta_1 \mid X}(\theta \mid x) = \dfrac{f_{\Theta_1}(\theta)\, p_{X \mid \Theta_1}(x \mid \theta)}{p_X(x)}$
$= \dfrac{f_{\Theta_1}(\theta)\, p_{X \mid \Theta_1}(x \mid \theta)}{\int_u f_{\Theta_1}(u)\, p_{X \mid \Theta_1}(x \mid u)\, \mathrm{d}u}$
$= \dfrac{\theta^{n_1} (1 - \theta)^{n_0}}{\int_u u^{n_1} (1 - u)^{n_0}\, \mathrm{d}u}$
$= \dfrac{\theta^{n_1} (1 - \theta)^{n_0}}{\beta(n_1 + 1, n_0 + 1)}$
where $\beta(a, b) := \int_u u^{a-1} (1 - u)^{b-1}\, \mathrm{d}u$
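The normalization above is easy to check numerically. A minimal sketch (not from the slides; scipy is assumed, and the counts n0, n1 are arbitrary example values):

```python
# Check numerically that theta^n1 * (1 - theta)^n0, normalized by
# beta(n1 + 1, n0 + 1), matches the Beta(n1 + 1, n0 + 1) pdf from scipy.
import numpy as np
from scipy.special import beta as beta_fn
from scipy.stats import beta as beta_dist

n0, n1 = 3, 1                              # example counts of zeros and ones
theta = np.linspace(0, 1, 1001)

posterior = theta**n1 * (1 - theta)**n0 / beta_fn(n1 + 1, n0 + 1)
assert np.allclose(posterior, beta_dist.pdf(theta, a=n1 + 1, b=n0 + 1))
```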

Bernoulli distribution: posterior distribution
$f_{\Theta_2 \mid X}(\theta \mid x) = \dfrac{f_{\Theta_2}(\theta)\, p_{X \mid \Theta_2}(x \mid \theta)}{p_X(x)}$
$= \dfrac{\theta^{n_1 + 1} (1 - \theta)^{n_0}}{\int_u u^{n_1 + 1} (1 - u)^{n_0}\, \mathrm{d}u}$
$= \dfrac{\theta^{n_1 + 1} (1 - \theta)^{n_0}}{\beta(n_1 + 2, n_0 + 1)}$

Bernoulli distribution: $n_0 = 1$, $n_1 = 3$
[figure: posterior pdfs under the two priors]

Bernoulli distribution: $n_0 = 3$, $n_1 = 1$
[figure: posterior pdfs under the two priors]

Bernoulli distribution: $n_0 = 91$, $n_1 = 9$
[figure: posterior pdfs under the two priors, with the posterior means (uniform and skewed priors) and the ML estimator marked]

Learning Bayesian models Conjugate priors Bayesian estimators

Beta random variable
Useful in Bayesian statistics. Unimodal continuous distribution on the unit interval. The pdf of a beta distribution with parameters $a$ and $b$ is defined as
$f_\beta(\theta; a, b) := \dfrac{\theta^{a-1} (1 - \theta)^{b-1}}{\beta(a, b)}$ if $0 \le \theta \le 1$, and $0$ otherwise
where $\beta(a, b) := \int_u u^{a-1} (1 - u)^{b-1}\, \mathrm{d}u$

Beta random variables
[figure: beta pdfs $f_X(x)$ for $(a, b) = (1, 1), (1, 2), (3, 3), (6, 2), (3, 15)$]
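The curves in this figure can be reproduced directly from scipy's beta distribution; a short sketch using matplotlib, with the $(a, b)$ pairs taken from the legend:

```python
# Plot the beta pdfs for the parameter pairs shown in the figure.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import beta

x = np.linspace(0, 1, 500)
for a, b in [(1, 1), (1, 2), (3, 3), (6, 2), (3, 15)]:
    plt.plot(x, beta.pdf(x, a, b), label=f"a = {a}, b = {b}")
plt.xlabel("x")
plt.ylabel("f_X(x)")
plt.legend()
plt.show()
```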

Learning a Bernoulli distribution
The first prior is beta with parameters $a = 1$ and $b = 1$. The second prior is beta with parameters $a = 2$ and $b = 1$. The posteriors are beta with parameters $a = n_1 + 1$, $b = n_0 + 1$ and $a = n_1 + 2$, $b = n_0 + 1$, respectively.

Conjugate priors
A conjugate family of distributions for a certain likelihood satisfies the following property: if the prior belongs to the family, the posterior also belongs to the family. Beta distributions are conjugate priors when the likelihood is binomial.

The beta distribution is conjugate to the binomial likelihood
$\Theta$ is beta with parameters $a$ and $b$; $X$ is binomial with parameters $n$ and $\Theta$
$f_{\Theta \mid X}(\theta \mid x) = \dfrac{f_\Theta(\theta)\, p_{X \mid \Theta}(x \mid \theta)}{p_X(x)}$
$= \dfrac{f_\Theta(\theta)\, p_{X \mid \Theta}(x \mid \theta)}{\int_u f_\Theta(u)\, p_{X \mid \Theta}(x \mid u)\, \mathrm{d}u}$
$= \dfrac{\theta^{a-1} (1 - \theta)^{b-1} \binom{n}{x} \theta^x (1 - \theta)^{n-x}}{\int_u u^{a-1} (1 - u)^{b-1} \binom{n}{x} u^x (1 - u)^{n-x}\, \mathrm{d}u}$
$= \dfrac{\theta^{x + a - 1} (1 - \theta)^{n - x + b - 1}}{\int_u u^{x + a - 1} (1 - u)^{n - x + b - 1}\, \mathrm{d}u}$
$= f_\beta(\theta; x + a, n - x + b)$
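A quick numerical illustration of this conjugacy: multiplying a beta prior by a binomial likelihood on a grid and normalizing should recover the Beta$(x + a, n - x + b)$ pdf. A minimal sketch, with arbitrary example values for $a$, $b$, $n$ and $x$:

```python
# Normalize prior * likelihood on a grid and compare with the closed-form posterior.
import numpy as np
from scipy.integrate import trapezoid
from scipy.stats import beta, binom

a, b = 2, 3          # example prior parameters
n, x = 10, 7         # example number of trials and observed successes

theta = np.linspace(0, 1, 2001)
unnormalized = beta.pdf(theta, a, b) * binom.pmf(x, n, theta)
numerical = unnormalized / trapezoid(unnormalized, theta)

closed_form = beta.pdf(theta, x + a, n - x + b)
print(np.max(np.abs(numerical - closed_form)))   # should be close to zero
```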

Poll in New Mexico
429 participants: 227 people intend to vote for Clinton and 202 for Trump. Probability that Trump wins in New Mexico?
Assumptions:
The fraction of Trump voters is modeled as a random variable $\Theta$
Poll participants are selected uniformly at random with replacement
The number of Trump voters in the poll is binomial with parameters $n = 429$ and $p = \Theta$

Poll in New Mexico
The prior is uniform, so beta with parameters $a = 1$ and $b = 1$. The likelihood is binomial. The posterior is beta with parameters $a = 202 + 1$ and $b = 227 + 1$. The probability that Trump wins in New Mexico is the probability, given the data, that $\Theta$ is greater than 0.5.
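In code, this probability is one line once the posterior is identified; a sketch using scipy (it should return roughly 0.11, consistent with the figure on the next slide):

```python
# Posterior probability that Trump wins, P(Theta > 0.5 | data),
# with the Beta(202 + 1, 227 + 1) posterior derived above.
from scipy.stats import beta

posterior = beta(202 + 1, 227 + 1)
print(1 - posterior.cdf(0.5))   # roughly 0.11
```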

Poll in New Mexico
[figure: posterior pdf of $\Theta$; 88.6% of the probability lies below 0.5 and 11.4% above]

Learning Bayesian models Conjugate priors Bayesian estimators

Bayesian estimators What estimator should we use? Two main options: The posterior mean The posterior mode

Posterior mean
Mean of the posterior distribution
$\theta_{\mathrm{MMSE}}(x) := \mathrm{E}(\Theta \mid X = x)$
Minimum mean-square-error (MMSE) estimate: for any arbitrary estimator $\theta_{\mathrm{other}}(x)$,
$\mathrm{E}\big((\theta_{\mathrm{other}}(X) - \Theta)^2\big) \ge \mathrm{E}\big((\theta_{\mathrm{MMSE}}(X) - \Theta)^2\big)$

Posterior mean
$\mathrm{E}\big((\theta_{\mathrm{other}}(X) - \Theta)^2 \mid X = x\big)$
$= \mathrm{E}\big((\theta_{\mathrm{other}}(X) - \theta_{\mathrm{MMSE}}(X) + \theta_{\mathrm{MMSE}}(X) - \Theta)^2 \mid X = x\big)$
$= (\theta_{\mathrm{other}}(x) - \theta_{\mathrm{MMSE}}(x))^2 + \mathrm{E}\big((\theta_{\mathrm{MMSE}}(X) - \Theta)^2 \mid X = x\big) + 2\,(\theta_{\mathrm{other}}(x) - \theta_{\mathrm{MMSE}}(x))\big(\theta_{\mathrm{MMSE}}(x) - \mathrm{E}(\Theta \mid X = x)\big)$
$= (\theta_{\mathrm{other}}(x) - \theta_{\mathrm{MMSE}}(x))^2 + \mathrm{E}\big((\theta_{\mathrm{MMSE}}(X) - \Theta)^2 \mid X = x\big)$
(the cross term vanishes because $\theta_{\mathrm{MMSE}}(x) = \mathrm{E}(\Theta \mid X = x)$)

Posterior mean
By iterated expectation,
$\mathrm{E}\big((\theta_{\mathrm{other}}(X) - \Theta)^2\big) = \mathrm{E}\Big(\mathrm{E}\big((\theta_{\mathrm{other}}(X) - \Theta)^2 \mid X\big)\Big)$
$= \mathrm{E}\big((\theta_{\mathrm{other}}(X) - \theta_{\mathrm{MMSE}}(X))^2\big) + \mathrm{E}\Big(\mathrm{E}\big((\theta_{\mathrm{MMSE}}(X) - \Theta)^2 \mid X\big)\Big)$
$= \mathrm{E}\big((\theta_{\mathrm{other}}(X) - \theta_{\mathrm{MMSE}}(X))^2\big) + \mathrm{E}\big((\theta_{\mathrm{MMSE}}(X) - \Theta)^2\big)$
$\ge \mathrm{E}\big((\theta_{\mathrm{MMSE}}(X) - \Theta)^2\big)$
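A small Monte Carlo experiment can illustrate this optimality in the earlier beta-Bernoulli setting: draw $\Theta$ from the uniform prior, simulate $n$ Bernoulli observations, and compare the mean-square error of the posterior mean $(n_1 + 1)/(n + 2)$ with that of the ML estimate $n_1/n$, averaged over the prior. A sketch under those assumptions ($n$ and the number of trials are arbitrary choices):

```python
# Monte Carlo check: the posterior mean under the uniform prior should have a
# smaller mean-square error than the ML estimate, averaged over the prior.
import numpy as np

rng = np.random.default_rng(0)
n, trials = 10, 100_000                    # arbitrary choices

theta = rng.uniform(0, 1, size=trials)     # Theta ~ Uniform(0, 1)
n1 = rng.binomial(n, theta)                # number of ones among n tosses

posterior_mean = (n1 + 1) / (n + 2)        # MMSE estimate under the uniform prior
ml_estimate = n1 / n

print("MSE of posterior mean:", np.mean((posterior_mean - theta) ** 2))
print("MSE of ML estimate:   ", np.mean((ml_estimate - theta) ** 2))
```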

Bernoulli distribution: $n_0 = 1$, $n_1 = 3$
[figure: posterior pdfs under the two priors]

Bernoulli distribution: $n_0 = 3$, $n_1 = 1$
[figure: posterior pdfs under the two priors]

Bernoulli distribution: $n_0 = 91$, $n_1 = 9$
[figure: posterior pdfs under the two priors, with the posterior means (uniform and skewed priors) and the ML estimator marked]

Posterior mode
The maximum-a-posteriori (MAP) estimator is the mode of the posterior distribution:
$\theta_{\mathrm{MAP}}(x) := \arg\max_\theta\, p_{\Theta \mid X}(\theta \mid x)$ if $\Theta$ is discrete, and
$\theta_{\mathrm{MAP}}(x) := \arg\max_\theta\, f_{\Theta \mid X}(\theta \mid x)$ if $\Theta$ is continuous

Maximum-likelihood estimator
If the prior is uniform, the ML estimator coincides with the MAP estimator:
$\arg\max_\theta\, f_{\Theta \mid X}(\theta \mid x) = \arg\max_\theta\, \dfrac{f_\Theta(\theta)\, f_{X \mid \Theta}(x \mid \theta)}{\int_u f_\Theta(u)\, f_{X \mid \Theta}(x \mid u)\, \mathrm{d}u}$
$= \arg\max_\theta\, f_{X \mid \Theta}(x \mid \theta)$
$= \arg\max_\theta\, L_x(\theta)$
Uniform priors are only well defined over bounded domains
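As a concrete check, in the beta-Bernoulli example the uniform prior gives a Beta$(n_1 + 1, n_0 + 1)$ posterior whose mode, $(a - 1)/(a + b - 2) = n_1/(n_0 + n_1)$, is exactly the ML estimate. A minimal numerical sketch, reusing the counts from the earlier figure ($n_0 = 91$, $n_1 = 9$):

```python
# With the uniform prior, the MAP estimate (mode of the Beta(n1 + 1, n0 + 1)
# posterior) coincides with the ML estimate n1 / (n0 + n1).
import numpy as np
from scipy.stats import beta

n0, n1 = 91, 9                                      # counts from the earlier example
theta = np.linspace(0, 1, 100_001)

theta_map = theta[np.argmax(beta.pdf(theta, n1 + 1, n0 + 1))]
theta_ml = n1 / (n0 + n1)
print(theta_map, theta_ml)                          # both approximately 0.09
```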

Probability of error
If $\Theta$ is discrete, the MAP estimator minimizes the probability of error: for any arbitrary estimator $\theta_{\mathrm{other}}(x)$,
$\mathrm{P}\big(\theta_{\mathrm{other}}(X) \ne \Theta\big) \ge \mathrm{P}\big(\theta_{\mathrm{MAP}}(X) \ne \Theta\big)$

Probability of error
$\mathrm{P}\big(\Theta = \theta_{\mathrm{other}}(X)\big)$
$= \int_x f_X(x)\, \mathrm{P}\big(\Theta = \theta_{\mathrm{other}}(x) \mid X = x\big)\, \mathrm{d}x$
$= \int_x f_X(x)\, p_{\Theta \mid X}(\theta_{\mathrm{other}}(x) \mid x)\, \mathrm{d}x$
$\le \int_x f_X(x)\, p_{\Theta \mid X}(\theta_{\mathrm{MAP}}(x) \mid x)\, \mathrm{d}x$
$= \mathrm{P}\big(\Theta = \theta_{\mathrm{MAP}}(X)\big)$

Sending bits
Model for a communication channel: the signal $\Theta$ encodes a single bit. Prior knowledge indicates that a 0 is 3 times more likely than a 1:
$p_\Theta(1) = \tfrac{1}{4}, \quad p_\Theta(0) = \tfrac{3}{4}$
The channel is noisy, so we send the signal $n$ times. At the receiver we observe
$X_i = \Theta + Z_i, \quad 1 \le i \le n,$
where the $Z_i$ are iid standard Gaussian

Sending bits: ML estimator
The likelihood is equal to
$L_x(\theta) = \prod_{i=1}^n f_{X_i \mid \Theta}(x_i \mid \theta) = \prod_{i=1}^n \dfrac{1}{\sqrt{2\pi}}\, e^{-\frac{(x_i - \theta)^2}{2}}$
The log-likelihood is equal to
$\log L_x(\theta) = -\sum_{i=1}^n \dfrac{(x_i - \theta)^2}{2} - \dfrac{n}{2} \log 2\pi$

Sending bits: ML estimator
$\theta_{\mathrm{ML}}(x) = 1$ if
$\log L_x(1) = -\sum_{i=1}^n \dfrac{x_i^2 - 2 x_i + 1}{2} - \dfrac{n}{2} \log 2\pi \;\ge\; -\sum_{i=1}^n \dfrac{x_i^2}{2} - \dfrac{n}{2} \log 2\pi = \log L_x(0)$
Equivalently,
$\theta_{\mathrm{ML}}(x) = 1$ if $\frac{1}{n} \sum_{i=1}^n x_i > \frac{1}{2}$, and $0$ otherwise

Sending bits: ML estimator
The probability of error is
$\mathrm{P}\big(\Theta \ne \theta_{\mathrm{ML}}(X)\big) = \mathrm{P}\big(\Theta \ne \theta_{\mathrm{ML}}(X) \mid \Theta = 0\big)\, \mathrm{P}(\Theta = 0) + \mathrm{P}\big(\Theta \ne \theta_{\mathrm{ML}}(X) \mid \Theta = 1\big)\, \mathrm{P}(\Theta = 1)$
$= \mathrm{P}\Big(\tfrac{1}{n} \sum_{i=1}^n X_i > \tfrac{1}{2} \,\Big|\, \Theta = 0\Big)\, \mathrm{P}(\Theta = 0) + \mathrm{P}\Big(\tfrac{1}{n} \sum_{i=1}^n X_i < \tfrac{1}{2} \,\Big|\, \Theta = 1\Big)\, \mathrm{P}(\Theta = 1)$
$= Q\big(\sqrt{n}/2\big)$
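This error probability is easy to verify by simulation: draw the bit from its prior, add standard Gaussian noise $n$ times, apply the threshold rule above, and compare the empirical error rate with $Q(\sqrt{n}/2)$. A sketch, with $n$ and the number of Monte Carlo trials chosen arbitrarily:

```python
# Monte Carlo check of the ML rule: decide 1 when the sample mean exceeds 1/2.
# The empirical error rate should be close to Q(sqrt(n)/2).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, trials = 10, 200_000                                        # arbitrary choices

theta = rng.choice([0, 1], size=trials, p=[3 / 4, 1 / 4])      # bit drawn from the prior
x = theta[:, None] + rng.standard_normal((trials, n))          # X_i = Theta + Z_i
decisions = (x.mean(axis=1) > 0.5).astype(int)

print("empirical error:", np.mean(decisions != theta))
print("Q(sqrt(n)/2):   ", norm.sf(np.sqrt(n) / 2))
```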

Sending bits: MAP estimator
The logarithm of the posterior is equal to
$\log p_{\Theta \mid X}(\theta \mid x) = \log \dfrac{\prod_{i=1}^n f_{X_i \mid \Theta}(x_i \mid \theta)\, p_\Theta(\theta)}{f_X(x)}$
$= \sum_{i=1}^n \log f_{X_i \mid \Theta}(x_i \mid \theta) + \log p_\Theta(\theta) - \log f_X(x)$
$= -\sum_{i=1}^n \dfrac{x_i^2 - 2 x_i \theta + \theta^2}{2} - \dfrac{n}{2} \log 2\pi + \log p_\Theta(\theta) - \log f_X(x)$

Sending bits: MAP estimator
$\theta_{\mathrm{MAP}}(x) = 1$ if
$\log p_{\Theta \mid X}(1 \mid x) + \log f_X(x) = -\sum_{i=1}^n \dfrac{x_i^2 - 2 x_i + 1}{2} - \dfrac{n}{2} \log 2\pi - \log 4$
$\;\ge\; -\sum_{i=1}^n \dfrac{x_i^2}{2} - \dfrac{n}{2} \log 2\pi - \log 4 + \log 3 = \log p_{\Theta \mid X}(0 \mid x) + \log f_X(x)$
Equivalently,
$\theta_{\mathrm{MAP}}(x) = 1$ if $\frac{1}{n} \sum_{i=1}^n x_i > \frac{1}{2} + \frac{\log 3}{n}$, and $0$ otherwise

Sending bits: MAP estimator
The probability of error is
$\mathrm{P}\big(\Theta \ne \theta_{\mathrm{MAP}}(X)\big)$
$= \mathrm{P}\big(\Theta \ne \theta_{\mathrm{MAP}}(X) \mid \Theta = 0\big)\, \mathrm{P}(\Theta = 0) + \mathrm{P}\big(\Theta \ne \theta_{\mathrm{MAP}}(X) \mid \Theta = 1\big)\, \mathrm{P}(\Theta = 1)$
$= \mathrm{P}\Big(\tfrac{1}{n} \sum_{i=1}^n X_i > \tfrac{1}{2} + \tfrac{\log 3}{n} \,\Big|\, \Theta = 0\Big)\, \mathrm{P}(\Theta = 0) + \mathrm{P}\Big(\tfrac{1}{n} \sum_{i=1}^n X_i < \tfrac{1}{2} + \tfrac{\log 3}{n} \,\Big|\, \Theta = 1\Big)\, \mathrm{P}(\Theta = 1)$
$= \dfrac{3}{4}\, Q\Big(\dfrac{\sqrt{n}}{2} + \dfrac{\log 3}{\sqrt{n}}\Big) + \dfrac{1}{4}\, Q\Big(\dfrac{\sqrt{n}}{2} - \dfrac{\log 3}{\sqrt{n}}\Big)$
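The two closed-form error probabilities (this one and $Q(\sqrt{n}/2)$ for the ML rule) are what the next slide plots as a function of $n$; a short sketch evaluating them with scipy's normal tail function:

```python
# Closed-form error probabilities of the ML and MAP rules versus n,
# using Q(t) = norm.sf(t) (the standard normal tail probability).
import numpy as np
from scipy.stats import norm

Q = norm.sf
n = np.arange(1, 21)

p_err_ml = Q(np.sqrt(n) / 2)
p_err_map = 0.75 * Q(np.sqrt(n) / 2 + np.log(3) / np.sqrt(n)) \
          + 0.25 * Q(np.sqrt(n) / 2 - np.log(3) / np.sqrt(n))

for row in zip(n, p_err_ml, p_err_map):
    print("n = %2d  ML: %.4f  MAP: %.4f" % row)
```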

Sending bits: Probability of error
[figure: probability of error of the ML and MAP estimators as a function of $n$, for $n$ up to 20]