Bayes Rule and its Applications

Σχετικά έγγραφα
ENGR 691/692 Section 66 (Fall 06): Machine Learning Assigned: August 30 Homework 1: Bayesian Decision Theory (solutions) Due: September 13

Other Test Constructions: Likelihood Ratio & Bayes Tests

Solution Series 9. i=1 x i and i=1 x i.

ST5224: Advanced Statistical Theory II

2 Composition. Invertible Mappings

Statistics 104: Quantitative Methods for Economics Formula and Theorem Review

Solutions to Exercise Sheet 5

Homework 8 Model Solution Section

Areas and Lengths in Polar Coordinates

b. Use the parametrization from (a) to compute the area of S a as S a ds. Be sure to substitute for ds!

SOLUTIONS TO MATH38181 EXTREME VALUES AND FINANCIAL RISK EXAM

SOLUTIONS TO MATH38181 EXTREME VALUES AND FINANCIAL RISK EXAM

Parametrized Surfaces

Probability and Random Processes (Part II)

Statistical Inference I Locally most powerful tests

5.4 The Poisson Distribution.

P AND P. P : actual probability. P : risk neutral probability. Realtionship: mutual absolute continuity P P. For example:

Phys460.nb Solution for the t-dependent Schrodinger s equation How did we find the solution? (not required)

Areas and Lengths in Polar Coordinates

Exercises to Statistics of Material Fatigue No. 5

Numerical Analysis FMN011

3.4 SUM AND DIFFERENCE FORMULAS. NOTE: cos(α+β) cos α + cos β cos(α-β) cos α -cos β

ECE598: Information-theoretic methods in high-dimensional statistics Spring 2016

1. (a) (5 points) Find the unit tangent and unit normal vectors T and N to the curve. r(t) = 3cost, 4t, 3sint

Problem Set 3: Solutions

Reminders: linear functions

6. MAXIMUM LIKELIHOOD ESTIMATION

k A = [k, k]( )[a 1, a 2 ] = [ka 1,ka 2 ] 4For the division of two intervals of confidence in R +

Bayesian statistics. DS GA 1002 Probability and Statistics for Data Science.

Chapter 6: Systems of Linear Differential. be continuous functions on the interval

= λ 1 1 e. = λ 1 =12. has the properties e 1. e 3,V(Y

Written Examination. Antennas and Propagation (AA ) April 26, 2017.

Ordinal Arithmetic: Addition, Multiplication, Exponentiation and Limit

Answer sheet: Third Midterm for Math 2339

Example Sheet 3 Solutions

Spherical Coordinates

C.S. 430 Assignment 6, Sample Solutions

Homework for 1/27 Due 2/5

4.6 Autoregressive Moving Average Model ARMA(1,1)

New bounds for spherical two-distance sets and equiangular lines

6.3 Forecasting ARMA processes

Chapter 6: Systems of Linear Differential. be continuous functions on the interval

CHAPTER 101 FOURIER SERIES FOR PERIODIC FUNCTIONS OF PERIOD

Matrices and Determinants

Chapter 5, 6 Multiple Random Variables ENCS Probability and Stochastic Processes

HOMEWORK#1. t E(x) = 1 λ = (b) Find the median lifetime of a randomly selected light bulb. Answer:

derivation of the Laplacian from rectangular to spherical coordinates

Mock Exam 7. 1 Hong Kong Educational Publishing Company. Section A 1. Reference: HKDSE Math M Q2 (a) (1 + kx) n 1M + 1A = (1) =

HOMEWORK 4 = G. In order to plot the stress versus the stretch we define a normalized stretch:

Fractional Colorings and Zykov Products of graphs

Homework 3 Solutions

SCHOOL OF MATHEMATICAL SCIENCES G11LMA Linear Mathematics Examination Solutions

Dynamic types, Lambda calculus machines Section and Practice Problems Apr 21 22, 2016

An Inventory of Continuous Distributions

Aquinas College. Edexcel Mathematical formulae and statistics tables DO NOT WRITE ON THIS BOOKLET

Second Order Partial Differential Equations

Απόκριση σε Μοναδιαία Ωστική Δύναμη (Unit Impulse) Απόκριση σε Δυνάμεις Αυθαίρετα Μεταβαλλόμενες με το Χρόνο. Απόστολος Σ.

w o = R 1 p. (1) R = p =. = 1

Section 8.3 Trigonometric Equations

Partial Differential Equations in Biology The boundary element method. March 26, 2013

Every set of first-order formulas is equivalent to an independent set

Μηχανική Μάθηση Hypothesis Testing

Overview. Transition Semantics. Configurations and the transition relation. Executions and computation

EE512: Error Control Coding

Math221: HW# 1 solutions

A Two-Sided Laplace Inversion Algorithm with Computable Error Bounds and Its Applications in Financial Engineering

The Simply Typed Lambda Calculus

A Note on Intuitionistic Fuzzy. Equivalence Relation

An Introduction to Signal Detection and Estimation - Second Edition Chapter II: Selected Solutions

Homomorphism in Intuitionistic Fuzzy Automata

Lecture 21: Properties and robustness of LSE

Econ 2110: Fall 2008 Suggested Solutions to Problem Set 8 questions or comments to Dan Fetter 1

Jesse Maassen and Mark Lundstrom Purdue University November 25, 2013

Concrete Mathematics Exercises from 30 September 2016

Uniform Convergence of Fourier Series Michael Taylor

Inverse trigonometric functions & General Solution of Trigonometric Equations

1η εργασία για το μάθημα «Αναγνώριση προτύπων»

1 String with massive end-points

Fourier Series. MATH 211, Calculus II. J. Robert Buchanan. Spring Department of Mathematics

D Alembert s Solution to the Wave Equation

g-selberg integrals MV Conjecture An A 2 Selberg integral Summary Long Live the King Ole Warnaar Department of Mathematics Long Live the King

ΚΥΠΡΙΑΚΗ ΕΤΑΙΡΕΙΑ ΠΛΗΡΟΦΟΡΙΚΗΣ CYPRUS COMPUTER SOCIETY ΠΑΓΚΥΠΡΙΟΣ ΜΑΘΗΤΙΚΟΣ ΔΙΑΓΩΝΙΣΜΟΣ ΠΛΗΡΟΦΟΡΙΚΗΣ 24/3/2007

forms This gives Remark 1. How to remember the above formulas: Substituting these into the equation we obtain with

Lecture 2: Dirac notation and a review of linear algebra Read Sakurai chapter 1, Baym chatper 3

Congruence Classes of Invertible Matrices of Order 3 over F 2

CRASH COURSE IN PRECALCULUS

5. Choice under Uncertainty

Theorem 8 Let φ be the most powerful size α test of H

Lecture 13 - Root Space Decomposition II

CHAPTER 25 SOLVING EQUATIONS BY ITERATIVE METHODS

The challenges of non-stable predicates

Θεωρία Πληροφορίας και Κωδίκων

Lecture 7: Overdispersion in Poisson regression

Commutative Monoids in Intuitionistic Fuzzy Sets

Differentiation exercise show differential equation

The Pohozaev identity for the fractional Laplacian

Appendix to On the stability of a compressible axisymmetric rotating flow in a pipe. By Z. Rusak & J. H. Lee

Notes on the Open Economy

ORDINAL ARITHMETIC JULIAN J. SCHLÖDER

Bessel functions. ν + 1 ; 1 = 0 for k = 0, 1, 2,..., n 1. Γ( n + k + 1) = ( 1) n J n (z). Γ(n + k + 1) k!

Transcript:

Bayes Rule and its Applications Bayes Rule: P (B k A) = P (A B k )P (B k )/ n i= P (A B i )P (B i ) Example : In a certain factory, machines A, B, and C are all producing springs of the same length. Of their production, machines A, B, and C produce %, %, and 3% defective springs, respectively. Of the total production of springs in the factory, machine A produces 35%, machine B produces 5%, and machine C produces 4%. Then we have P (D A) =., P (A) =.35; P (D B) =., P (B) =.5; P (D C) =.3, P (C) =.4. If one spring is selected at random from the total springs produced in a day, the probability that it is defective equals P (D) = X {A,B,C} P (D X)P (X) = 5/ If the selected spring is defective, the conditional probability that it was produced by machine A, B, or C can be calculated by P (A D) = P (D A)P (A)/P (D) = 7/5 P (B D) = P (D B)P (B)/P (D) = 5/5 P (C D) = P (D C)P (C)/P (D) = /5

Foundation for Normal Distributions Γ(α) = e t t α dt, for α > Γ(α + ) = αγ(α) Γ() = Γ(n + ) = n! integer n where!. e x dx = π Γ( ) = π X N(µ, σ ) means that f X (x) = πσ e (x µ) /σ, < x < Sampling X N(µ, σ ) by Matlab Y = random( Normal, µ, σ, SampleSize, )

3 Expectation and Covariance Matrix Let X, X,, X n be random variables such that the expectation, variance, and covariance are defined as follows. µ j = E[X j ], σ j = V ar(x j ) = E[(X j µ j ) ] () ρ ij σ i σ j = Cov(X i, X j ) = E[(X i µ i )(X j µ j )] () Suppose X = [X, X,, X n ] t be a random vector, then the expectation and covariance matrix of X is defined as E[X] = [µ, µ,, µ n ] t = µ (3) Cov(X) = [E[(X i µ i )(X j µ j )]] (4)

4 Bayes Decision Theory () p(ω i ): a priori probability () p(x ω i ): class conditional density function (3) p(ω i x): a posteriori probability (4) α(x): an action (a decision) (5) λ(α(x) ω j ): the loss function p(error) = x p(error x)p(x) = x p(α(x) ω i, x ω j, i j)p(x) R(α(x) x) = C j= λ(α(x) ω j )p(ω j x): conditional risk for pattern x x R(α(x) x)p(x): average error of probability (error rate) Bayes Decision Rule For each x, find α(x) which minimizes R(α(x) x) { if α(x) = ωj, For the - loss function, i.e. λ(α(x) ω j ) = otherwise Then the Bayes decision rule can be reduced to or min [R(α(x) x)] = min j [ p(ω j x)] = max i p(ω i x) Assign x to class ω i if p(ω i x) > p(ω j x) for j i

5 Example : X ω i N(µ i, σ i ) p(x ω i ) = πσi exp[ (x µ i ) /σ i ] The Bayes decision rule is to assign x to class ω i if p(ω i x) > p(ω j x) for j i iff iff p(x ω i )p(ω i ) > p(x ω j )p(ω j ) ln[σ j /σi ] + (x µ σj j ) (x µ σi i ) > ln[p(ω j )/p(ω i )] In particular, (a) if σ i = σ for each i, then the Bayes rule is to assign x to class ω i if (µ i µ j )x (µ i µ j ) > σ ln[p(ω j )/p(ω i )] (b) if σ i = σ and p(ω i ) = /C for each i, then the Bayes rule is to assign x to class ω i if (µ i µ j )[x (µ i + µ j )] >

6 Example 3: X ω i N(µ i, C i ) p(x ω i ) = π d/ C i / exp[ (x µ i) t C i (x µ i )/] The Bayes decision rule is to assign x to class ω i if iff [ ln( Cj / C i ) + ( x t C j In particular, p(ω i x) > p(ω j x) for j i x x t C j ) ( µ j + µ t j C µ j x t C x x t C > ln[p(ω j )/p(ω i )] (a) if C i = C for each i, then the Bayes decision rule is to assign x to class ω i if j (µ i µ j ) t C x [µt i C µ i µ t j C µ j ] > ln[p(ω j )/p(ω i )] i i )] µ i + µ t i C i µ i (b) if C i = σ I for each i, then the Bayes decision rule is to assign x to class ω i if (µ i µ j ) t x ( µ i µ j ) > σ ln[p(ω j )/p(ω i )]

7 Example 4: X ω i N(µ i, C), i =, Let Λ(x) = ln[p(x ω )/p(x ω )] = [x (µ + µ )] t C (µ µ ), and let r = ln[p(ω )/p(ω )]. The Bayes decision rule is to assign x to class ω if Λ(x) > r. Note that x ω Λ(X) N(, ) x ω Λ(X) N(, ) where = (µ µ ) t C (µ µ ) is called the square Mahalanobis distance. Hint: Prove that Λ(x)p(x ω )dx = and [Λ(x) ] p(x ω )dx = Then the Bayes error rate can be computed by E = p[x ω, Λ(x) > r] + p[x ω, Λ(x) < r] = p(ω ) r π exp[ (y + ) / ]dy + p(ω ) r π exp[ (y ) / ]dy

8 Proof of Λ(X) ω N(, ) X is a multivariate normal distribution so is its linear mapping Λ(X). E[Λ(X) ω ] = Λ(x)p(x ω )dx = {[x (µ + µ )] t C (µ µ )}p(x ω )dx = x t C (µ µ )p(x ω )dx (µ + µ ) t C (µ µ )p(x ω )dx = [x t p(x ω )dx]c (µ µ ) [(µ + µ ) t C (µ µ )] p(x ω )dx = µ t C (µ µ ) (µ + µ ) t C (µ µ ) = (µ µ ) t C (µ µ ) = V ar[λ(x) ω ] = [Λ(x) ] p(x ω )dx = [(x µ ) t C (µ µ )] p(x ω )dx = (µ µ ) t C { [(x µ )(x µ )] t p(x ω )dx}c (µ µ ) = (µ µ ) t C CC (µ µ ) = (µ µ ) t C (µ µ ) =

9 In Example 4, let µ = [ ], µ = [ ], C = I, and let p(ω ) = p(ω ) = / Then Λ(x) = (x ) t I [ ] = (x + x ) r = ln [p(ω )/p(ω )] = = (µ µ ) t I (µ µ ) = 8 The Bayes decision rule is to assign x to ω if x + x > The Bayes error rate is p(error) = Φ( ) = Φ( 8 ) = Φ( )

Example 5: Bayes decision theory in a discrete case Consider a two-class problem and assume that each component x j of the pattern x is either or with the conditional probabilities p j = p(x j = ω ) and q j = p(x j = ω ) (5) Suppose that the components of x are conditionally independent. Then p(x ω ) = d j= p x j j ( p j ) x j, p(x ω ) = d j= q x j j ( q j ) x j (6) Let the log-likelihood ratio Λ(x) = ln[p(x ω )/p(x ω )], and r = ln[p(ω )/p(ω )], then d d Λ(x) = x j ln[p j ( q j )/q j ( p j )] + ln[( p j )/( q j )] (7) j= j= The Bayes decision rule is to assign x to ω if Λ(x) > r. The decision boundary is d j= w j x j + w =, where w j = ln[p j ( q j )/q j ( p j )], j d, w = d j= ln[( p j )/( q j )] + ln[p(ω )/p(ω )] In particular, if p(ω ) = p(ω ) = /, p j = p, q j = q = p for j d, d is odd and p > q, then the Bayes error rate is E = (d )/ k= d! (d k)!k! pk ( p) d k (8)

X ω N(3, ) and X ω N(6, ) p(x ω i ) = exp[ (x µ i ) /σi ], i =, πσi The Maximum Likelihood (ML) decision is to assign x to ω if p(x ω ) > p(x ω ) The Bayes decision is to assign x to ω if p(x ω )p(ω ) > p(x ω )p(ω ) Note that ML is the special case by assuming p(ω ) = p(ω ) = which need not be true in practical applications. We shall show the effect of p(ω ) =, p(ω 3 ) =. 3 ML Decision: y ω if y < 7 Bayes Decision: y ω if y < 5 or y > 9 The error probability can be computed by Err = p(ω ) b a 4 + (8ln)/3 or y > 7 + 4 + (8ln)/3 π exp[ (x 3) /( )]dx + p(ω ){ a π exp[ (x 6) /]dx + b π exp[ (x 6) /]dx} where a=4.587, b=9.483 for ML decision, and a=5, b=9 for Bayes decision, then Err ML =.65 and Err Bayes =.53.

X ω Rayleigh( ) and X ω Rayleigh(3 ) p(x ω i ) = x σ i exp( x ), i =, σi The Maximum Likelihood (ML) decision is to assign x to ω if p(x ω ) > p(x ω ) The Bayes decision is to assign x to ω if p(x ω )p(ω ) > p(x ω )p(ω ) Note that ML is the special case by assuming p(ω ) = p(ω ) = which need not be true in practical applications. We shall show the effect of p(ω ) = 3, p(ω 4 ) = with σ 4 < σ. ML Decision: y ω if y < Bayes Decision: y ω if y < σ σ σ σ σ σ σ σ The error probability can be computed by Err = p(ω ) t ln( σ σ ) [ln( σ ) + ln( p(ω ) )] σ p(ω ) x exp( x )dx σ σ + p(ω ) t x exp( x )dx σ σ = p(ω ) exp( t ) + p(ω σ ) [ exp( t )] σ where t=.35 for ML decision, and t=.73 for Bayes decision, then Err ML =.34 and Err Bayes =.8.

3 X ω χ (r ) and X ω χ (r ) p(x ω i ) = Γ(r i /) r i/ x(r i/) e x/, x >, i =, The Maximum Likelihood (ML) decision is to assign x to ω if p(x ω ) > p(x ω ) The Bayes decision is to assign x to ω if p(x ω )p(ω ) > p(x ω )p(ω ) Note that ML is the special case by assuming p(ω ) = p(ω ) = which need not be true in practical applications. We shall show the effect of p(ω ) =, p(ω 7 ) = 6 with r 7 < r. ML Decision: Bayes Decision: y ω if Γ(r /) Γ(r /) (r r )/ > x (r r )/ y ω if Γ(r /) Γ(r /) (r r )/ p(ω ) p(ω ) > x(r r )/ The error probability can be computed by Err = p(ω ) t +p(ω ) t Γ(r x (r/) e x/ dx /) r / Γ(r x (r/) e x/ dx /) r / When r = 4, r = 8, p(ω ) = /7, p(ω ) = 6/7, we have t=4.899 for ML decision, and t=. for Bayes decision, and Err ML =.4 and Err Bayes =.4.

4 X ω N(µ, σ ) and X ω N(µ, σ ) p(x ω i ) = exp[ (x µ i ) /σi ], i =, πσi The Maximum Likelihood (ML) decision is to assign x to ω if p(x ω ) > p(x ω ) The Bayes decision is to assign x to ω if p(x ω )p(ω ) > p(x ω )p(ω ) Note that ML is the special case by assuming p(ω ) = p(ω ) = in practical applications. which need not be true The error probability can be computed by Err = p(ω ) T + p(ω ) T πσ exp[ (x µ ) /(σ )]dx πσ exp[ (x µ ) /(σ )]dx As soon as T is chosen, Parameters p(ω i ), µ i, σi, i =, could be estimated, so is Err. Otsu (979) and Fisher (936) chose T to maximize the following criterion, respectively σ B = p(ω )p(ω )(µ µ ) F isher = (µ µ ) σ +σ

5 A Simple Thresholding Algorithm By Otsu, IEEE Trans. on SMC, 6-66, 979 () p i n i n, where n = G i= n i () u T = G k= kp k (3) Do for k =, G ω(k) u(k) k i= p i k i= ip i σ B (k) [u T ω(k) u(k)] ω(k)[ ω(k)] (4) Select k such that σb (k) is maximized Note that ω = ω(k), u = k i= ip (i C ) = u(k)/ω(k) ω = ω(k), u = G i=k+ ip (i C ) = [u T u(k)]/[ ω(k)] σ ω = ω σ + ω σ σ B = ω (u u T ) + ω (u u T ) σ T = G i= (i u T ) p i σ ω + σ B = σ T ɛ = σ B σ T, κ = σ T, λ = σ σω B σ ω

6 Gamma Function and the Volumes of High Dimensional Spheres. Define Γ(x) = e t t x dt for x > and let γ = e x dx. Then (a) Γ(x) = e x dx = π (b) Show that Γ( ) = γ = π (c) Γ(x + ) = xγ(x), for x >, Γ(n) = (n )! if n N. (d) The volume of a d dimensional unit sphere is π d/ /Γ( d + ). 6 The volume of a d dim unite sphere is π d/ /Γ((d/)+) 5 4 3 5 5 Figure : The Volume of a High Dimensional Sphere. V()=; V()=pi; V(3)=4*pi/3; V(4)=pi*pi/; V(5)=8*pi*pi/5; V(6)=pi*pi*pi/6; V(7)=6*pi*pi*pi/5; V(8)=(pi)^4/4; V(9)=3*(pi)^4/945; V()=(pi)^5/; V()=64*(pi)^5/395; V()=(pi)^6/7; V(3)=8*(pi)^6/3535; V(4)=(pi)^7/54; V(5)=56*(pi)^7/75; D=:5; plot(d,v, b-v ) title( The volume of a d-dim unit sphere is \pi^{d/}/\gamma((d/)+) )

7 The Derivation of Volumn for an n-dimensional Sphere For n =, x = r cos θ, x = r sin θ, θ π r R, J = (x, x ) (r, θ) = x r x θ x r x θ = r The volumn is computed by V = R π J drdθ = R π rdrdθ = πr For n = 3, x = r cos θ cos θ, θ π x = r cos θ sin θ, π θ π x 3 = r sin θ, r R, J 3 = (x, x, x 3 ) (r, θ, θ ) = x r x r x 3 r x x x 3 θ θ θ x x x 3 θ θ θ = r cos θ The volumn is computed by V 3 = R π/ π π/ J 3 drdθ dθ = R π/ π π/ r cos θ drdθ dθ = 4πR3 3

8 For n 4, x = r cos θ cos θ cos θ n cos θ n, θ n π x = r cos θ cos θ cos θ n sin θ n, π θ n π x 3 = r cos θ cos θ cos θ n 3 sin θ n, π θ n 3 π.. x j = r cos θ cos θ cos θ n j sin θ n j+, π θ n j π.. x n = r cos θ sin θ, π θ π x n = r sin θ, r R n Note that x i = r and denote c i = cos θ i, i= Jacobian J n = (x,x,,x n) (r,θ,,θ n is computed as ) s i = sin θ i for i n. Then the c c c n c n c c c n s n c c c n 3 s n s s c c n c n s c c n s n s c c n 3 s n c c s c n c n c s c n s n c s c n 3 s n J n = r n..... c c s n c n c c s n s n c c c n 3 c n c c c n s n c c c n c n

9 Then J n = r n c n c n c n s s s n t t t t t t t.. t n t n t n t n t n..., where t i = sinθ i cos θ i Subtracting each column from the preceding one and do further simplifications, we obtain J n = r n c n c n c n s s s n Π n j= (t j + t j ) = r n c n c n c n s s s n Π n j= ( s j c j ) = r n c n c n 3 c n 3 c n = r n cos n θ cos n 3 θ cos θ n 3 cos θ n

Therefore, the volumn of an n-dimensional sphere can be calculated by V n = R π π/ π/ π/ π/ π/ π/ (J n )dθ dθ dθ n dθ n dr = R π π/ π/ π/ π/ [r n cos n θ cos n 3 θ cos θ n ]dθ dθ dθ n dθ n dr π/ π/ = πn/ Γ( n Rn +) Note that the above computations exploit the properties of the following Gamma and Beta functions, and trignometry. Γ(α) = Beta(α, β) = e t t α dt for α > x α ( x) β dx, where α, β > Beta(α, β) = Γ(α) Γ(β) Γ(α+β) Relationship Between Gamma and Beta Functions Γ(x)Γ(y) = e u u x du e v v y dv = = = z= z= t= e u v u x v y dudv e z (zt) x [z( t)] y zdtdz by putting u = zt, v = z( t) e z z x+y dz t= t x ( t) y dt = Γ(x + y)beta(x, y) Γ(α) = (α )Γ(α ) for α > π dθ n = π, Γ() =, π/ π/ Γ( ) = π cos θ n dθ n =, π/ π/ cos θ n 3 dθ n 3 = π, π/ π/ cos m θdθ = π/ cos m θdθ, 3 α n

Now π/ cos m θdθ = x m dx by letting x = cos θ x = = x m ( x ) / dx = y m ( y) / dy = y m/ ( y) / y dy, where x = y y m+ ( y) dy Therefore, = Beta( m+, m+ Γ( )Γ( ) = ) Γ( m +) V n = R π π/ π/ π/ π/ [r n cos n θ cos n 3 θ cos θ n ]dθ dθ dθ n dθ n dr π/ π/ = Rn n (π) () ( π) Πn m=3 Beta(m +, ) = π n/ Rn + ) Γ( n