ENGR 691/692 Section 66 (Fall 06): Machine Learning Assigned: August 30 Homework 1: Bayesian Decision Theory (solutions) Due: September 13

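Problem 1: (22 pts) Let the conditional densities for a two-category one-dimensional problem be given by the following Cauchy distribution:

$$p(x \mid \omega_i) = \frac{1}{\pi b} \cdot \frac{1}{1 + \left(\frac{x - a_i}{b}\right)^2}, \quad i = 1, 2.$$

1. (6 pts) By explicit integration, check that the distributions are indeed normalized.
2. (9 pts) Assuming $P(\omega_1) = P(\omega_2)$, show that $P(\omega_1 \mid x) = P(\omega_2 \mid x)$ if $x = \frac{a_1 + a_2}{2}$, that is, the minimum error decision boundary is a point midway between the peaks of the two distributions, regardless of $b$.
3. (7 pts) Show that the minimum probability of error is given by

$$P(\text{error}) = \frac{1}{2} - \frac{1}{\pi} \tan^{-1} \left| \frac{a_2 - a_1}{2b} \right|.$$

1. We need to evaluate

$$\int_{-\infty}^{\infty} p(x \mid \omega_i)\,dx = \frac{1}{\pi b} \int_{-\infty}^{\infty} \frac{1}{1 + \left(\frac{x - a_i}{b}\right)^2}\,dx.$$

We substitute $y = \frac{x - a_i}{b}$ into the above and get

$$\int_{-\infty}^{\infty} p(x \mid \omega_i)\,dx = \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{1}{1 + y^2}\,dy = \frac{1}{\pi} \tan^{-1}(y) \Big|_{-\infty}^{\infty} = \frac{1}{\pi} \left( \frac{\pi}{2} + \frac{\pi}{2} \right) = 1.$$

2. By setting $p(x \mid \omega_1)P(\omega_1) = p(x \mid \omega_2)P(\omega_2)$, we have

$$\frac{1}{\pi b} \cdot \frac{1}{1 + \left(\frac{x - a_1}{b}\right)^2} = \frac{1}{\pi b} \cdot \frac{1}{1 + \left(\frac{x - a_2}{b}\right)^2},$$

or, equivalently, $x - a_1 = \pm (x - a_2)$. For $a_1 \neq a_2$, this implies that $x = \frac{a_1 + a_2}{2}$.

3. Without loss of generality, we assume $a_2 > a_1$. The probability of error is defined as

$$P(\text{error}) = \int_{-\infty}^{\infty} P(\text{error}, x)\,dx = \int_{-\infty}^{\infty} P(\text{error} \mid x)\,p(x)\,dx.$$

Note that the decision boundary is at $\frac{a_1 + a_2}{2}$, hence

$$P(\text{error} \mid x) = \begin{cases} P(\omega_2 \mid x) & \text{if } x \le \frac{a_1 + a_2}{2} \\ P(\omega_1 \mid x) & \text{if } x > \frac{a_1 + a_2}{2} \end{cases} = \begin{cases} \frac{p(x \mid \omega_2)P(\omega_2)}{p(x)} & \text{if } x \le \frac{a_1 + a_2}{2} \\ \frac{p(x \mid \omega_1)P(\omega_1)}{p(x)} & \text{if } x > \frac{a_1 + a_2}{2} \end{cases}$$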

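Therefore, the probability of error is

$$P(\text{error}) = \int_{-\infty}^{\frac{a_1 + a_2}{2}} p(x \mid \omega_2)P(\omega_2)\,dx + \int_{\frac{a_1 + a_2}{2}}^{\infty} p(x \mid \omega_1)P(\omega_1)\,dx = \frac{1}{2\pi b} \left[ \int_{-\infty}^{\frac{a_1 + a_2}{2}} \frac{1}{1 + \left(\frac{x - a_2}{b}\right)^2}\,dx + \int_{\frac{a_1 + a_2}{2}}^{\infty} \frac{1}{1 + \left(\frac{x - a_1}{b}\right)^2}\,dx \right].$$

We substitute $y = \frac{x - a_2}{b}$ and $z = \frac{x - a_1}{b}$ into the above and get

$$P(\text{error}) = \frac{1}{2\pi} \left[ \int_{-\infty}^{\frac{a_1 - a_2}{2b}} \frac{1}{1 + y^2}\,dy + \int_{\frac{a_2 - a_1}{2b}}^{\infty} \frac{1}{1 + z^2}\,dz \right] = \frac{1}{2\pi} \left[ \tan^{-1}(y) \Big|_{-\infty}^{\frac{a_1 - a_2}{2b}} + \tan^{-1}(z) \Big|_{\frac{a_2 - a_1}{2b}}^{\infty} \right]$$

$$= \frac{1}{2\pi} \left( \tan^{-1} \frac{a_1 - a_2}{2b} + \frac{\pi}{2} + \frac{\pi}{2} - \tan^{-1} \frac{a_2 - a_1}{2b} \right) = \frac{1}{2} - \frac{1}{\pi} \tan^{-1} \frac{a_2 - a_1}{2b}.$$

Similarly, if $a_1 > a_2$, we have $P(\text{error}) = \frac{1}{2} - \frac{1}{\pi} \tan^{-1} \frac{a_1 - a_2}{2b}$. Therefore, we have shown that

$$P(\text{error}) = \frac{1}{2} - \frac{1}{\pi} \tan^{-1} \left| \frac{a_2 - a_1}{2b} \right|.$$

Problem 2: (21 pts) Let $\omega_{\max}(x)$ be the state of nature for which $P(\omega_{\max} \mid x) \ge P(\omega_i \mid x)$ for all $i$, $i = 1, \ldots, c$.

1. (7 pts) Show that $P(\omega_{\max} \mid x) \ge \frac{1}{c}$.
2. (7 pts) Show that for the minimum-error-rate decision rule the average probability of error is given by

$$P(\text{error}) = 1 - \int P(\omega_{\max} \mid x)\,p(x)\,dx.$$

3. (7 pts) Show that $P(\text{error}) \le \frac{c - 1}{c}$.

1. Since $P(\omega_{\max} \mid x) \ge P(\omega_i \mid x)$, we have

$$\sum_{i=1}^{c} P(\omega_i \mid x) \le c\,P(\omega_{\max} \mid x).$$

Hence

$$1 = \sum_{i=1}^{c} P(\omega_i \mid x) \le c\,P(\omega_{\max} \mid x),$$

which implies that $P(\omega_{\max} \mid x) \ge \frac{1}{c}$.

2. By definition,

$$P(\text{error}) = \int P(\text{error} \mid x)\,p(x)\,dx = \int \left[ 1 - P(\omega_{\max} \mid x) \right] p(x)\,dx = 1 - \int P(\omega_{\max} \mid x)\,p(x)\,dx.$$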

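3. From 1 and 2, it is clear that

$$P(\text{error}) = 1 - \int P(\omega_{\max} \mid x)\,p(x)\,dx \le 1 - \int \frac{1}{c}\,p(x)\,dx = 1 - \frac{1}{c} = \frac{c - 1}{c}.$$

Problem 3: (22 pts) In many machine learning applications, one has the option either to assign the pattern to one of $c$ classes, or to reject it as being unrecognizable. If the cost for rejects is not too high, rejection may be a desirable action. Let

$$\lambda(\alpha_i \mid \omega_j) = \begin{cases} 0 & i = j, \quad i, j = 1, \ldots, c \\ \lambda_r & i = c + 1 \\ \lambda_s & \text{otherwise,} \end{cases}$$

where $\lambda_r$ is the loss incurred for choosing the $(c+1)$th action, rejection, and $\lambda_s$ is the loss incurred for making any substitution error.

1. (10 pts) Please derive the decision rule with the minimum risk.
2. (6 pts) What happens if $\lambda_r = 0$?
3. (6 pts) What happens if $\lambda_r > \lambda_s$?

1. For $i = 1, \ldots, c$,

$$R(\alpha_i \mid x) = \sum_{j=1}^{c} \lambda(\alpha_i \mid \omega_j) P(\omega_j \mid x) = \lambda_s \sum_{j=1, j \neq i}^{c} P(\omega_j \mid x) = \lambda_s \left[ 1 - P(\omega_i \mid x) \right].$$

For $i = c + 1$,

$$R(\alpha_{c+1} \mid x) = \lambda_r.$$

Among the first $c$ actions, the risk is smallest for the class $\omega_i$ with the largest posterior. Therefore, the minimum risk is achieved if we decide $\omega_i$ if $R(\alpha_i \mid x) \le R(\alpha_{c+1} \mid x)$, i.e., $P(\omega_i \mid x) \ge 1 - \frac{\lambda_r}{\lambda_s}$, and reject otherwise.

2. If $\lambda_r = 0$, we always reject.

3. If $\lambda_r > \lambda_s$, we will never reject.

Problem 4: (12 pts + 10 extra points) Let the components of the vector $x = [x_1, \ldots, x_d]^T$ be binary-valued (0 or 1), and let $P(\omega_j)$ be the prior probability for the state of nature $\omega_j$, $j = 1, \ldots, c$. We define

$$p_{ij} = P(x_i = 1 \mid \omega_j), \quad i = 1, \ldots, d, \quad j = 1, \ldots, c,$$

with the components $x_i$ being statistically independent for all $x$ in $\omega_j$.

1. (12 pts) Show that the minimum probability of error is achieved by the following decision rule: Decide $\omega_k$ if $g_k(x) \ge g_j(x)$ for all $j$ and $k$, where

$$g_j(x) = \sum_{i=1}^{d} x_i \ln \frac{p_{ij}}{1 - p_{ij}} + \sum_{i=1}^{d} \ln(1 - p_{ij}) + \ln P(\omega_j).$$

2. (10 extra pts) If the components of $x$ are ternary valued (1, 0, or $-1$), show that a minimum probability of error decision rule can be derived that involves discriminant functions $g_j(x)$ that are quadratic functions of the components $x_i$.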
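The decision rule stated in part 1 of Problem 4 can be exercised numerically. The following is a minimal Python sketch, not part of the original handout. The values in `P` and `PRIORS` are made up for illustration, and the function names `discriminant`, `log_joint`, and `decide` are hypothetical. It evaluates $g_j(x)$ from the stated formula, compares it with $\ln[P(x \mid \omega_j)P(\omega_j)]$ computed directly from the independent-component product, and picks the class with the largest discriminant.

```python
import math


def discriminant(x, p_j, prior_j):
    """g_j(x) = sum_i x_i ln(p_ij / (1 - p_ij)) + sum_i ln(1 - p_ij) + ln P(w_j)."""
    linear = sum(xi * math.log(p / (1.0 - p)) for xi, p in zip(x, p_j))
    bias = sum(math.log(1.0 - p) for p in p_j) + math.log(prior_j)
    return linear + bias


def log_joint(x, p_j, prior_j):
    """ln[P(x | w_j) P(w_j)] computed directly from the product over components."""
    likelihood = 1.0
    for xi, p in zip(x, p_j):
        likelihood *= p if xi == 1 else (1.0 - p)
    return math.log(likelihood * prior_j)


def decide(x, p, priors):
    """Return the 0-based index k of the class with the largest g_k(x)."""
    scores = [discriminant(x, p_j, pr) for p_j, pr in zip(p, priors)]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical parameters (assumptions, not from the problem): P[j][i] = p_ij.
P = [[0.8, 0.3, 0.6],
     [0.2, 0.7, 0.5]]
PRIORS = [0.4, 0.6]

if __name__ == "__main__":
    x = (1, 0, 1)
    for j, (p_j, pr) in enumerate(zip(P, PRIORS)):
        print(j, discriminant(x, p_j, pr), log_joint(x, p_j, pr))
    print("decision:", decide(x, P, PRIORS))
```

Because the logarithm is monotonic, maximizing $g_j(x)$ is the same as maximizing the posterior, so the two printed columns should agree up to floating-point rounding.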

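1. Consider the following discriminant function

$$g_j(x) = \ln \left[ p(x \mid \omega_j) P(\omega_j) \right] = \ln p(x \mid \omega_j) + \ln P(\omega_j).$$

The components of $x$ are statistically independent for all $x$ in $\omega_j$, so we can write the density as a product:

$$p(x \mid \omega_j) = \prod_{i=1}^{d} p(x_i \mid \omega_j) = \prod_{i=1}^{d} p_{ij}^{x_i} (1 - p_{ij})^{1 - x_i}.$$

Thus we have the discriminant function

$$g_j(x) = \sum_{i=1}^{d} \left[ x_i \ln p_{ij} + (1 - x_i) \ln(1 - p_{ij}) \right] + \ln P(\omega_j) = \sum_{i=1}^{d} x_i \ln \frac{p_{ij}}{1 - p_{ij}} + \sum_{i=1}^{d} \ln(1 - p_{ij}) + \ln P(\omega_j).$$

2. Consider the following discriminant function

$$g_j(x) = \ln \left[ p(x \mid \omega_j) P(\omega_j) \right] = \ln p(x \mid \omega_j) + \ln P(\omega_j).$$

The components of $x$ are statistically independent for all $x$ in $\omega_j$, therefore,

$$p(x \mid \omega_j) = \prod_{i=1}^{d} p(x_i \mid \omega_j).$$

Let

$$p_{ij} = P(x_i = 1 \mid \omega_j), \quad q_{ij} = P(x_i = 0 \mid \omega_j), \quad r_{ij} = P(x_i = -1 \mid \omega_j).$$

It is not hard to check that

$$p(x_i \mid \omega_j) = p_{ij}^{\frac{x_i + x_i^2}{2}} \; q_{ij}^{1 - x_i^2} \; r_{ij}^{\frac{-x_i + x_i^2}{2}}.$$

Thus the discriminant functions can be written as

$$g_j(x) = \sum_{i=1}^{d} \left[ \frac{x_i + x_i^2}{2} \ln p_{ij} + (1 - x_i^2) \ln q_{ij} + \frac{-x_i + x_i^2}{2} \ln r_{ij} \right] + \ln P(\omega_j)$$

$$= \sum_{i=1}^{d} x_i^2 \ln \frac{\sqrt{p_{ij} r_{ij}}}{q_{ij}} + \frac{1}{2} \sum_{i=1}^{d} x_i \ln \frac{p_{ij}}{r_{ij}} + \sum_{i=1}^{d} \ln q_{ij} + \ln P(\omega_j),$$

which are quadratic functions of the components $x_i$.

Question 5: (23 pts) Suppose we have three categories with prior probabilities $P(\omega_1) = 0.5$, $P(\omega_2) = P(\omega_3) = 0.25$ and the class conditional probability distributions

$$p(x \mid \omega_1) \sim N(0, 1), \qquad p(x \mid \omega_2) \sim N(0.5, 1), \qquad p(x \mid \omega_3) \sim N(1, 1),$$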

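where $N(\mu, \sigma^2)$ represents the normal distribution with density function

$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x - \mu)^2}{2\sigma^2}}.$$

We sample the following sequence of four points: $x = 0.6, 0.1, 0.9, 1.1$.

1. (9 pts) Calculate explicitly the probability that the sequence actually came from $\omega_1, \omega_3, \omega_3, \omega_2$.
2. (6 pts) Repeat for the sequence $\omega_1, \omega_2, \omega_2, \omega_3$.
3. (8 pts) Find the sequence of states having the maximum probability.

1. It is straightforward to compute that

$$\begin{array}{lll}
p(0.6 \mid \omega_1) = 0.333225 & p(0.6 \mid \omega_2) = 0.396953 & p(0.6 \mid \omega_3) = 0.368270 \\
p(0.1 \mid \omega_1) = 0.396953 & p(0.1 \mid \omega_2) = 0.368270 & p(0.1 \mid \omega_3) = 0.266085 \\
p(0.9 \mid \omega_1) = 0.266085 & p(0.9 \mid \omega_2) = 0.368270 & p(0.9 \mid \omega_3) = 0.396953 \\
p(1.1 \mid \omega_1) = 0.217852 & p(1.1 \mid \omega_2) = 0.333225 & p(1.1 \mid \omega_3) = 0.396953
\end{array}$$

We denote $X = (x_1, x_2, x_3, x_4)$ and $\omega = (\omega(1), \omega(2), \omega(3), \omega(4))$. Clearly, there are $3^4 = 81$ possible values of $\omega$, such as

$$(\omega_1, \omega_1, \omega_1, \omega_1), \; (\omega_1, \omega_1, \omega_1, \omega_2), \; (\omega_1, \omega_1, \omega_1, \omega_3), \; \ldots, \; (\omega_3, \omega_3, \omega_3, \omega_1), \; (\omega_3, \omega_3, \omega_3, \omega_2), \; (\omega_3, \omega_3, \omega_3, \omega_3).$$

For each possible value of $\omega$, we calculate $P(\omega)$ and $p(X \mid \omega)$ using the following, which assume the independence of $x_i$ and $\omega(i)$:

$$p(X \mid \omega) = \prod_{i=1}^{4} p(x_i \mid \omega(i)), \qquad P(\omega) = \prod_{i=1}^{4} P(\omega(i)).$$

For example, if $\omega = (\omega_1, \omega_3, \omega_3, \omega_2)$ and $X = (0.6, 0.1, 0.9, 1.1)$, then we have

$$p(X \mid \omega) = p((0.6, 0.1, 0.9, 1.1) \mid (\omega_1, \omega_3, \omega_3, \omega_2)) = p(0.6 \mid \omega_1)\,p(0.1 \mid \omega_3)\,p(0.9 \mid \omega_3)\,p(1.1 \mid \omega_2) = 0.333225 \times 0.266085 \times 0.396953 \times 0.333225 = 0.01173$$

and

$$P(\omega) = P(\omega(1))\,P(\omega(2))\,P(\omega(3))\,P(\omega(4)) = \frac{1}{2} \times \frac{1}{4} \times \frac{1}{4} \times \frac{1}{4} = 0.0078125.$$

Given $X = (0.6, 0.1, 0.9, 1.1)$ and $\omega = (\omega_1, \omega_3, \omega_3, \omega_2)$, we have

$$p(X) = p(x_1 = 0.6, x_2 = 0.1, x_3 = 0.9, x_4 = 1.1) = \sum_{\omega} p(x_1 = 0.6, x_2 = 0.1, x_3 = 0.9, x_4 = 1.1 \mid \omega)\,P(\omega)$$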

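Expanding the sum over all $3^4$ sequences term by term gives

$$p(X) = p(x_1 = 0.6, x_2 = 0.1, x_3 = 0.9, x_4 = 1.1 \mid \omega_1, \omega_1, \omega_1, \omega_1)\,P(\omega_1, \omega_1, \omega_1, \omega_1)$$
$$\quad + \; p(x_1 = 0.6, x_2 = 0.1, x_3 = 0.9, x_4 = 1.1 \mid \omega_1, \omega_1, \omega_1, \omega_2)\,P(\omega_1, \omega_1, \omega_1, \omega_2)$$
$$\quad + \; \cdots \; + \; p(x_1 = 0.6, x_2 = 0.1, x_3 = 0.9, x_4 = 1.1 \mid \omega_3, \omega_3, \omega_3, \omega_3)\,P(\omega_3, \omega_3, \omega_3, \omega_3)$$
$$= p(0.6 \mid \omega_1)\,p(0.1 \mid \omega_1)\,p(0.9 \mid \omega_1)\,p(1.1 \mid \omega_1)\,P(\omega_1)P(\omega_1)P(\omega_1)P(\omega_1)$$
$$\quad + \; p(0.6 \mid \omega_1)\,p(0.1 \mid \omega_1)\,p(0.9 \mid \omega_1)\,p(1.1 \mid \omega_2)\,P(\omega_1)P(\omega_1)P(\omega_1)P(\omega_2)$$
$$\quad + \; \cdots \; + \; p(0.6 \mid \omega_3)\,p(0.1 \mid \omega_3)\,p(0.9 \mid \omega_3)\,p(1.1 \mid \omega_3)\,P(\omega_3)P(\omega_3)P(\omega_3)P(\omega_3)$$
$$= 0.012083.$$

Therefore,

$$P(\omega \mid X) = P(\omega_1, \omega_3, \omega_3, \omega_2 \mid 0.6, 0.1, 0.9, 1.1) = \frac{p(0.6, 0.1, 0.9, 1.1 \mid \omega_1, \omega_3, \omega_3, \omega_2)\,P(\omega_1, \omega_3, \omega_3, \omega_2)}{p(X)} = \frac{0.01173 \times 0.0078125}{0.012083} = 0.007584.$$

2. Following the steps in part 1, we have

$$P(\omega_1, \omega_2, \omega_2, \omega_3 \mid 0.6, 0.1, 0.9, 1.1) = \frac{p(0.6, 0.1, 0.9, 1.1 \mid \omega_1, \omega_2, \omega_2, \omega_3)\,P(\omega_1, \omega_2, \omega_2, \omega_3)}{p(X)} = \frac{0.01794 \times 0.0078125}{0.012083} = 0.01160.$$

3. The sequence $\omega = (\omega_1, \omega_1, \omega_1, \omega_1)$ has the maximum posterior probability given the observation $X = (0.6, 0.1, 0.9, 1.1)$. This maximum probability is 0.03966.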
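The three parts of Question 5 can be cross-checked by brute force. The following is a minimal Python sketch, not part of the original handout. It assumes the same independence of $x_i$ and $\omega(i)$ used in the solution, and the names `normal_pdf`, `joint`, `posterior`, `ALL_SEQS`, `EVIDENCE`, and `BEST` are hypothetical. It enumerates all $3^4 = 81$ state sequences, sums $p(X \mid \omega)P(\omega)$ to obtain $p(X)$, and evaluates the posterior of any sequence.

```python
import itertools
import math


def normal_pdf(x, mu, sigma=1.0):
    """Density of N(mu, sigma^2) evaluated at x."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma)


# Priors and class means taken from the problem statement (all variances are 1).
PRIORS = {1: 0.5, 2: 0.25, 3: 0.25}
MEANS = {1: 0.0, 2: 0.5, 3: 1.0}
X = (0.6, 0.1, 0.9, 1.1)


def joint(seq, xs=X):
    """p(X | omega) * P(omega) for a state sequence, assuming independence."""
    value = 1.0
    for x, state in zip(xs, seq):
        value *= normal_pdf(x, MEANS[state]) * PRIORS[state]
    return value


# All 3^4 = 81 candidate state sequences.
ALL_SEQS = list(itertools.product((1, 2, 3), repeat=len(X)))

# p(X) is the sum of the joint over every sequence.
EVIDENCE = sum(joint(seq) for seq in ALL_SEQS)


def posterior(seq):
    """P(omega | X) by Bayes' rule."""
    return joint(seq) / EVIDENCE


# Sequence with the largest posterior (part 3).
BEST = max(ALL_SEQS, key=posterior)

if __name__ == "__main__":
    print("p(X)              =", EVIDENCE)
    print("P(w1,w3,w3,w2|X)  =", posterior((1, 3, 3, 2)))
    print("P(w1,w2,w2,w3|X)  =", posterior((1, 2, 2, 3)))
    print("best sequence     =", BEST, posterior(BEST))
```

Run as a script, this should agree with the values quoted in the solution up to rounding in the last digit. Because the joint factorizes over positions, the maximizing sequence can also be found one point at a time by maximizing $p(x_i \mid \omega_j)P(\omega_j)$ over $j$; the exhaustive search is used here only because 81 sequences are cheap to enumerate.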