Chapters 5 & 6: Multiple Random Variables
ENCS6161 - Probability and Stochastic Processes, Concordia University
Vector Random Variables
A vector r.v. $\mathbf{X}$ is a function $\mathbf{X}: S \to \mathbb{R}^n$, where $S$ is the sample space of a random experiment.
Example: randomly pick a student name from a list, so $S = \{\text{all student names on the list}\}$. Let $\omega$ be a given outcome, e.g. Tom, and let
$H(\omega)$: height of student $\omega$, $W(\omega)$: weight of student $\omega$, $A(\omega)$: age of student $\omega$.
Let $\mathbf{X} = (H, W, A)$; then $\mathbf{X}$ is a vector r.v., and $H, W, A$ are r.v.s.
Events
Each event involving $\mathbf{X} = (X_1, X_2, \dots, X_n)$ has a corresponding region in $\mathbb{R}^n$.
Example: $\mathbf{X} = (X_1, X_2)$ is a two-dimensional r.v.
$A = \{X_1 + X_2 \le 10\}$
$B = \{\min(X_1, X_2) \le 5\}$
$C = \{X_1^2 + X_2^2 \le 100\}$
Pairs of Random Variables
Pairs of discrete random variables. Joint probability mass function:
$P_{X,Y}(x_j, y_k) = P\{X = x_j \text{ and } Y = y_k\} = P\{X = x_j, Y = y_k\}$
Obviously $\sum_j \sum_k P_{X,Y}(x_j, y_k) = 1$.
Marginal probability mass function:
$P_X(x_j) = P\{X = x_j\} = P\{X = x_j, Y = \text{anything}\} = \sum_{k=1}^{\infty} P_{X,Y}(x_j, y_k)$
Similarly, $P_Y(y_k) = \sum_{j=1}^{\infty} P_{X,Y}(x_j, y_k)$.
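As a quick numerical illustration (a minimal sketch, not from the slides): the marginal pmfs are just row and column sums of a joint pmf table. The joint pmf values below are made up for the example.

```python
import numpy as np

# Hypothetical joint pmf P_{X,Y}(x_j, y_k) for X in {0,1,2}, Y in {0,1};
# the numbers are arbitrary but sum to 1.
P_XY = np.array([[0.10, 0.20],
                 [0.25, 0.15],
                 [0.20, 0.10]])

assert np.isclose(P_XY.sum(), 1.0)      # total probability must be 1

P_X = P_XY.sum(axis=1)                  # marginal of X: sum over y_k
P_Y = P_XY.sum(axis=0)                  # marginal of Y: sum over x_j

print("P_X =", P_X)   # [0.30 0.40 0.30]
print("P_Y =", P_Y)   # [0.55 0.45]
```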
Pairs of Random Variables
The joint CDF of $X$ and $Y$ (for both discrete and continuous r.v.s):
$F_{X,Y}(x,y) = P\{X \le x, Y \le y\}$
(Figure: the semi-infinite region below and to the left of the point $(x, y)$.)
Pairs of Random Variables
Properties of the joint CDF:
1. $F_{X,Y}(x_1, y_1) \le F_{X,Y}(x_2, y_2)$ if $x_1 \le x_2$ and $y_1 \le y_2$.
2. $F_{X,Y}(-\infty, y) = F_{X,Y}(x, -\infty) = 0$
3. $F_{X,Y}(\infty, \infty) = 1$
4. $F_X(x) = P\{X \le x\} = P\{X \le x, Y = \text{anything}\} = P\{X \le x, Y < \infty\} = F_{X,Y}(x, \infty)$, and $F_Y(y) = F_{X,Y}(\infty, y)$.
$F_X(x)$, $F_Y(y)$: marginal CDFs.
Pairs of Random Variables
The joint pdf of two jointly continuous r.v.s:
$f_{X,Y}(x,y) = \frac{\partial^2}{\partial x\, \partial y} F_{X,Y}(x,y)$
Obviously $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y}(x,y)\, dx\, dy = 1$, and
$F_{X,Y}(x,y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f_{X,Y}(x', y')\, dy'\, dx'$
Pairs of Random Variables
The probability of a rectangle:
$P\{a \le X \le b,\ c \le Y \le d\} = \int_a^b \int_c^d f_{X,Y}(x,y)\, dy\, dx$
In general, $P\{(X,Y) \in A\} = \iint_A f_{X,Y}(x,y)\, dx\, dy$.
Example: for the triangular region $A$ bounded by the axes and the line $y = 1 - x$ (see figure),
$P\{(X,Y) \in A\} = \int_0^1 \int_0^{1-x'} f_{X,Y}(x', y')\, dy'\, dx'$
Pairs of Random Variables
Marginal pdf:
$f_X(x) = \frac{d}{dx} F_X(x) = \frac{d}{dx} F_{X,Y}(x, \infty) = \frac{d}{dx} \int_{-\infty}^{x} \left( \int_{-\infty}^{\infty} f_{X,Y}(x', y')\, dy' \right) dx' = \int_{-\infty}^{\infty} f_{X,Y}(x, y')\, dy'$
Similarly, $f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x', y)\, dx'$
Pairs of Random Variables
Example: $f_{X,Y}(x,y) = 1$ if $0 \le x \le 1$, $0 \le y \le 1$, and $0$ otherwise. Find $F_{X,Y}(x,y)$.
1) $x < 0$ or $y < 0$: $F_{X,Y}(x,y) = 0$
2) $0 \le x \le 1$ and $0 \le y \le 1$: $F_{X,Y}(x,y) = \int_0^x \int_0^y 1\, dy'\, dx' = xy$
3) $0 \le x \le 1$ and $y > 1$: $F_{X,Y}(x,y) = \int_0^x \int_0^1 1\, dy'\, dx' = x$
4) $x > 1$ and $0 \le y \le 1$: $F_{X,Y}(x,y) = y$
5) $x > 1$ and $y > 1$: $F_{X,Y}(x,y) = 1$
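A small sketch (assuming NumPy; the test points are arbitrary) that codes this piecewise $F_{X,Y}$ and checks it against a Monte Carlo estimate of $P\{X \le x, Y \le y\}$ for points uniform on the unit square.

```python
import numpy as np

def F_XY(x, y):
    """Piecewise joint CDF of (X, Y) uniform on the unit square."""
    if x < 0 or y < 0:
        return 0.0
    cx, cy = min(x, 1.0), min(y, 1.0)   # cases 2-5 collapse to clipping at 1
    return cx * cy

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, 100_000)
Y = rng.uniform(0, 1, 100_000)

for (x, y) in [(0.3, 0.7), (0.5, 2.0), (1.5, 0.25), (2.0, 3.0)]:
    est = np.mean((X <= x) & (Y <= y))   # empirical P{X<=x, Y<=y}
    print(f"F({x},{y}) = {F_XY(x, y):.3f}, Monte Carlo ~ {est:.3f}")
```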
Independence
$X$ and $Y$ are independent iff
$P_{X,Y}(x_j, y_k) = P_X(x_j)\, P_Y(y_k)$ for all $x_j$ and $y_k$ (discrete r.v.s), or
$F_{X,Y}(x,y) = F_X(x)\, F_Y(y)$ for all $x$ and $y$, or
$f_{X,Y}(x,y) = f_X(x)\, f_Y(y)$ for all $x$ and $y$.
Example:
a) $f_{X,Y}(x,y) = 1$ for $0 \le x \le 1$, $0 \le y \le 1$, and $0$ otherwise: $X$ and $Y$ are independent.
b) $f_{X,Y}(x,y) = \frac{1}{2}$ for $0 \le x \le 2$, $0 \le y \le x$, and $0$ otherwise: $X$ and $Y$ are not independent (the support is not a product region).
Conditional Probability
If $X$ is discrete:
$F_Y(y \mid x) = \frac{P\{Y \le y, X = x\}}{P\{X = x\}}$ for $P\{X = x\} > 0$, and $f_Y(y \mid x) = \frac{d}{dy} F_Y(y \mid x)$.
If $X$ is continuous, $P\{X = x\} = 0$, so define
$F_Y(y \mid x) = \lim_{h \to 0} F_Y(y \mid x < X \le x + h) = \lim_{h \to 0} \frac{P\{Y \le y,\ x < X \le x + h\}}{P\{x < X \le x + h\}}$
$= \lim_{h \to 0} \frac{\int_x^{x+h} \int_{-\infty}^{y} f_{X,Y}(x', y')\, dy'\, dx'}{\int_x^{x+h} f_X(x')\, dx'} = \lim_{h \to 0} \frac{h \int_{-\infty}^{y} f_{X,Y}(x, y')\, dy'}{h\, f_X(x)} = \frac{\int_{-\infty}^{y} f_{X,Y}(x, y')\, dy'}{f_X(x)}$
$f_Y(y \mid x) = \frac{d}{dy} F_Y(y \mid x) = \frac{f_{X,Y}(x, y)}{f_X(x)}$
Conditional Probability
If $X$ and $Y$ are independent, $f_Y(y \mid x) = f_Y(y)$.
Similarly, $f_X(x \mid y) = \frac{f_{X,Y}(x,y)}{f_Y(y)}$.
So $f_{X,Y}(x,y) = f_X(x)\, f_Y(y \mid x) = f_Y(y)\, f_X(x \mid y)$.
Bayes' rule: $f_X(x \mid y) = \frac{f_X(x)\, f_Y(y \mid x)}{f_Y(y)}$
Conditional Probability
Example: a r.v. $X$ is selected uniformly in $[0, 1]$, and then $Y$ is selected uniformly in $[0, X]$. Find $f_Y(y)$.
Solution: $f_{X,Y}(x,y) = f_X(x)\, f_Y(y \mid x) = 1 \cdot \frac{1}{x} = \frac{1}{x}$ for $0 \le x \le 1$, $0 \le y \le x$, and $0$ elsewhere.
For $0 < y \le 1$:
$f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\, dx = \int_y^1 \frac{1}{x}\, dx = -\ln y$
and $f_Y(y) = 0$ elsewhere.
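A quick simulation sketch (assuming NumPy and Matplotlib) of this two-stage experiment; the histogram of $Y$ should follow $-\ln y$ on $(0, 1]$.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, 200_000)   # X ~ uniform(0, 1)
Y = rng.uniform(0.0, X)              # given X = x, Y ~ uniform(0, x)

y = np.linspace(0.01, 1.0, 200)
plt.hist(Y, bins=100, density=True, alpha=0.5, label="histogram of Y")
plt.plot(y, -np.log(y), "r-", label=r"$f_Y(y) = -\ln y$")
plt.legend(); plt.show()
```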
Conditional Probability
Example (binary communication): $X \in \{0, 1\}$ is sent through a channel, the receiver observes $Y \in \mathbb{R}$ with conditional pdf $f_Y(y \mid x)$, and a detector produces $\hat{X}$.
$X = 0 \mapsto -A$ (volts), with $P\{X = 0\} = p_0$; $X = 1 \mapsto +A$ (volts), with $P\{X = 1\} = p_1 = 1 - p_0$.
Decide $\hat{X} = 0$ if $P\{X = 0 \mid y\} \ge P\{X = 1 \mid y\}$; decide $\hat{X} = 1$ if $P\{X = 0 \mid y\} < P\{X = 1 \mid y\}$.
This is called maximum a posteriori probability (MAP) detection.
Conditional Probability
Binary communication over an additive white Gaussian noise (AWGN) channel:
$f_Y(y \mid 0) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(y+A)^2}{2\sigma^2}}, \qquad f_Y(y \mid 1) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(y-A)^2}{2\sigma^2}}$
To apply MAP detection, we need to find $P\{X = 0 \mid y\}$ and $P\{X = 1 \mid y\}$. Note that here $X$ is discrete and $Y$ is continuous.
Conditional Probability
Using the same limiting approach as before (condition on $y < Y \le y + h$ and let $h \to 0$), we have
$P\{X = 0 \mid y\} = \frac{P\{X = 0\}\, f_Y(y \mid 0)}{f_Y(y)}, \qquad P\{X = 1 \mid y\} = \frac{P\{X = 1\}\, f_Y(y \mid 1)}{f_Y(y)}$
Decide $\hat{X} = 0$ if $P\{X = 0 \mid y\} \ge P\{X = 1 \mid y\} \iff y \le \frac{\sigma^2}{2A} \ln \frac{p_0}{p_1}$
Decide $\hat{X} = 1$ if $P\{X = 0 \mid y\} < P\{X = 1 \mid y\} \iff y > \frac{\sigma^2}{2A} \ln \frac{p_0}{p_1}$
When $p_0 = p_1 = \frac{1}{2}$: decide $\hat{X} = 0$ if $y \le 0$, and decide $\hat{X} = 1$ if $y > 0$.
Conditional Probability
Probability of error, for the special case $p_0 = p_1 = \frac{1}{2}$:
$P_\varepsilon = p_0 P\{\hat{X} = 1 \mid X = 0\} + p_1 P\{\hat{X} = 0 \mid X = 1\} = p_0 P\{Y > 0 \mid X = 0\} + p_1 P\{Y \le 0 \mid X = 1\}$
$P\{Y > 0 \mid X = 0\} = \int_0^{\infty} \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(y+A)^2}{2\sigma^2}}\, dy = Q\!\left(\frac{A}{\sigma}\right)$
Similarly, $P\{Y \le 0 \mid X = 1\} = Q\!\left(\frac{A}{\sigma}\right)$, so $P_\varepsilon = Q\!\left(\frac{A}{\sigma}\right)$.
As $A$ increases, $P_\varepsilon$ decreases; as $\sigma$ increases, $P_\varepsilon$ increases.
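A simulation sketch (assuming NumPy and SciPy; the values of $A$ and $\sigma$ are arbitrary) comparing the empirical error rate of the threshold detector with $Q(A/\sigma)$.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
A, sigma, n = 1.0, 0.8, 1_000_000         # arbitrary signal level and noise std

bits = rng.integers(0, 2, n)               # p0 = p1 = 1/2
s = np.where(bits == 0, -A, +A)            # X=0 -> -A, X=1 -> +A
y = s + sigma * rng.standard_normal(n)     # AWGN channel
bits_hat = (y > 0).astype(int)             # MAP threshold at 0 for equal priors

p_err_sim = np.mean(bits_hat != bits)
p_err_theory = norm.sf(A / sigma)          # Q(A/sigma) = 1 - Phi(A/sigma)
print(f"simulated P_e = {p_err_sim:.4f}, Q(A/sigma) = {p_err_theory:.4f}")
```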
Conditional Expectation
The conditional expectation:
$E[Y \mid x] = \int_{-\infty}^{\infty} y\, f_Y(y \mid x)\, dy$
In the discrete case, $E[Y \mid x] = \sum_{y_i} y_i\, P_Y(y_i \mid x)$.
An important fact: $E[Y] = E[E[Y \mid X]]$.
Proof: $E[E[Y \mid X]] = \int E[Y \mid x]\, f_X(x)\, dx = \iint y\, f_Y(y \mid x)\, f_X(x)\, dy\, dx = \iint y\, f_{X,Y}(x,y)\, dy\, dx = E[Y]$
In general: $E[h(Y)] = E[E[h(Y) \mid X]]$.
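As a quick check with the earlier example ($X$ uniform on $[0,1]$, then $Y$ uniform on $[0,X]$): $E[Y \mid X] = X/2$, so $E[Y] = E[E[Y \mid X]] = E[X]/2 = 1/4$, which agrees with computing $\int_0^1 y\,(-\ln y)\, dy = 1/4$ directly from the marginal $f_Y$ found before.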
Multiple Random Variables
Joint CDF:
$F_{X_1,\dots,X_n}(x_1,\dots,x_n) = P[X_1 \le x_1, \dots, X_n \le x_n]$
Joint pdf:
$f_{X_1,\dots,X_n}(x_1,\dots,x_n) = \frac{\partial^n}{\partial x_1 \cdots \partial x_n} F_{X_1,\dots,X_n}(x_1,\dots,x_n)$
If discrete, joint pmf:
$P_{X_1,\dots,X_n}(x_1,\dots,x_n) = P[X_1 = x_1, \dots, X_n = x_n]$
Marginal pdf:
$f_{X_i}(x_i) = \int \cdots \int f(x_1,\dots,x_n)\, dx_1 \cdots dx_n$, integrating over all variables except $x_i$.
Independence
$X_1, \dots, X_n$ are independent iff
$F_{X_1,\dots,X_n}(x_1,\dots,x_n) = F_{X_1}(x_1) \cdots F_{X_n}(x_n)$ for all $x_1, \dots, x_n$.
In terms of pdfs: $f_{X_1,\dots,X_n}(x_1,\dots,x_n) = f_{X_1}(x_1) \cdots f_{X_n}(x_n)$ for all $x_1, \dots, x_n$.
Functions of Several r.v.s
One function of several r.v.s: $Z = g(X_1, \dots, X_n)$.
Let $R_z = \{\mathbf{x} = (x_1,\dots,x_n) : g(\mathbf{x}) \le z\}$; then
$F_Z(z) = P\{\mathbf{X} \in R_z\} = \int_{\mathbf{x} \in R_z} f_{X_1,\dots,X_n}(x_1,\dots,x_n)\, dx_1 \cdots dx_n$
Functions of Several r.v.s
Example: $Z = X + Y$. Find $F_Z(z)$ and $f_Z(z)$ in terms of $f_{X,Y}(x,y)$.
(Figure: the region $\{x + y \le z\}$, bounded above by the line $y = z - x$.)
$F_Z(z) = \int_{-\infty}^{\infty} \int_{-\infty}^{z-x} f_{X,Y}(x,y)\, dy\, dx$
$f_Z(z) = \frac{d}{dz} F_Z(z) = \int_{-\infty}^{\infty} f_{X,Y}(x, z - x)\, dx$
If $X$ and $Y$ are independent:
$f_Z(z) = \int_{-\infty}^{\infty} f_X(x)\, f_Y(z - x)\, dx$ (the convolution of $f_X$ and $f_Y$).
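A numerical sketch (assuming NumPy) of the convolution formula for two independent uniform(0,1) r.v.s, whose sum has the triangular density $f_Z(z) = z$ on $[0,1]$ and $2 - z$ on $[1,2]$.

```python
import numpy as np

dx = 0.001
x = np.arange(0.0, 1.0, dx)
fX = np.ones_like(x)             # uniform(0,1) density on its support
fY = np.ones_like(x)

fZ = np.convolve(fX, fY) * dx    # numerical f_Z(z) = int f_X(x) f_Y(z-x) dx
z = np.arange(len(fZ)) * dx

for zz in (0.5, 1.0, 1.5):
    i = int(round(zz / dx))
    exact = zz if zz <= 1 else 2 - zz
    print(f"z={zz}: numerical {fZ[i]:.3f}, exact {exact:.3f}")
```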
Functions of Several r.v.s
Example: let $Z = X/Y$. Find the pdf of $Z$ if $X$ and $Y$ are independent and both exponentially distributed with mean one.
We could use the same approach as in the previous example, but it is complicated. Instead, fix $Y = y$; then $Z = X/y$ and $f_Z(z \mid y) = |y|\, f_X(yz \mid y)$. So
$f_Z(z) = \int_{-\infty}^{\infty} f_{Z,Y}(z,y)\, dy = \int_{-\infty}^{\infty} f_Z(z \mid y)\, f_Y(y)\, dy = \int_{-\infty}^{\infty} |y|\, f_X(yz \mid y)\, f_Y(y)\, dy = \int_{-\infty}^{\infty} |y|\, f_{X,Y}(yz, y)\, dy$
Functions of Several r.v.s
Since $X$ and $Y$ are independent and exponentially distributed,
$f_Z(z) = \int_0^{\infty} y\, f_X(yz)\, f_Y(y)\, dy = \int_0^{\infty} y\, e^{-yz} e^{-y}\, dy = \frac{1}{(1+z)^2}$ for $z > 0$.
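A Monte Carlo sketch (assuming NumPy) of $Z = X/Y$ for independent unit-mean exponentials; the empirical CDF should match $F_Z(z) = z/(1+z)$, the integral of $1/(1+z)^2$.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.exponential(1.0, 500_000)
Y = rng.exponential(1.0, 500_000)
Z = X / Y

for z in (0.5, 1.0, 2.0, 5.0):
    emp = np.mean(Z <= z)        # empirical CDF at z
    exact = z / (1.0 + z)        # F_Z(z) = int_0^z (1+t)^{-2} dt
    print(f"z={z}: empirical {emp:.3f}, exact {exact:.3f}")
```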
Transformation of Random Vectors
$Z_1 = g_1(X_1, \dots, X_n),\quad Z_2 = g_2(X_1, \dots, X_n),\quad \dots,\quad Z_n = g_n(X_1, \dots, X_n)$
The joint CDF of $\mathbf{Z}$ is
$F_{Z_1,\dots,Z_n}(z_1,\dots,z_n) = P\{Z_1 \le z_1, \dots, Z_n \le z_n\} = \int_{\mathbf{x}:\, g_k(\mathbf{x}) \le z_k,\ k=1,\dots,n} f_{X_1,\dots,X_n}(x_1,\dots,x_n)\, dx_1 \cdots dx_n$
pdf of Linear Transformations
If $\mathbf{Z} = A\mathbf{X}$, where $A$ is an $n \times n$ invertible matrix, then
$f_{\mathbf{Z}}(\mathbf{z}) = f_{Z_1,\dots,Z_n}(z_1,\dots,z_n) = \left. \frac{f_{X_1,\dots,X_n}(x_1,\dots,x_n)}{|A|} \right|_{\mathbf{x} = A^{-1}\mathbf{z}} = \frac{f_{\mathbf{X}}(A^{-1}\mathbf{z})}{|A|}$
where $|A|$ is the absolute value of the determinant of $A$; e.g. if $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$, then $|A| = |ad - bc|$.
pdf of General Transformations
$Z_1 = g_1(\mathbf{X}),\; Z_2 = g_2(\mathbf{X}),\; \dots,\; Z_n = g_n(\mathbf{X})$, where $\mathbf{X} = (X_1, \dots, X_n)$.
We assume that the set of equations $z_1 = g_1(\mathbf{x}), \dots, z_n = g_n(\mathbf{x})$ has a unique solution given by $x_1 = h_1(\mathbf{z}), \dots, x_n = h_n(\mathbf{z})$.
The joint pdf of $\mathbf{Z}$ is given by
$f_{Z_1,\dots,Z_n}(z_1,\dots,z_n) = \frac{f_{X_1,\dots,X_n}(h_1(\mathbf{z}),\dots,h_n(\mathbf{z}))}{|J(x_1,\dots,x_n)|} = f_{X_1,\dots,X_n}(h_1(\mathbf{z}),\dots,h_n(\mathbf{z}))\, |J(z_1,\dots,z_n)| \qquad (*)$
where $J(x_1,\dots,x_n)$ is called the Jacobian of the transformation.
pdf of General Transformations
The Jacobian of the transformation:
$J(x_1,\dots,x_n) = \det \begin{bmatrix} \frac{\partial g_1}{\partial x_1} & \cdots & \frac{\partial g_1}{\partial x_n} \\ \vdots & & \vdots \\ \frac{\partial g_n}{\partial x_1} & \cdots & \frac{\partial g_n}{\partial x_n} \end{bmatrix}$
and
$J(z_1,\dots,z_n) = \det \begin{bmatrix} \frac{\partial h_1}{\partial z_1} & \cdots & \frac{\partial h_1}{\partial z_n} \\ \vdots & & \vdots \\ \frac{\partial h_n}{\partial z_1} & \cdots & \frac{\partial h_n}{\partial z_n} \end{bmatrix} = \left. \frac{1}{J(x_1,\dots,x_n)} \right|_{\mathbf{x} = h(\mathbf{z})}$
Linear transformation is a special case of $(*)$.
pdf of General Transformations
Example: let $X$ and $Y$ be zero-mean, unit-variance, independent Gaussian r.v.s. Find the joint pdf of $V$ and $W$ defined by
$V = (X^2 + Y^2)^{1/2}, \qquad W = \angle(X, Y) = \arctan(Y/X), \quad W \in [0, 2\pi)$
This is a transformation from Cartesian to polar coordinates. The inverse transformation is
$x = v \cos w, \qquad y = v \sin w$
(Figure: the point $(x, y)$ at radius $v$ and angle $w$.)
pdf of General Transformations
The Jacobian:
$J(v, w) = \det \begin{bmatrix} \cos w & -v \sin w \\ \sin w & v \cos w \end{bmatrix} = v \cos^2 w + v \sin^2 w = v$
Since $X$ and $Y$ are zero-mean, unit-variance, independent Gaussian r.v.s,
$f_{X,Y}(x,y) = \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}} \cdot \frac{1}{\sqrt{2\pi}} e^{-\frac{y^2}{2}} = \frac{1}{2\pi} e^{-\frac{x^2 + y^2}{2}}$
The joint pdf of $V, W$ is then
$f_{V,W}(v,w) = v \cdot \frac{1}{2\pi} e^{-\frac{v^2 \cos^2 w + v^2 \sin^2 w}{2}} = \frac{v}{2\pi} e^{-\frac{v^2}{2}}$ for $v \ge 0$ and $0 \le w < 2\pi$.
pdf of General Transformations
The marginal pdfs of $V$ and $W$:
$f_V(v) = \int_{-\infty}^{\infty} f_{V,W}(v,w)\, dw = \int_0^{2\pi} \frac{v}{2\pi} e^{-\frac{v^2}{2}}\, dw = v e^{-\frac{v^2}{2}}$ for $v \ge 0$.
This is called the Rayleigh distribution.
$f_W(w) = \int_{-\infty}^{\infty} f_{V,W}(v,w)\, dv = \frac{1}{2\pi} \int_0^{\infty} v e^{-\frac{v^2}{2}}\, dv = \frac{1}{2\pi}$ for $0 \le w < 2\pi$.
Since $f_{V,W}(v,w) = f_V(v)\, f_W(w)$, $V$ and $W$ are independent.
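A simulation sketch (assuming NumPy and Matplotlib): transform independent standard Gaussian $X, Y$ to polar coordinates and compare the histogram of $V$ with the Rayleigh density $v\,e^{-v^2/2}$.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
X = rng.standard_normal(200_000)
Y = rng.standard_normal(200_000)

V = np.hypot(X, Y)                   # V = sqrt(X^2 + Y^2)
W = np.arctan2(Y, X) % (2 * np.pi)   # angle mapped into [0, 2*pi)

v = np.linspace(0, 4, 200)
plt.hist(V, bins=100, density=True, alpha=0.5, label="histogram of V")
plt.plot(v, v * np.exp(-v**2 / 2), "r-", label="Rayleigh pdf")
plt.legend(); plt.show()

# W should be (roughly) uniform on [0, 2*pi):
print("mean of W ~", W.mean(), " (expect ~pi =", np.pi, ")")
```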
Expected Value of Functions of r.v.s
Let $Z = g(X_1, X_2, \dots, X_n)$; then
$E[Z] = \int \cdots \int g(x_1,\dots,x_n)\, f_{X_1 \cdots X_n}(x_1,\dots,x_n)\, dx_1 \cdots dx_n$
For the discrete case,
$E[Z] = \sum_{\text{all possible } \mathbf{x}} g(x_1,\dots,x_n)\, P_{X_1 \cdots X_n}(x_1,\dots,x_n)$
Example: $Z = X_1 + X_2 + \cdots + X_n$.
$E[Z] = E[X_1 + X_2 + \cdots + X_n] = \int \cdots \int (x_1 + \cdots + x_n)\, f_{X_1 \cdots X_n}(x_1,\dots,x_n)\, dx_1 \cdots dx_n = E[X_1] + \cdots + E[X_n]$
Expected Value of Functions of r.v.s
Example: $Z = X_1 X_2 \cdots X_n$.
$E[Z] = \int \cdots \int x_1 \cdots x_n\, f_{X_1 \cdots X_n}(x_1,\dots,x_n)\, dx_1 \cdots dx_n$
If $X_1, X_2, \dots, X_n$ are independent,
$E[Z] = E[X_1 X_2 \cdots X_n] = E[X_1] E[X_2] \cdots E[X_n]$
The $(j,k)$-th moment of two r.v.s $X$ and $Y$ is
$E[X^j Y^k] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x^j y^k f_{X,Y}(x,y)\, dx\, dy$
If $j = k = 1$, it is called the correlation: $E[XY] = \iint xy\, f_{X,Y}(x,y)\, dx\, dy$.
If $E[XY] = 0$, we say $X$ and $Y$ are orthogonal.
Expected Value of Functions of r.v.s
The $(j,k)$-th central moment of $X, Y$ is
$E\left[(X - E[X])^j (Y - E[Y])^k\right]$
When $j = 2, k = 0$: $\mathrm{Var}(X)$. When $j = 0, k = 2$: $\mathrm{Var}(Y)$.
When $j = k = 1$, it is called the covariance of $X, Y$:
$\mathrm{Cov}(X,Y) = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]E[Y] = \mathrm{Cov}(Y,X)$
The correlation coefficient of $X$ and $Y$ is defined as
$\rho_{X,Y} = \frac{\mathrm{Cov}(X,Y)}{\sigma_X \sigma_Y}$
where $\sigma_X = \sqrt{\mathrm{Var}(X)}$ and $\sigma_Y = \sqrt{\mathrm{Var}(Y)}$.
Expected Value of Functions of r.v.s
The correlation coefficient satisfies $-1 \le \rho_{X,Y} \le 1$.
Proof:
$0 \le E\left[\left(\frac{X - E[X]}{\sigma_X} \pm \frac{Y - E[Y]}{\sigma_Y}\right)^{\!2}\right] = 1 \pm 2\rho_{X,Y} + 1 = 2(1 \pm \rho_{X,Y})$
If $\rho_{X,Y} = 0$, $X$ and $Y$ are said to be uncorrelated.
If $X$ and $Y$ are independent, then $E[XY] = E[X]E[Y]$, so $\mathrm{Cov}(X,Y) = 0$ and $\rho_{X,Y} = 0$; hence $X, Y$ are uncorrelated.
The converse is not always true. It is true in the case of jointly Gaussian r.v.s (to be discussed later).
Expected Value of Functions of r.v.s
Example: $\Theta$ is uniform in $[0, 2\pi)$. Let $X = \cos\Theta$ and $Y = \sin\Theta$.
$X$ and $Y$ are not independent, since $X^2 + Y^2 = 1$. However,
$E[XY] = E[\sin\Theta \cos\Theta] = E\left[\tfrac{1}{2}\sin(2\Theta)\right] = \int_0^{2\pi} \frac{1}{2\pi} \cdot \frac{1}{2}\sin(2\theta)\, d\theta = 0$
We can also show $E[X] = E[Y] = 0$. So $\mathrm{Cov}(X,Y) = E[XY] - E[X]E[Y] = 0$ and $\rho_{X,Y} = 0$.
$X, Y$ are uncorrelated but not independent.
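A quick numerical sketch (assuming NumPy) of this example: the sample correlation of $X = \cos\Theta$ and $Y = \sin\Theta$ is near zero even though $(X, Y)$ always lies on the unit circle, so the two are clearly dependent.

```python
import numpy as np

rng = np.random.default_rng(5)
theta = rng.uniform(0.0, 2 * np.pi, 1_000_000)
X, Y = np.cos(theta), np.sin(theta)

print("corr(X, Y) ~", np.corrcoef(X, Y)[0, 1])                # close to 0
print("X^2 + Y^2  =", np.unique(np.round(X**2 + Y**2, 12)))   # always 1 -> dependent
```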
Joint Characteristic Function
Joint characteristic function:
$\Phi_{X_1,\dots,X_n}(\omega_1,\dots,\omega_n) = E\left[e^{j(\omega_1 X_1 + \cdots + \omega_n X_n)}\right]$
For two variables: $\Phi_{X,Y}(\omega_1, \omega_2) = E\left[e^{j(\omega_1 X + \omega_2 Y)}\right]$
Marginal characteristic functions: $\Phi_X(\omega) = \Phi_{X,Y}(\omega, 0)$ and $\Phi_Y(\omega) = \Phi_{X,Y}(0, \omega)$.
If $X$ and $Y$ are independent,
$\Phi_{X,Y}(\omega_1, \omega_2) = E[e^{j\omega_1 X + j\omega_2 Y}] = E[e^{j\omega_1 X}]\, E[e^{j\omega_2 Y}] = \Phi_X(\omega_1)\Phi_Y(\omega_2)$
Joint Characteristic Function
If $Z = aX + bY$:
$\Phi_Z(\omega) = E[e^{j\omega(aX + bY)}] = \Phi_{X,Y}(a\omega, b\omega)$
If $Z = X + Y$ with $X$ and $Y$ independent:
$\Phi_Z(\omega) = \Phi_{X,Y}(\omega, \omega) = \Phi_X(\omega)\Phi_Y(\omega)$
Jointly Gaussian Random Variables
Consider a vector of random variables $\mathbf{X} = (X_1, X_2, \dots, X_n)$, each with mean $m_i = E[X_i]$ for $i = 1,\dots,n$, and the covariance matrix
$K = \begin{bmatrix} \mathrm{Var}(X_1) & \mathrm{Cov}(X_1,X_2) & \cdots & \mathrm{Cov}(X_1,X_n) \\ \vdots & & \ddots & \vdots \\ \mathrm{Cov}(X_n,X_1) & \cdots & & \mathrm{Var}(X_n) \end{bmatrix}$
Let $\mathbf{m} = [m_1,\dots,m_n]^T$ be the mean vector and $\mathbf{x} = [x_1,\dots,x_n]^T$, where $(\cdot)^T$ denotes transpose. Then $X_1, X_2, \dots, X_n$ are said to be jointly Gaussian if their joint pdf is
$f_{\mathbf{X}}(\mathbf{x}) = \frac{1}{(2\pi)^{n/2} |K|^{1/2}} \exp\left\{-\frac{1}{2}(\mathbf{x} - \mathbf{m})^T K^{-1} (\mathbf{x} - \mathbf{m})\right\}$
where $|K|$ is the determinant of $K$.
Jointly Gaussian Random Variables
For $n = 1$, let $\mathrm{Var}(X_1) = \sigma^2$; then
$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x-m)^2}{2\sigma^2}}$
For $n = 2$, denote the r.v.s by $X$ and $Y$, and let
$\mathbf{m} = \begin{bmatrix} m_X \\ m_Y \end{bmatrix}, \qquad K = \begin{bmatrix} \sigma_X^2 & \rho_{XY}\sigma_X\sigma_Y \\ \rho_{XY}\sigma_X\sigma_Y & \sigma_Y^2 \end{bmatrix}$
Then $|K| = \sigma_X^2 \sigma_Y^2 (1 - \rho_{XY}^2)$ and
$K^{-1} = \frac{1}{\sigma_X^2 \sigma_Y^2 (1 - \rho_{XY}^2)} \begin{bmatrix} \sigma_Y^2 & -\rho_{XY}\sigma_X\sigma_Y \\ -\rho_{XY}\sigma_X\sigma_Y & \sigma_X^2 \end{bmatrix}$
$f_{X,Y}(x,y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho_{XY}^2}} \exp\left\{-\frac{1}{2(1-\rho_{XY}^2)}\left[\left(\frac{x-m_X}{\sigma_X}\right)^2 - 2\rho_{XY}\left(\frac{x-m_X}{\sigma_X}\right)\left(\frac{y-m_Y}{\sigma_Y}\right) + \left(\frac{y-m_Y}{\sigma_Y}\right)^2\right]\right\}$
Jointly Gaussian Random Variables
The marginal pdf of $X$: $f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma_X} e^{-\frac{(x-m_X)^2}{2\sigma_X^2}}$
The marginal pdf of $Y$: $f_Y(y) = \frac{1}{\sqrt{2\pi}\,\sigma_Y} e^{-\frac{(y-m_Y)^2}{2\sigma_Y^2}}$
If $\rho_{XY} = 0$, then $X$ and $Y$ are independent.
The conditional pdf:
$f_{X|Y}(x \mid y) = \frac{f_{X,Y}(x,y)}{f_Y(y)} = \frac{1}{\sqrt{2\pi\sigma_X^2(1-\rho_{XY}^2)}} \exp\left\{-\frac{\left[x - \rho_{XY}\frac{\sigma_X}{\sigma_Y}(y - m_Y) - m_X\right]^2}{2(1-\rho_{XY}^2)\sigma_X^2}\right\}$
i.e. given $Y = y$, $X$ is $N\Big(\underbrace{\rho_{XY}\tfrac{\sigma_X}{\sigma_Y}(y - m_Y) + m_X}_{E[X \mid Y]},\ \underbrace{\sigma_X^2(1-\rho_{XY}^2)}_{\mathrm{Var}(X \mid Y)}\Big)$.
Linear Transformation of Gaussian r.v.s
Let $\mathbf{X} \sim N(\mathbf{m}, K)$ and $\mathbf{Y} = A\mathbf{X}$; then $\mathbf{Y} \sim N(\hat{\mathbf{m}}, C)$, where $\hat{\mathbf{m}} = A\mathbf{m}$ and $C = AKA^T$.
Proof:
$f_{\mathbf{Y}}(\mathbf{y}) = \frac{f_{\mathbf{X}}(A^{-1}\mathbf{y})}{|A|} = \frac{1}{(2\pi)^{n/2}|A|\,|K|^{1/2}} \exp\left\{-\frac{1}{2}(A^{-1}\mathbf{y} - \mathbf{m})^T K^{-1} (A^{-1}\mathbf{y} - \mathbf{m})\right\}$
Note that $A^{-1}\mathbf{y} - \mathbf{m} = A^{-1}(\mathbf{y} - A\mathbf{m}) = A^{-1}(\mathbf{y} - \hat{\mathbf{m}})$, so
$(A^{-1}\mathbf{y} - \mathbf{m})^T K^{-1} (A^{-1}\mathbf{y} - \mathbf{m}) = (\mathbf{y} - \hat{\mathbf{m}})^T (A^{-1})^T K^{-1} A^{-1} (\mathbf{y} - \hat{\mathbf{m}}) = (\mathbf{y} - \hat{\mathbf{m}})^T C^{-1} (\mathbf{y} - \hat{\mathbf{m}})$
and $|A|\,|K|^{1/2} = (|A|^2 |K|)^{1/2} = |AKA^T|^{1/2} = |C|^{1/2}$. Therefore
$f_{\mathbf{Y}}(\mathbf{y}) = \frac{1}{(2\pi)^{n/2}|C|^{1/2}} \exp\left\{-\frac{1}{2}(\mathbf{y} - \hat{\mathbf{m}})^T C^{-1} (\mathbf{y} - \hat{\mathbf{m}})\right\}$, i.e. $\mathbf{Y} \sim N(\hat{\mathbf{m}}, C)$.
Linear Transformation of Gaussian r.v.s
Since $K$ is symmetric, it is always possible to find a matrix $A$ such that $\Lambda = AKA^T$ is diagonal:
$\Lambda = \begin{bmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{bmatrix}$
so
$f_{\mathbf{Y}}(\mathbf{y}) = \frac{1}{(2\pi)^{n/2}|\Lambda|^{1/2}} e^{-\frac{1}{2}(\mathbf{y} - \hat{\mathbf{m}})^T \Lambda^{-1} (\mathbf{y} - \hat{\mathbf{m}})} = \frac{1}{\sqrt{2\pi\lambda_1}} e^{-\frac{(y_1 - \hat{m}_1)^2}{2\lambda_1}} \cdots \frac{1}{\sqrt{2\pi\lambda_n}} e^{-\frac{(y_n - \hat{m}_n)^2}{2\lambda_n}}$
That is, we can transform $\mathbf{X}$ into $n$ independent Gaussian r.v.s $Y_1, \dots, Y_n$ with means $\hat{m}_i$ and variances $\lambda_i$.
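A sketch (assuming NumPy; the covariance matrix below is an arbitrary example) of this decorrelation: take the eigendecomposition $K = U \Lambda U^T$ and use $A = U^T$, so that $\mathrm{Cov}(A\mathbf{X}) = AKA^T = \Lambda$ is diagonal.

```python
import numpy as np

K = np.array([[2.0, 0.8],
              [0.8, 1.0]])                 # an arbitrary symmetric covariance

lam, U = np.linalg.eigh(K)                 # K = U diag(lam) U^T
A = U.T                                    # A K A^T = diag(lam)

rng = np.random.default_rng(6)
X = rng.multivariate_normal(mean=[0, 0], cov=K, size=200_000)
Y = X @ A.T                                # y = A x, applied sample-wise

print("eigenvalues:", lam)
print("sample Cov(Y):\n", np.cov(Y, rowvar=False))   # ~ diag(lam)
```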
Mean Square Estimation
We use $g(X)$ to estimate $Y$, written $\hat{Y} = g(X)$. The cost associated with the estimation error is $C(Y - \hat{Y})$, e.g. the squared error
$C(Y - \hat{Y}) = (Y - \hat{Y})^2 = (Y - g(X))^2$
The mean square error is $E[C] = E[(Y - g(X))^2]$.
Case 1: if $\hat{Y} = a$ (a constant),
$E[(Y - a)^2] = E[Y^2] - 2aE[Y] + a^2 = f(a), \qquad \frac{df}{da} = 0 \implies a = E[Y] = \mu_Y$
The mean square error is $E[(Y - \mu_Y)^2] = \mathrm{Var}(Y)$.
Mean Square Estimation
Case 2: if $\hat{Y} = aX + b$, then $E[(Y - aX - b)^2] = f(a,b)$. Setting $\frac{\partial f}{\partial a} = 0$ and $\frac{\partial f}{\partial b} = 0$ gives
$a = \rho_{XY}\frac{\sigma_Y}{\sigma_X}, \qquad b = \mu_Y - a\mu_X, \qquad \hat{Y} = \rho_{XY}\frac{\sigma_Y}{\sigma_X}(X - \mu_X) + \mu_Y$
This is called the linear MMSE (minimum mean square error) estimator. The mean square error is
$E[(Y - \hat{Y})^2] = \sigma_Y^2(1 - \rho_{XY}^2)$
If $\rho_{XY} = 0$: $\hat{Y} = \mu_Y$ and the error is $\sigma_Y^2$, which reduces to Case 1.
If $\rho_{XY} = \pm 1$: $\hat{Y} = \pm\frac{\sigma_Y}{\sigma_X}(X - \mu_X) + \mu_Y$, and the error is $0$.
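A small sketch (assuming NumPy; the joint distribution below is an arbitrary correlated Gaussian) checking the linear MMSE formulas $a = \rho\,\sigma_Y/\sigma_X$, $b = \mu_Y - a\mu_X$ and the error $\sigma_Y^2(1-\rho^2)$ against sample statistics.

```python
import numpy as np

rng = np.random.default_rng(7)
mean = [1.0, -2.0]
cov = [[4.0, 1.5],
       [1.5, 2.25]]                         # arbitrary example covariance
X, Y = rng.multivariate_normal(mean, cov, 500_000).T

rho = np.corrcoef(X, Y)[0, 1]
a = rho * Y.std() / X.std()                 # slope of the linear MMSE estimator
b = Y.mean() - a * X.mean()
Y_hat = a * X + b

print("MSE of linear estimator:", np.mean((Y - Y_hat) ** 2))
print("sigma_Y^2 (1 - rho^2)  :", Y.var() * (1 - rho ** 2))
```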
Mean Square Estimation
Case 3: $\hat{Y}$ is a general function of $X$.
$E[(Y - \hat{Y})^2] = E[(Y - g(X))^2] = E[E[(Y - g(X))^2 \mid X]] = \int E[(Y - g(x))^2 \mid X = x]\, f_X(x)\, dx$
For each $x$, choose $g(x)$ to minimize $E[(Y - g(x))^2 \mid X = x]$:
$g^*(x) = E[Y \mid X = x]$
This is called the MMSE estimator.
Example: $X, Y$ jointly Gaussian. $E[Y \mid X] = \rho_{X,Y}\frac{\sigma_Y}{\sigma_X}(X - \mu_X) + \mu_Y$.
The MMSE estimator is linear for jointly Gaussian r.v.s.
Mean Square Estimation
Example: $X \sim$ uniform$(-1, 1)$ and $Y = X^2$. We have $E[X] = 0$ and
$\mathrm{Cov}(X,Y) = E[XY] - E[X]E[Y] = E[XY] = E[X^3] = 0 \implies \rho_{XY} = 0$
So the linear MMSE estimate is $\hat{Y} = \mu_Y$ and the error is $\sigma_Y^2$.
The MMSE estimate: $g^*(x) = E[Y \mid X = x] = E[X^2 \mid X = x] = x^2$.
So $\hat{Y} = X^2$ and the error is $0$.
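A closing sketch (assuming NumPy) of this example: for $X$ uniform on $(-1, 1)$ and $Y = X^2$, the best linear estimator is the constant $\mu_Y$ with error $\mathrm{Var}(Y) = 4/45$, while the nonlinear MMSE estimator $\hat{Y} = X^2$ has zero error.

```python
import numpy as np

rng = np.random.default_rng(8)
X = rng.uniform(-1.0, 1.0, 1_000_000)
Y = X ** 2

Y_lin = np.full_like(Y, Y.mean())   # best linear estimate: rho = 0 -> constant mu_Y
Y_mmse = X ** 2                     # g*(x) = E[Y | X = x] = x^2

print("linear MMSE error :", np.mean((Y - Y_lin) ** 2), " (Var(Y) = 4/45 ~ 0.0889)")
print("MMSE error        :", np.mean((Y - Y_mmse) ** 2))   # exactly 0
```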