CMPSCI 691GG Applied Information Theory, Fall 2006
Problem Set 3: Solutions

1. [Cover and Thomas 7.1]

(a) Define the following notation:

$$C = \max_{p(x)} I_{p(x)}(X; Y), \qquad \tilde{C} = \max_{p(x)} I_{p(x)}(X; \tilde{Y}),$$

where $I_{p(x)}(X;Y)$ denotes the mutual information computed with input distribution $p(x)$. Let $p^*(x)$ attain the first maximum and $\tilde{p}(x)$ the second. We would like to show that $\tilde{C} = I_{\tilde{p}(x)}(X; \tilde{Y}) \le I_{p^*(x)}(X; Y) = C$. Notice that $X$, $Y$, and $\tilde{Y}$ form a Markov chain $X \to Y \to \tilde{Y}$. Using the data-processing inequality (Theorem 2.8.1), we know that

$$I_{\tilde{p}(x)}(X; \tilde{Y}) \le I_{\tilde{p}(x)}(X; Y) \qquad (3.1)$$
$$\le I_{p^*(x)}(X; Y). \qquad (3.2)$$

(b) We would like to determine under what conditions equality holds. Given our result in Equation (3.2), it is sufficient to show

$$I_{\tilde{p}(x)}(X; \tilde{Y}) \ge I_{p^*(x)}(X; Y).$$

We know that the following equality is true for Markov chains (see the proof of Theorem 2.8.1):

$$I(X; \tilde{Y}) = I(X; Y) - I(X; Y \mid \tilde{Y}).$$

However, $\tilde{p}(x)$ and $p^*(x)$ may not be the same distribution. Since $\tilde{p}(x)$ maximizes $I(X; \tilde{Y})$,

$$I_{\tilde{p}(x)}(X; \tilde{Y}) \ge I_{p^*(x)}(X; \tilde{Y}) \qquad (3.3)$$
$$= I_{p^*(x)}(X; Y) - I_{p^*(x)}(X; Y \mid \tilde{Y}). \qquad (3.4)$$

We obtain our objective inequality if $I_{p^*(x)}(X; Y \mid \tilde{Y}) = 0$. This occurs if $\tilde{Y} = g(Y)$ is an injective function.

2. [Cover and Thomas 7.2]

Consider the behavior of this channel as depicted in Figure 3.1. When $a \ne \pm 1$, this is a noisy channel with nonoverlapping outputs. We would like to compute the capacity of the channel in this situation:

$$C = \max I(X; Y) = \max \left[ H(X) - H(X \mid Y) \right]$$
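Several of these solutions maximize $I(X;Y)$ over a one-parameter input distribution. As a sanity check on the closed-form answers, the following sketch (the helper names `mutual_information` and `capacity_grid` are ad hoc, not from the text; base-2 logarithms) brute-forces the capacity of the additive-noise channel $Y = X + Z$, $Z$ uniform on $\{0, a\}$, for a nonoverlapping case ($a = 2$) and the overlapping case ($a = 1$):

```python
import math

def mutual_information(px, py_given_x):
    """I(X;Y) in bits: px[i] = P(X=i), py_given_x[i][j] = P(Y=j|X=i)."""
    ny = len(py_given_x[0])
    py = [sum(px[i] * py_given_x[i][j] for i in range(len(px))) for j in range(ny)]
    mi = 0.0
    for i, pxi in enumerate(px):
        for j in range(ny):
            pxy = pxi * py_given_x[i][j]  # joint probability p(x, y)
            if pxy > 0:
                mi += pxy * math.log2(pxy / (pxi * py[j]))
    return mi

def capacity_grid(py_given_x, steps=2000):
    """Brute-force capacity of a binary-input channel: max over P(X=1) = k/steps."""
    return max(mutual_information([1 - k / steps, k / steps], py_given_x)
               for k in range(1, steps))

# Y = X + Z with Z uniform on {0, a}; columns enumerate the distinct output values.
nonoverlapping = [[0.5, 0.5, 0.0, 0.0],   # a = 2: input 0 -> outputs {0, 2}
                  [0.0, 0.0, 0.5, 0.5]]   #        input 1 -> outputs {1, 3}
overlapping    = [[0.5, 0.5, 0.0],        # a = 1: input 0 -> outputs {0, 1}
                  [0.0, 0.5, 0.5]]        #        input 1 -> outputs {1, 2}
print(capacity_grid(nonoverlapping))  # 1.0 bit
print(capacity_grid(overlapping))     # 0.5 bit
```

Grid search is crude but adequate here because $I(X;Y)$ is concave in the input distribution, so the maximum over $P(X{=}1)$ is well behaved.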
Figure 3.1: Noisy channel model for question 7.2. Each input $x \in \{0, 1\}$ maps with probability $0.50$ to two outputs: $0 \to \{0, a\}$ and $1 \to \{1, 1 + a\}$, so $y \in \{0, 1, a, 1 + a\}$.

Because $X$ can be determined from $Y$, $H(X \mid Y) = 0$. Therefore,

$$C = \max H(X) = 1 \text{ bit.}$$

When $a = \pm 1$, this is a binary erasure channel. We will compute the capacity for $a = 1$. We begin by defining the conditional entropy,

$$H(X \mid Y) = \sum_{y \in \mathcal{Y}} P(Y = y) H(X \mid Y = y).$$

We can now compute the capacity. With the maximizing (uniform) input distribution, $Y$ takes the values $0$, $1$, $2$ with probabilities $\frac14$, $\frac12$, $\frac14$, so

$$H(X \mid Y) = \frac{1}{4} H(X \mid Y = 0) + \frac{1}{2} H(X \mid Y = 1) + \frac{1}{4} H(X \mid Y = 2) = \frac{1}{2},$$

$$C = \max H(X) - H(X \mid Y) = 1 - \frac{1}{2} = \frac{1}{2} \text{ bit.}$$

The computation for $a = -1$ is similar.

3. [Cover and Thomas 7.3]

Here $Y_i = X_i \oplus Z_i$, where each $Z_i \sim \mathrm{Bernoulli}(p)$ but the noise process $\{Z_i\}$ may be dependent across time. For the memoryless binary symmetric channel,

$$C = \max I(X; Y) = \max \left[ H(X) - H(X \mid Y) \right] = \max H(X) - H(p) = 1 - H(p) \text{ bits.}$$

For $n$ uses of the channel with memory, consider

$$I(X^n; Y^n) = H(X^n) - H(X^n \mid Y^n).$$
However, because this is a binary symmetric channel, the uncertainty about $X_i$ and $Z_i$ is equivalent given $Y_i$. We can replace $X^n$ with $Z^n$:

$$I(X^n; Y^n) = H(X^n) - H(Z^n \mid Y^n). \qquad (3.5)$$

Now we will derive a bound for $H(Z^n \mid Y^n)$ using properties of conditional entropy (Theorems 2.6.5 and 2.6.6):

$$H(Z^n \mid Y^n) \le H(Z^n) \le \sum_{i=1}^{n} H(Z_i) = n H(p).$$

Replacing $H(Z^n \mid Y^n)$ with $n H(p)$ in Equation (3.5) can only reduce the right-hand side. This gives us the following inequality:

$$I(X^n; Y^n) \ge H(X^n) - n H(p). \qquad (3.6)$$

Define the following notation:

$$H_{p^*(x)}(X^n) - n H(p) = \max_{p(x^n)} H(X^n) - n H(p).$$

This defines the maximum value of the right-hand side of Equation (3.6). The maximizing distribution $p^*(x^n)$ is uniform over $\{0, 1\}^n$, which gives $H(X^n) = n$. This means that

$$H_{p^*(x)}(X^n) - n H(p) = n - n H(p) = n (1 - H(p)) = n C.$$

We are interested in the capacity of the channel with memory:

$$\max_{p(x^n)} I(X^n; Y^n) \ge n C.$$

4. [Cover and Thomas 7.8]

Let $\lambda = P(X = 1)$. We define our distributions as (rows indexed by $x \in \{0, 1\}$, columns by $y \in \{0, 1\}$):

$$p(y \mid x) = \begin{bmatrix} 1 & 0 \\ \tfrac12 & \tfrac12 \end{bmatrix}, \qquad p(y) = \begin{bmatrix} 1 - \tfrac{\lambda}{2} & \tfrac{\lambda}{2} \end{bmatrix}^T.$$

First, we compute the entropy $H(Y)$:

$$H(Y) = H\!\left( \left[ 1 - \tfrac{\lambda}{2},\ \tfrac{\lambda}{2} \right] \right) = -\left( 1 - \tfrac{\lambda}{2} \right) \log \left( 1 - \tfrac{\lambda}{2} \right) - \tfrac{\lambda}{2} \log \tfrac{\lambda}{2}.$$
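As an aside, the answer to this maximization can be cross-checked numerically. The sketch below (the helper `h2` and the grid are ad hoc; base-2 logarithms) uses the fact, derived in the remainder of this solution, that $I(X;Y) = H(\lambda/2) - \lambda$ for this Z-channel:

```python
import math

def h2(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Z-channel: X=0 always gives Y=0; X=1 gives Y=0 or Y=1 with probability 1/2 each.
# With lam = P(X=1): H(Y) = h2(lam/2) and H(Y|X) = lam, so I(X;Y) = h2(lam/2) - lam.
grid = [k / 100000 for k in range(100001)]
best_lam = max(grid, key=lambda lam: h2(lam / 2) - lam)
best_I = h2(best_lam / 2) - best_lam
print(best_lam)  # 0.4, i.e. lambda = 2/5
print(best_I)    # 0.3219... = log2(5) - 2
```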
Next, we compute the conditional entropy, $H(Y \mid X)$:

$$H(Y \mid X) = p(x = 0) H(Y \mid X = 0) + p(x = 1) H(Y \mid X = 1)$$
$$= (1 - \lambda) H([1,\ 0]) + \lambda H([\tfrac12,\ \tfrac12]) = 0 + \lambda = \lambda.$$

We can use $H(Y)$ and $H(Y \mid X)$ to compute the capacity of the channel:

$$C = \max I(X; Y) = \max \left[ H(Y) - H(Y \mid X) \right].$$

Notice that we are maximizing a function of $\lambda$:

$$f(\lambda) = -\left( 1 - \tfrac{\lambda}{2} \right) \log \left( 1 - \tfrac{\lambda}{2} \right) - \tfrac{\lambda}{2} \log \tfrac{\lambda}{2} - \lambda.$$

To find the maximum of this function, we differentiate with respect to $\lambda$:

$$\frac{df}{d\lambda} = \frac{1}{2} \log \frac{1 - \lambda/2}{\lambda/2} - 1.$$

Setting this to zero, we can derive the maximizer:

$$\log \frac{2 - \lambda}{\lambda} = 2 \quad \Longrightarrow \quad 2 - \lambda = 4 \lambda \quad \Longrightarrow \quad \lambda = \frac{2}{5},$$

$$C = f\!\left( \tfrac{2}{5} \right) = \log 5 - 2 \text{ bits} \approx 0.3219 \text{ bits.}$$

5. [Cover and Thomas 7.13]
(a) Let $\lambda = P(X = 1)$ and write $\bar{\alpha} = 1 - \alpha$. With rows indexed by $x \in \{0, 1\}$ and columns by $y \in \{0, 1, e\}$, the given distributions are

$$p(y \mid x) = \begin{bmatrix} \bar{\alpha} - \epsilon & \epsilon & \alpha \\ \epsilon & \bar{\alpha} - \epsilon & \alpha \end{bmatrix},$$

$$p(y) = \begin{bmatrix} (1 - \lambda)(\bar{\alpha} - \epsilon) + \lambda \epsilon & (1 - \lambda)\epsilon + \lambda(\bar{\alpha} - \epsilon) & \alpha \end{bmatrix}^T.$$

We would like to compute the capacity of this channel,

$$C = \max I(X; Y) = \max \left[ H(Y) - H(Y \mid X) \right].$$

However, we can show that $H(Y \mid X)$ does not depend on $\lambda$:

$$H(Y \mid X) = p(x = 0) H(Y \mid X = 0) + p(x = 1) H(Y \mid X = 1)$$
$$= (1 - \lambda) H([\bar{\alpha} - \epsilon,\ \epsilon,\ \alpha]) + \lambda H([\epsilon,\ \bar{\alpha} - \epsilon,\ \alpha])$$
$$= H([\bar{\alpha} - \epsilon,\ \epsilon,\ \alpha]).$$

This means we only need to find the maximizing $H(Y)$. We could differentiate using our calculation of $p(y)$. Instead, we use the method from Section 7.1.5, defining $E$ to be the event $\{Y = e\}$. We can use this to derive $H(Y)$:

$$H(Y) = H(Y, E) = H(E) + H(Y \mid E) = H(E) + \bar{\alpha} H(Y \mid E = 0),$$

where the last step follows from the fact that $H(Y \mid E = 1) = 0$ (given $E = 1$, we know $Y = e$). Because $H(E)$ is not a function of $\lambda$, we can leave it as is. Therefore, we want to maximize $H(Y \mid E = 0)$. So we need to compute $p(y \mid E = 0)$:

$$p(y = 0 \mid E = 0) = \frac{P(E = 0 \mid Y = 0) P(Y = 0)}{P(E = 0)} = \frac{(1 - \lambda)(\bar{\alpha} - \epsilon) + \lambda \epsilon}{\bar{\alpha}},$$

$$p(y = 1 \mid E = 0) = \frac{P(E = 0 \mid Y = 1) P(Y = 1)}{P(E = 0)} = \frac{(1 - \lambda)\epsilon + \lambda(\bar{\alpha} - \epsilon)}{\bar{\alpha}}.$$

Again, we could differentiate $H(Y \mid E = 0)$ with respect to $\lambda$, but that's hairy. Instead, we'll recall that $H(Y \mid E = 0) \le 1$, with equality when $p(y = 0 \mid E = 0) = p(y = 1 \mid E = 0)$:

$$\frac{(1 - \lambda)(\bar{\alpha} - \epsilon) + \lambda \epsilon}{\bar{\alpha}} = \frac{(1 - \lambda)\epsilon + \lambda(\bar{\alpha} - \epsilon)}{\bar{\alpha}},$$

which is satisfied by $\lambda = \frac{1}{2}$.
The channel capacity is

$$C = \max I(X; Y) = H(E) + \bar{\alpha} H(Y \mid E = 0) - H(Y \mid X) = H([\alpha,\ \bar{\alpha}]) + \bar{\alpha} - H([\bar{\alpha} - \epsilon,\ \epsilon,\ \alpha]).$$

(b) In the situation where $\alpha = 0$,

$$C = H([0,\ 1]) + 1 - H([1 - \epsilon,\ \epsilon,\ 0]) = 1 - H([\epsilon,\ 1 - \epsilon]),$$

the capacity of a binary symmetric channel.

(c) In the situation where $\epsilon = 0$,

$$C = H([\alpha,\ \bar{\alpha}]) + \bar{\alpha} - H([\bar{\alpha},\ 0,\ \alpha]) = \bar{\alpha} = 1 - \alpha,$$

the capacity of a binary erasure channel.

6. [Cover and Thomas 7.5]

Given the following distributions (with $n = 25$ and $\epsilon = 0.2$):

$$p(x, y) = \begin{bmatrix} 0.45 & 0.05 \\ 0.05 & 0.45 \end{bmatrix}, \qquad p(y \mid x) = \begin{bmatrix} 0.90 & 0.10 \\ 0.10 & 0.90 \end{bmatrix}.$$

(a)

$$H(X) = 1 \text{ bit}, \qquad H(Y) = 1 \text{ bit}, \qquad H(X, Y) = 1.469 \text{ bits},$$
$$I(X; Y) = H(X) + H(Y) - H(X, Y) = 0.531 \text{ bits.}$$

(b) For any $x^n$,

$$\left| -\frac{1}{n} \log p(x^n) - H(X) \right| = |1 - 1| = 0 < \epsilon.$$

Therefore, all $x^n$ are typical. The proof for $y^n$ is similar.

(c) Here $Y = X \oplus Z$ with $Z \sim \mathrm{Bernoulli}(0.1)$ independent of $X$, so $Y$ is determined by $(X, Z)$ and $Z$ by $(X, Y)$. Then

$$H(X, Y, Z) = H(X, Y) + H(Z \mid X, Y) = H(X, Y),$$
$$H(X, Y, Z) = H(X, Z) + H(Y \mid X, Z) = H(X, Z),$$

so

$$H(X, Y) = H(X, Z).$$
Furthermore,

$$H(X, Z) = H(X) + H(Z \mid X) = H(X) + H(Z) \qquad (\text{since } Z \text{ is independent of } X),$$

so

$$H(X, Y) = H(X) + H(Z). \qquad (3.7)$$

We also know that $z^n$ is typical. Therefore,

$$\epsilon > \left| -\frac{1}{n} \log p(z^n) - H(Z) \right| \qquad (z^n \text{ is typical})$$
$$= \left| -\frac{1}{n} \log p(z^n) - H(Z) \right| + \left| -\frac{1}{n} \log p(x^n) - H(X) \right| \qquad (\text{the added term is } 0, \text{ shown in part (b)})$$
$$\ge \left| -\frac{1}{n} \left( \log p(z^n) + \log p(x^n) \right) - \left( H(Z) + H(X) \right) \right| \qquad (\text{triangle inequality})$$
$$= \left| -\frac{1}{n} \log p(z^n) p(x^n) - H(X, Y) \right| \qquad (\text{Equation } 3.7)$$
$$= \left| -\frac{1}{n} \log p(x^n, y^n) - H(X, Y) \right|. \qquad (\text{Equation 7.6 in the text})$$

Therefore, $(x^n, y^n)$ is jointly typical.

(d) By inspecting $p(x, y)$ above, we know that

$$p(z) = [0.90,\ 0.10].$$

By the definition of typicality, we know that if $z^n$ is in the typical set, then

$$H(Z) - \epsilon < -\frac{1}{n} \log p(z^n) < H(Z) + \epsilon$$
$$H([0.90,\ 0.10]) - 0.2 < -\frac{1}{n} \log p(z^n) < H([0.90,\ 0.10]) + 0.2$$
$$0.269 < -\frac{1}{n} \log p(z^n) < 0.669.$$

(e) This corresponds to sequences $z^n$ containing $k = 1, 2, 3, 4$ ones. Therefore,

$$|A_{0.2}^{(25)}(Z)| = \sum_{k=1}^{4} \binom{25}{k} = 15275,$$

and, with $p = 0.1$,

$$P\{(x^n_i, Y^n) \in A_\epsilon^{(n)}(X, Y)\} = P\{(x^n_i, x^n_i \oplus Z^n) \in A_\epsilon^{(n)}(X, Y)\} = P\{Z^n \in A_\epsilon^{(n)}(Z)\}$$
$$= \sum_{z^n \in A_\epsilon^{(n)}(Z)} p(z^n) = \sum_{k=1}^{4} \binom{25}{k} p^k (1 - p)^{25 - k} = 0.830.$$
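The counts in parts (d) and (e) are straightforward to reproduce numerically (a sketch; the variable names are ad hoc, and the strict-inequality form of the typicality condition is assumed, as above):

```python
import math
from math import comb

n, eps, p = 25, 0.2, 0.1
HZ = -p * math.log2(p) - (1 - p) * math.log2(1 - p)  # H(Z) = 0.469 bits

def rate(k):
    """-(1/n) log2 p(z^n) for a z^n containing exactly k ones."""
    return -(k * math.log2(p) + (n - k) * math.log2(1 - p)) / n

typical_ks = [k for k in range(n + 1) if abs(rate(k) - HZ) < eps]
set_size = sum(comb(n, k) for k in typical_ks)
set_prob = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in typical_ks)
print(typical_ks)  # [1, 2, 3, 4]
print(set_size)    # 15275
print(set_prob)    # 0.830...
```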
(f) For $\tilde{X}^n$ and $\tilde{Y}^n$ drawn independently (each i.i.d. uniform),

$$P\{(\tilde{X}^n, \tilde{Y}^n) \in A_\epsilon^{(n)}(X, Y)\} = \sum_{x^n} p(x^n) \sum_{y^n : (x^n, y^n) \in A_\epsilon^{(n)}(X, Y)} p(y^n)$$
$$= \sum_{x^n} p(x^n) \sum_{z^n \in A_\epsilon^{(n)}(Z)} 2^{-n} = |A_\epsilon^{(n)}(Z)| \, 2^{-n} = 15275 \cdot 2^{-25} \approx 4.6 \times 10^{-4},$$

since for each $x^n$ the jointly typical $y^n$ are exactly $y^n = x^n \oplus z^n$ with $z^n \in A_\epsilon^{(n)}(Z)$.

(g)

(h)

7. [Cover and Thomas 7.0]

(a)

$$I(X; Y_1, Y_2) = H(Y_1, Y_2) - H(Y_1, Y_2 \mid X)$$
$$= H(Y_1) + H(Y_2 \mid Y_1) - H(Y_1 \mid X) - H(Y_2 \mid Y_1, X)$$
$$= H(Y_1) + H(Y_2) - I(Y_1; Y_2) - H(Y_1 \mid X) - H(Y_2 \mid Y_1, X)$$
$$= I(Y_1; X) + I(Y_2; X) + I(Y_1; Y_2 \mid X) - I(Y_1; Y_2)$$
$$= I(Y_1; X) + I(Y_2; X) - I(Y_1; Y_2) \qquad (Y_1, Y_2 \text{ conditionally independent given } X)$$
$$= 2 I(Y_1; X) - I(Y_1; Y_2) \qquad (Y_1, Y_2 \text{ identically distributed given } X)$$

(b)

$$C_{X \to (Y_1, Y_2)} = \max I(X; Y_1, Y_2) = \max \left[ 2 I(X; Y_1) - I(Y_1; Y_2) \right]$$
$$\le \max 2 I(X; Y_1) \qquad (\text{since } I(Y_1; Y_2) \ge 0)$$
$$= 2 C_{X \to Y_1}.$$

8. [Cover and Thomas 7.30]

Here $Y = X + Z$, with $X \in \{0, 1, 2, 3\}$ independent of the noise $Z$, which is uniform over a three-element alphabet $\mathcal{Z}$.

(a)

$$C = \max I(X; Y) = \max \left[ H(X) - H(X \mid Y) \right]$$
(b) We cleverly select $\mathcal{Z}$ so that $H(X \mid Y) = 0$. This occurs when $Z$ results in a channel with nonoverlapping outputs. One such set of values is $\mathcal{Z} = \{4, 8, 12\}$: the output ranges $\{4, \ldots, 7\}$, $\{8, \ldots, 11\}$, and $\{12, \ldots, 15\}$ are disjoint, so $X$ can be recovered from $Y$. We pick the uniform distribution over $X$ to maximize $H(X)$. The entropy for this distribution is $\log |\mathcal{X}| = \log 4 = 2$ bits. This is also our maximum channel capacity.

For the minimizing noise alphabet, first rewrite $I(X; Y)$ by expanding $H(X, Y, Z)$ in two ways:

$$H(X, Y \mid Z) + H(Z) = H(X, Y, Z) = H(X, Z \mid Y) + H(Y)$$
$$H(X \mid Z) + H(Y \mid X, Z) + H(Z) = H(X \mid Y) + H(Z \mid X, Y) + H(Y)$$
$$H(X \mid Z) + H(Z) = H(X \mid Y) + H(Y) \qquad (Y \text{ is determined by } (X, Z), \text{ and } Z \text{ by } (X, Y))$$
$$H(X) + H(Z) = H(X \mid Y) + H(Y) \qquad (X \text{ and } Z \text{ are independent})$$
$$H(X) - H(X \mid Y) = H(Y) - H(Z)$$
$$I(X; Y) = H(Y) - H(Z) = H(Y) - \log 3.$$

Therefore $\min_{\mathcal{Z}} \max_{p(x)} I(X; Y) = \min_{\mathcal{Z}} \max_{p(x)} H(Y) - \log 3$. The maximum entropy of $Y$ is smallest when the support of $Y$ is smallest. This occurs when $\mathcal{Z}$ is a set of 3 consecutive integers, in which case $Y$ takes six consecutive values. In the case of $\mathcal{Z} = \{0, 1, 2\}$, we have

$$P(Y = 0) = \frac{\pi_0}{3} \qquad\qquad P(Y = 3) = \frac{\pi_1 + \pi_2 + \pi_3}{3}$$
$$P(Y = 1) = \frac{\pi_0 + \pi_1}{3} \qquad\qquad P(Y = 4) = \frac{\pi_2 + \pi_3}{3}$$
$$P(Y = 2) = \frac{\pi_0 + \pi_1 + \pi_2}{3} \qquad\qquad P(Y = 5) = \frac{\pi_3}{3}$$

where $\pi_i = P(X = i)$ and $\sum_{i=0}^{3} \pi_i = 1$. We need to find the values of $\pi_i$ which maximize $H(Y)$. This occurs when $Y$ is uniformly distributed, or

$$\pi = \left( \frac{1}{2},\ 0,\ 0,\ \frac{1}{2} \right).$$

In this case, $H(Y) = \log 6$ and $C = \log 6 - \log 3 = 1$ bit.
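Both extremes can be checked numerically via the identity $I(X;Y) = H(Y) - \log 3$ derived above (a sketch; the helper `HY_bits` is ad hoc, and exact rationals are used only to avoid rounding in the output distribution):

```python
import math
from collections import defaultdict
from fractions import Fraction

def HY_bits(pi, z_support):
    """H(Y) in bits for Y = X + Z, X ~ pi on {0,1,2,3}, Z uniform on z_support."""
    py = defaultdict(Fraction)
    for x, px in enumerate(pi):
        for z in z_support:
            py[x + z] += Fraction(px) * Fraction(1, len(z_support))
    return -sum(float(q) * math.log2(float(q)) for q in py.values() if q > 0)

half = Fraction(1, 2)
# Minimizing noise Z = {0, 1, 2}: pi = (1/2, 0, 0, 1/2) makes Y uniform on six values.
C_min = HY_bits([half, 0, 0, half], [0, 1, 2]) - math.log2(3)
# Nonoverlapping noise Z = {4, 8, 12}: uniform X makes Y uniform on twelve values.
C_max = HY_bits([Fraction(1, 4)] * 4, [4, 8, 12]) - math.log2(3)
print(round(C_min, 6))  # 1.0 bit
print(round(C_max, 6))  # 2.0 bits
```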