Example of the Baum-Welch Algorithm

Larry Moss
Q520, Spring 2008

1 Our corpus c

We start with a very simple corpus. We take the set Y of unanalyzed words to be {ABBA, BAB}, and c to be given by c(ABBA) = 10 and c(BAB) = 20. Note that the total size of the corpus is Σ_{u∈Y} c(u) = 10 + 20 = 30.

2 Our first HMM h_1

The first HMM h_1 is arbitrary. To have definite numbers around, we select the following model on the two states s and t:

Starting probability of s is .85, of t is .15.
Transitions: go(s, s) = .3, go(s, t) = .7, go(t, s) = .1, go(t, t) = .9.
In s, Pr(A) = .4, Pr(B) = .6. In t, Pr(A) = .5, Pr(B) = .5.

3 α(y, j, s)

Let y ∈ Y, and let n be the length of y. For 1 ≤ j ≤ n and s one of our states, we define α(y, j, s) to be the probability, in the space of analyzed words, that the first j symbols match those of y and the jth state is s. This is related to the computations in the Forward Algorithm because the overall probability of y in the HMM h is Σ_{u∈S} α(y, n, u). This number is written as Pr_h(y). Writing y as A_1 A_2 ... A_n, we have

    α(y, 1, s) = start(s) out(s, A_1)
    α(y, j+1, s) = Σ_{t∈S} α(y, j, t) go(t, s) out(s, A_{j+1})
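To keep the bookkeeping straight, here is a short Python sketch of this recursion. The start/go/out dictionary layout and the names STATES, forward and prob are my own choices for illustration, not notation from the handout or from any library; the hand computations in the next subsections can be checked against it.

    # A minimal sketch of h_1 and the forward recursion (illustrative layout only).
    h1 = {
        "start": {"s": 0.85, "t": 0.15},
        "go":    {("s", "s"): 0.3, ("s", "t"): 0.7,
                  ("t", "s"): 0.1, ("t", "t"): 0.9},
        "out":   {("s", "A"): 0.4, ("s", "B"): 0.6,
                  ("t", "A"): 0.5, ("t", "B"): 0.5},
    }
    STATES = ("s", "t")

    def forward(hmm, y):
        """Return a list whose (j-1)st entry maps each state u to α(y, j, u)."""
        table = [{u: hmm["start"][u] * hmm["out"][(u, y[0])] for u in STATES}]
        for symbol in y[1:]:
            prev = table[-1]
            table.append({u: sum(prev[v] * hmm["go"][(v, u)] * hmm["out"][(u, symbol)]
                                 for v in STATES)
                          for u in STATES})
        return table

    def prob(hmm, y):
        """Pr_h(y): the sum of α(y, n, u) over the states u."""
        return sum(forward(hmm, y)[-1].values())

For example, prob(h1, "ABBA") comes out near 0.0548; the 0.05544 obtained by hand below is slightly different because the hand computation rounds α(ABBA, 1, t) = 0.075 up to 0.08.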

ABBA

    α(ABBA, 1, s) = (.85)(.4) = 0.34.
    α(ABBA, 1, t) = (.15)(.5) = 0.075, rounded to 0.08 in what follows.
    α(ABBA, 2, s) = (0.34)(.3)(.6) + (0.08)(.1)(.6) = 0.06120 + 0.00480 = 0.06600.
    α(ABBA, 2, t) = (0.34)(.7)(.5) + (0.08)(.9)(.5) = 0.11900 + 0.03600 = 0.15500.
    α(ABBA, 3, s) = (0.06600)(.3)(.6) + (0.15500)(.1)(.6) = 0.01188 + 0.00930 = 0.02118.
    α(ABBA, 3, t) = (0.06600)(.7)(.5) + (0.15500)(.9)(.5) = 0.02310 + 0.06975 = 0.09285.
    α(ABBA, 4, s) = (0.02118)(.3)(.4) + (0.09285)(.1)(.4) = 0.00254 + 0.00371 = 0.00625.
    α(ABBA, 4, t) = (0.02118)(.7)(.5) + (0.09285)(.9)(.5) = 0.00741 + 0.04178 = 0.04919.

Total probability of ABBA is 0.00625 + 0.04919 = 0.05544.

BAB

    α(BAB, 1, s) = (.85)(.6) = 0.51.
    α(BAB, 1, t) = (.15)(.5) = 0.075, rounded to 0.08 in what follows.
    α(BAB, 2, s) = (0.51)(.3)(.4) + (0.08)(.1)(.4) = 0.0612 + 0.0032 = 0.0644.
    α(BAB, 2, t) = (0.51)(.7)(.5) + (0.08)(.9)(.5) = 0.1785 + 0.0360 = 0.2145.
    α(BAB, 3, s) = 0.0209.
    α(BAB, 3, t) = (0.0644)(.7)(.5) + (0.2145)(.9)(.5) = 0.0225 + 0.0965 = 0.1190.

Total probability of BAB is 0.0209 + 0.1190 = 0.1399.

3.1 The likelihood of the corpus using h_1

    L(c, h_1) = Pr(ABBA)^c(ABBA) Pr(BAB)^c(BAB) = (0.05544)^10 (0.1399)^20

It is easier to work with the log of this:

    log L(c, h_1) = (10 log 0.05544) + (20 log 0.1399) = -68.2611

4 The β(y, j, s) values

Define β(y, j, s) to be the following conditional probability: given that the jth state is s, the (j+1)st symbol will be A_{j+1}, the (j+2)nd will be A_{j+2}, ..., and the nth will be A_n. Writing y as A_1 A_2 ... A_n, our equations go backward:

    β(y, n, s) = 1
    β(y, j, s) = Σ_{u∈S} go(s, u) out(u, A_{j+1}) β(y, j+1, u)
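Continuing the sketch above, the backward recursion can be written the same way (again a helper of my own, not a library routine); the values in Sections 4.1 and 4.2 below can be checked against it.

    def backward(hmm, y):
        """Return a list whose (j-1)st entry maps each state u to β(y, j, u)."""
        n = len(y)
        table = [{u: 1.0 for u in STATES}]               # β(y, n, u) = 1
        for j in range(n - 1, 0, -1):                    # fill β(y, j, ·) from β(y, j+1, ·)
            nxt = table[0]
            table.insert(0, {u: sum(hmm["go"][(u, v)] * hmm["out"][(v, y[j])] * nxt[v]
                                    for v in STATES)
                             for u in STATES})
        return table

For instance, backward(h1, "ABBA")[0] gives β(ABBA, 1, s) ≈ 0.1331 and β(ABBA, 1, t) ≈ 0.1273, agreeing with Section 4.1 below up to rounding.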

4.1 β(ABBA, j, s) for 1 ≤ j ≤ 4

    β(ABBA, 4, s) = 1.
    β(ABBA, 4, t) = 1.
    β(ABBA, 3, s) = Σ_{u∈S} go(s, u) out(u, A) β(ABBA, 4, u) = (.3)(.4)(1) + (.7)(.5)(1) = 0.47000
    β(ABBA, 3, t) = Σ_{u∈S} go(t, u) out(u, A) β(ABBA, 4, u) = (.1)(.4)(1) + (.9)(.5)(1) = 0.49000
    β(ABBA, 2, s) = Σ_{u∈S} go(s, u) out(u, B) β(ABBA, 3, u) = (.3)(.6)(0.47000) + (.7)(.5)(0.49000) = 0.25610
    β(ABBA, 2, t) = Σ_{u∈S} go(t, u) out(u, B) β(ABBA, 3, u) = (.1)(.6)(0.47000) + (.9)(.5)(0.49000) = 0.24870
    β(ABBA, 1, s) = Σ_{u∈S} go(s, u) out(u, B) β(ABBA, 2, u) = (.3)(.6)(0.25610) + (.7)(.5)(0.24870) = 0.13315
    β(ABBA, 1, t) = Σ_{u∈S} go(t, u) out(u, B) β(ABBA, 2, u) = (.1)(.6)(0.25610) + (.9)(.5)(0.24870) = 0.12729

4.2 β(BAB, j, s) for 1 ≤ j ≤ 3

    β(BAB, 3, s) = 1.
    β(BAB, 3, t) = 1.
    β(BAB, 2, s) = Σ_{u∈S} go(s, u) out(u, B) β(BAB, 3, u) = (.3)(.6)(1) + (.7)(.5)(1) = 0.53000
    β(BAB, 2, t) = Σ_{u∈S} go(t, u) out(u, B) β(BAB, 3, u) = (.1)(.6)(1) + (.9)(.5)(1) = 0.51000
    β(BAB, 1, s) = Σ_{u∈S} go(s, u) out(u, A) β(BAB, 2, u) = (.3)(.4)(0.53000) + (.7)(.5)(0.51000) = 0.24210
    β(BAB, 1, t) = Σ_{u∈S} go(t, u) out(u, A) β(BAB, 2, u) = (.1)(.4)(0.53000) + (.9)(.5)(0.51000) = 0.25070

5 γ(y, j, s, t)

Let y ∈ Y, and write y as A_1 ... A_n. We want the probability, in the subspace A(y), that an analyzed word has s as its jth state, A_{j+1} as its (j+1)st symbol, and t as its (j+1)st state. (This only makes sense when 1 ≤ j < n.) This probability is called γ(y, j, s, t). It is given by

    γ(y, j, s, t) = α(y, j, s) go(s, t) out(t, A_{j+1}) β(y, j+1, t) / Pr_h(y).

In other words, γ(y, j, s, t) is the probability that a word in A(y) has s as its jth state and t as its (j+1)st state. It is important to see that for different unanalyzed words, say y and z, γ(y, j, s, t) and γ(z, j, s, t) are probabilities in different spaces. For example,

    γ(ABBA, 1, t, s) = α(ABBA, 1, t) go(t, s) out(s, B) β(ABBA, 2, s) / Pr_h(ABBA)
                     = (0.08)(.1)(.6)(0.25610) / 0.05544 = 0.02217.

The values are

    γ(ABBA, 1, s, s) = 0.28271   γ(ABBA, 1, s, t) = 0.53383   γ(ABBA, 1, t, s) = 0.02217   γ(ABBA, 1, t, t) = 0.16149
    γ(ABBA, 2, s, s) = 0.10071   γ(ABBA, 2, s, t) = 0.20417   γ(ABBA, 2, t, s) = 0.07884   γ(ABBA, 2, t, t) = 0.61648
    γ(ABBA, 3, s, s) = 0.04584   γ(ABBA, 3, s, t) = 0.13371   γ(ABBA, 3, t, s) = 0.06699   γ(ABBA, 3, t, t) = 0.75365
    γ(BAB, 1, s, s) = 0.23185    γ(BAB, 1, s, t) = 0.65071    γ(BAB, 1, t, s) = 0.01212    γ(BAB, 1, t, t) = 0.13124
    γ(BAB, 2, s, s) = 0.08286    γ(BAB, 2, s, t) = 0.16112    γ(BAB, 2, t, s) = 0.09199    γ(BAB, 2, t, t) = 0.68996
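In the same home-grown layout as the earlier sketches, the γ values can be computed directly from the forward and backward tables (the helper name gamma is again my own, not a library API):

    def gamma(hmm, y):
        """Return a dict mapping (j, s, t) to γ(y, j, s, t) for 1 <= j < len(y)."""
        alpha, beta = forward(hmm, y), backward(hmm, y)
        p_y = sum(alpha[-1].values())                    # Pr_h(y)
        return {(j, u, v): alpha[j - 1][u] * hmm["go"][(u, v)]
                           * hmm["out"][(v, y[j])] * beta[j][v] / p_y
                for j in range(1, len(y))
                for u in STATES for v in STATES}

Here gamma(h1, "ABBA")[(1, "t", "s")] comes out near 0.021; the 0.02217 above differs slightly because of the rounding of .075 to .08 noted earlier.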

6 δ(y, j, s)

This is the probability, in the space A(y), that the jth state of an analyzed word is s. For j < length(y),

    δ(y, j, s) = Σ_{u∈S} γ(y, j, s, u).

Also, δ(y, n, s) = α(y, n, s) / Pr_h(y). So here we have

    δ(ABBA, 1, s) = 0.81654   δ(ABBA, 1, t) = 0.18366
    δ(ABBA, 2, s) = 0.30488   δ(ABBA, 2, t) = 0.69532
    δ(ABBA, 3, s) = 0.17955   δ(ABBA, 3, t) = 0.82064
    δ(ABBA, 4, s) = 0.11273   δ(ABBA, 4, t) = 0.88727
    δ(BAB, 1, s) = 0.88256    δ(BAB, 1, t) = 0.14336
    δ(BAB, 2, s) = 0.24398    δ(BAB, 2, t) = 0.78195
    δ(BAB, 3, s) = 0.14939    δ(BAB, 3, t) = 0.85061

7 Our next HMM h_2

Recall that we started with a corpus c given by c(ABBA) = 10, c(BAB) = 20. We want to use the δ values along with the corpus to get a new HMM, defined by relative frequency estimates from the expected analyzed corpus. The starting probability of state s is I/(I + J), and that of t is J/(I + J), where

    I = δ(ABBA, 1, s) c(ABBA) + δ(BAB, 1, s) c(BAB) = (0.81654)(10) + (0.88256)(20) = 25.816600
    J = δ(ABBA, 1, t) c(ABBA) + δ(BAB, 1, t) c(BAB) = (0.18366)(10) + (0.14336)(20) = 4.703800

So we get that the starting probability of s is 0.846, and that of t is 0.154.

The probability of going from state s to state s will be K/(K + L), where

    K = (γ(ABBA, 1, s, s) + γ(ABBA, 2, s, s) + γ(ABBA, 3, s, s)) c(ABBA) + (γ(BAB, 1, s, s) + γ(BAB, 2, s, s)) c(BAB)
      = (0.28271 + 0.10071 + 0.04584)(10) + (0.23185 + 0.08286)(20) = 10.58680
    L = (γ(ABBA, 1, s, t) + γ(ABBA, 2, s, t) + γ(ABBA, 3, s, t)) c(ABBA) + (γ(BAB, 1, s, t) + γ(BAB, 2, s, t)) c(BAB)
      = (0.53383 + 0.20417 + 0.13371)(10) + (0.65071 + 0.16112)(20) = 24.95370

So the new value of go(s, s) is 0.298. Similarly, the new value of go(s, t) is 0.702.

The probability of going from state t to state s will be M/(M + N), where

    M = (γ(ABBA, 1, t, s) + γ(ABBA, 2, t, s) + γ(ABBA, 3, t, s)) c(ABBA) + (γ(BAB, 1, t, s) + γ(BAB, 2, t, s)) c(BAB)
      = (0.02217 + 0.07884 + 0.06699)(10) + (0.01212 + 0.09199)(20) = 3.76220
    N = (γ(ABBA, 1, t, t) + γ(ABBA, 2, t, t) + γ(ABBA, 3, t, t)) c(ABBA) + (γ(BAB, 1, t, t) + γ(BAB, 2, t, t)) c(BAB)
      = (0.16149 + 0.61648 + 0.75365)(10) + (0.13124 + 0.68996)(20) = 31.74020

So the new value of go(t, s) is 0.106. Similarly, the new value of go(t, t) is 0.894.

Turning to the outputs, the probability that in state s we output A is K/(K + L), where

    K = (δ(ABBA, 1, s) + δ(ABBA, 4, s)) c(ABBA) + δ(BAB, 2, s) c(BAB)
      = (0.81654 + 0.11273)(10) + (0.24398)(20) = 14.17230
    L = (δ(ABBA, 2, s) + δ(ABBA, 3, s)) c(ABBA) + (δ(BAB, 1, s) + δ(BAB, 3, s)) c(BAB)
      = (0.30488 + 0.17955)(10) + (0.88256 + 0.14939)(20) = 25.48330

Thus the probability is 0.357. Similarly, the probability that we output B in state s is 0.643.

The probability that in state t we output A is M/(M + N), where

    M = (δ(ABBA, 1, t) + δ(ABBA, 4, t)) c(ABBA) + δ(BAB, 2, t) c(BAB)
      = (0.18366 + 0.88727)(10) + (0.78195)(20) = 26.34830
    N = (δ(ABBA, 2, t) + δ(ABBA, 3, t)) c(ABBA) + (δ(BAB, 1, t) + δ(BAB, 3, t)) c(BAB)
      = (0.69532 + 0.82064)(10) + (0.14336 + 0.85061)(20) = 35.03900

Thus the probability is 0.4292. Similarly, the probability that we output B in state t is N/(M + N) = 0.5708.

7.1 Another model

We have a new HMM which we call h_2:

Starting probability of s is 0.846, of t is 0.154.
Transitions: go(s, s) = 0.298, go(s, t) = 0.702, go(t, s) = 0.106, go(t, t) = 0.894.
In s, Pr(A) = 0.357, Pr(B) = 0.643. In t, Pr(A) = 0.4292, Pr(B) = 0.5708.

8 Again

At this point, we do all the calculations over again. I have hidden them, and only report the probabilities of the elements of Y and the log likelihood of the corpus.

Total probability of ABBA is 0.00635 + 0.04690 = 0.05325.
Total probability of BAB is 0.0223 + 0.1250 = 0.1473.

8.1 The likelihood of the corpus using h_2

    L(c, h_2) = Pr(ABBA)^c(ABBA) Pr(BAB)^c(BAB) = (0.05325)^10 (0.1473)^20
    log L(c, h_2) = (10 log 0.05325) + (20 log 0.1473) = -67.6333
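The whole re-estimation step of Sections 6 and 7 can be sketched the same way: accumulate expected counts from the γ and δ values, weight them by the corpus counts, and normalize. The corpus dictionary and the helpers delta and reestimate below are again my own minimal layout, building on the earlier sketches; they are not the handout's notation or a library API.

    corpus = {"ABBA": 10, "BAB": 20}

    def delta(hmm, y):
        """Return a dict mapping (j, s) to δ(y, j, s) for 1 <= j <= len(y)."""
        alpha, g = forward(hmm, y), gamma(hmm, y)
        d = {(j, u): sum(g[(j, u, v)] for v in STATES)
             for j in range(1, len(y)) for u in STATES}
        p_y = sum(alpha[-1].values())
        d.update({(len(y), u): alpha[-1][u] / p_y for u in STATES})
        return d

    def reestimate(hmm, corpus):
        """One Baum-Welch update: expected counts, weighted by the corpus, then normalized."""
        start = {u: 0.0 for u in STATES}
        trans = {(u, v): 0.0 for u in STATES for v in STATES}
        emit = {(u, a): 0.0 for u in STATES for a in "AB"}
        for y, c in corpus.items():
            g, d = gamma(hmm, y), delta(hmm, y)
            for u in STATES:
                start[u] += c * d[(1, u)]                   # the I and J totals
                for j in range(1, len(y) + 1):
                    emit[(u, y[j - 1])] += c * d[(j, u)]    # output totals
                for v in STATES:
                    for j in range(1, len(y)):
                        trans[(u, v)] += c * g[(j, u, v)]   # the K, L, M, N totals
        total_start = sum(start.values())
        return {
            "start": {u: start[u] / total_start for u in STATES},
            "go":    {(u, v): trans[(u, v)] / sum(trans[(u, w)] for w in STATES)
                      for u in STATES for v in STATES},
            "out":   {(u, a): emit[(u, a)] / sum(emit[(u, b)] for b in "AB")
                      for u in STATES for a in "AB"},
        }

With exact arithmetic, reestimate(h1, corpus) gives roughly start(s) ≈ 0.854, go(s, s) ≈ 0.298 and out(s, A) ≈ 0.356, close to the h_2 just constructed; the small differences trace back to the hand rounding noted earlier.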

8.2 Again a new model

Using h_2, we then do all the calculations and construct a new HMM which we call h_3:

Starting probability of s is 0.841, of t is 0.159.
Transitions: go(s, s) = 0.292, go(s, t) = 0.708, go(t, s) = 0.109, go(t, t) = 0.891.
In s, Pr(A) = 0.3624, Pr(B) = 0.6376. In t, Pr(A) = 0.4252, Pr(B) = 0.5748.

9 Again Again

Total probability of ABBA is 0.00653 + 0.04672 = 0.05325.
Total probability of BAB is 0.0223 + 0.1254 = 0.1477.

9.1 The likelihood of the corpus using h_3

    L(c, h_3) = Pr(ABBA)^c(ABBA) Pr(BAB)^c(BAB) = (0.05325)^10 (0.1477)^20
    log L(c, h_3) = (10 log 0.05325) + (20 log 0.1477) = -67.5790

9.2 Another model

After doing all the calculations once again, we have a new HMM which we call h_4:

Starting probability of s is 0.841, of t is 0.159.
Transitions: go(s, s) = 0.287, go(s, t) = 0.713, go(t, s) = 0.111, go(t, t) = 0.889.
In s, Pr(A) = 0.3637, Pr(B) = 0.6363. In t, Pr(A) = 0.4243, Pr(B) = 0.5757.

10 The likelihoods

The log likelihood of c in h_1 was -68.2611. The log likelihood of c in h_2 was -67.6333. The log likelihood of c in h_3 was -67.5790.

In playing around with different starting values, I found that the likelihood on h_3 sometimes came out worse than that of h_2 (contrary to what we'll prove in class). I believe this is due to rounding errors in the calculations of the starting probabilities of the different states. I also noticed that most of the updating was actually to those starting probabilities, with the other parameters changing only a little.
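Finally, the whole procedure is just this update in a loop. Here is a sketch, using the helpers from the earlier code fragments (again my own illustrative names, not a library):

    import math

    def log_likelihood(hmm, corpus):
        """log L(c, h) = sum over y of c(y) * log Pr_h(y)."""
        return sum(c * math.log(prob(hmm, y)) for y, c in corpus.items())

    hmm = h1
    for step in range(1, 5):
        print("h_%d: log likelihood %.4f" % (step, log_likelihood(hmm, corpus)))
        hmm = reestimate(hmm, corpus)

Run this way, the printed log likelihood starts near -68.0 for h_1 and should never decrease from one model to the next, which is the fact referred to in Section 10; the exact values differ a little from the -68.2611, -67.6333, -67.5790 reported above because those were computed with rounded intermediate values.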