Step by Step the Baum-Welch Algorithm and its Application

Jin'ichi MURAKAMI

IEICE Fundamentals Review Vol.xx No.xx pp.1-9

Jin'ichi Murakami, Member (Department of Information and Electronics, Graduate School of Engineering, Tottori University, 4-101, Koyamachou Minami, Tottori, 680-8552 Japan). E-mail: murakami@ike.tottori-u.ac.jp

Key words: Baum-Welch, Hidden Markov Model

The Baum-Welch algorithm [2] is an EM algorithm (Expectation-maximization algorithm) [1] for HMMs. It is also described by X.D. Huang [3] and Steve Young [4].

2. HMM

2.1 HMM
An HMM has the following parameters.

  $\pi_i$: the probability that the initial state is $i$
  $a_{i,j}$: the transition probability from state $i$ to state $j$
  $b_{i,j}(O)$: the probability of outputting symbol $O$ on the transition from state $i$ to state $j$

2.2 Ergodic HMM, Left-to-Right HMM
HMMs are divided into Left-to-Right HMMs and Ergodic HMMs. In a Left-to-Right HMM a state either loops on itself or moves forward to the next state (Figure 2: Left-to-Right HMM with 4 states and 3 loops). In an Ergodic HMM transitions are allowed between all states (Figure 3: Ergodic HMM). HMMs of both kinds appear in the literature [5], [6], [7].

2.3 HMM, Left-to-Right HMM
A Left-to-Right HMM starts in state $i = 0$, so $\pi_0 = 1.0$.

2.4 Parameter tie
When the output probability depends only on the state being left, the output probabilities of all transitions out of state $i$ are shared:

$$b_{i,j}(O) = b_{i,i}(O) = b_i(O)$$

This sharing is a parameter tie. The tied form $b_i(O)$ is used in the rest of this article.

2.5 HMM
The HMM parameters are probabilities, so each set sums to 1.0.

The initial state probabilities sum to 1.0:
$$\sum_i \pi_i = 1.0$$

For each state $i$, the transition probabilities to all states $j$ sum to 1.0:
$$\sum_j a_{i,j} = 1.0$$

For each state $i$, the output probabilities over all symbols $O$ in the symbol set $K$ sum to 1.0:
$$\sum_{O \in K} b_i(O) = 1.0$$

2.6 HMM
Two example HMMs are used, Model A (Table 1, Figure 4) and Model B (Table 2, Figure 5). Both are Left-to-Right HMMs with 4 states and 3 loops ($i = \{0,1,2\}$, $j = \{0,1,2,3\}$) and 3 output symbols $\alpha, \beta, \gamma$ ($K = \{\alpha, \beta, \gamma\}$). The initial state is $i = 0$ and the final state is $i = 3$.

Table 1: Model A
                 i = 0   i = 1   i = 2
  π_i             1.0     0.0     0.0
  a_{i,i}         0.7     0.6     0.1
  a_{i,i+1}       0.3     0.4     0.9
  b_i(α)          0.7     0.4     0.1
  b_i(β)          0.2     0.3     0.1
  b_i(γ)          0.1     0.3     0.8

Table 2: Model B
                 i = 0   i = 1   i = 2
  π_i             1.0     0.0     0.0
  a_{i,i}         0.5     0.8     0.7
  a_{i,i+1}       0.5     0.2     0.3
  b_i(α)          0.1     0.4     0.1
  b_i(β)          0.5     0.2     0.8
  b_i(γ)          0.4     0.4     0.1

Both the Model A HMM and the Model B HMM satisfy the constraints of Section 2.5:

  $\pi_0 + \pi_1 + \pi_2 = 1.0$
  $a_{0,0} + a_{0,1} = 1.0$,  $a_{1,1} + a_{1,2} = 1.0$,  $a_{2,2} + a_{2,3} = 1.0$
  $b_0(\alpha) + b_0(\beta) + b_0(\gamma) = 1.0$
  $b_1(\alpha) + b_1(\beta) + b_1(\gamma) = 1.0$
  $b_2(\alpha) + b_2(\beta) + b_2(\gamma) = 1.0$
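The two parameter sets and the sum-to-one constraints can be written down directly. The following is a minimal Python sketch, not part of the original article: the dictionary layout, the symbol indices (0 = α, 1 = β, 2 = γ) and the helper name `is_stochastic` are assumptions made for illustration.

```python
# a[i][j]: transition probability from state i to state j (state 3 is the final state).
# b[i][k]: probability that state i outputs symbol k (0 = alpha, 1 = beta, 2 = gamma).
MODEL_A = {
    "pi": [1.0, 0.0, 0.0],
    "a": [[0.7, 0.3, 0.0, 0.0], [0.0, 0.6, 0.4, 0.0], [0.0, 0.0, 0.1, 0.9]],
    "b": [[0.7, 0.2, 0.1], [0.4, 0.3, 0.3], [0.1, 0.1, 0.8]],
}
MODEL_B = {
    "pi": [1.0, 0.0, 0.0],
    "a": [[0.5, 0.5, 0.0, 0.0], [0.0, 0.8, 0.2, 0.0], [0.0, 0.0, 0.7, 0.3]],
    "b": [[0.1, 0.5, 0.4], [0.4, 0.2, 0.4], [0.1, 0.8, 0.1]],
}


def is_stochastic(model, tol=1e-9):
    """Return True if pi, every row of a and every row of b sum to 1.0."""
    rows = [model["pi"]] + model["a"] + model["b"]
    return all(abs(sum(row) - 1.0) < tol for row in rows)


print(is_stochastic(MODEL_A), is_stochastic(MODEL_B))
```

The tolerance is there only because sums such as 0.7 + 0.2 + 0.1 are not exactly 1.0 in floating point.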
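3.

3.1
$O_t$ denotes the symbol observed at time $t$. A sequence of six symbols, for example, is written ($O_0 = \alpha$, $O_1 = \alpha$, $O_2 = \gamma$, $O_3 = \beta$, $O_4 = \beta$, $O_5 = \gamma$). The sequence used in this article is $O_0 = \alpha$, $O_1 = \alpha$, $O_2 = \beta$, $O_3 = \gamma$.

3.2
With Model A, the HMM can be unrolled over time into a grid. For the 4 observations, 3 paths lead from state 0 to the final state 3:

  t        0  1  2  3
  O_t      α  α  β  γ
  path1:   0  0  1  2  3
  path2:   0  1  1  2  3
  path3:   0  1  2  2  3

The probability of each path is the product of the transition and output probabilities along it.

path1:
$a_{0,0}\,b_0(O_0)\,a_{0,1}\,b_0(O_1)\,a_{1,2}\,b_1(O_2)\,a_{2,3}\,b_2(O_3)$
$= a_{0,0}\,b_0(\alpha)\,a_{0,1}\,b_0(\alpha)\,a_{1,2}\,b_1(\beta)\,a_{2,3}\,b_2(\gamma)$
$= 0.7 \times 0.7 \times 0.3 \times 0.7 \times 0.4 \times 0.3 \times 0.9 \times 0.8 = 0.00889$

path2:
$a_{0,1}\,b_0(O_0)\,a_{1,1}\,b_1(O_1)\,a_{1,2}\,b_1(O_2)\,a_{2,3}\,b_2(O_3)$
$= a_{0,1}\,b_0(\alpha)\,a_{1,1}\,b_1(\alpha)\,a_{1,2}\,b_1(\beta)\,a_{2,3}\,b_2(\gamma)$
$= 0.3 \times 0.7 \times 0.6 \times 0.4 \times 0.4 \times 0.3 \times 0.9 \times 0.8 = 0.00435$

path3:
$a_{0,1}\,b_0(O_0)\,a_{1,2}\,b_1(O_1)\,a_{2,2}\,b_2(O_2)\,a_{2,3}\,b_2(O_3)$
$= a_{0,1}\,b_0(\alpha)\,a_{1,2}\,b_1(\alpha)\,a_{2,2}\,b_2(\beta)\,a_{2,3}\,b_2(\gamma)$
$= 0.3 \times 0.7 \times 0.4 \times 0.4 \times 0.1 \times 0.1 \times 0.9 \times 0.8 = 0.000242$

The likelihood is the sum over all paths: $0.00889 + 0.00435 + 0.000242 \approx 0.01349$. With 4 observations ($O_0 = \alpha$, $O_1 = \alpha$, $O_2 = \beta$, $O_3 = \gamma$) there are only 3 paths, but the number of paths grows quickly as the sequence gets longer, so enumerating them is not practical.

3.3 Forward
The sum of Section 3.2 can be computed efficiently on the forward grid. $\alpha_t(j)$ is the probability of being in state $j$ at time $t$ after outputting $O_0, \ldots, O_{t-1}$:

$$\alpha_t(j) = \alpha_{t-1}(j)\,a_{j,j}\,b_j(O_{t-1}) + \alpha_{t-1}(j-1)\,a_{j-1,j}\,b_{j-1}(O_{t-1})$$

For Model A the grid is

$\alpha_0(0) = 1.0$
$\alpha_1(0) = \alpha_0(0)\,a_{0,0}\,b_0(O_0) = \alpha_0(0)\,a_{0,0}\,b_0(\alpha) = 1.0 \times 0.7 \times 0.7 = 0.49$
$\alpha_1(1) = \alpha_0(0)\,a_{0,1}\,b_0(O_0) = \alpha_0(0)\,a_{0,1}\,b_0(\alpha) = 1.0 \times 0.3 \times 0.7 = 0.21$
$\alpha_2(1) = \alpha_1(1)\,a_{1,1}\,b_1(O_1) + \alpha_1(0)\,a_{0,1}\,b_0(O_1) = 0.21 \times 0.6 \times 0.4 + 0.49 \times 0.3 \times 0.7 = 0.1533$
$\alpha_2(2) = \alpha_1(1)\,a_{1,2}\,b_1(O_1) = \alpha_1(1)\,a_{1,2}\,b_1(\alpha) = 0.21 \times 0.4 \times 0.4 = 0.0336$
$\alpha_3(2) = \alpha_2(2)\,a_{2,2}\,b_2(O_2) + \alpha_2(1)\,a_{1,2}\,b_1(O_2) = 0.0336 \times 0.1 \times 0.1 + 0.1533 \times 0.4 \times 0.3 = 0.018732 \;(\approx 0.01873)$
$\alpha_4(3) = \alpha_3(2)\,a_{2,3}\,b_2(O_3) = \alpha_3(2)\,a_{2,3}\,b_2(\gamma) = 0.018732 \times 0.9 \times 0.8 = 0.01348704 \;(\approx 0.01349)$

likelihood $= \alpha_4(3) = 0.01349$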
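The path enumeration and the forward recurrence can be compared in a few lines. This is a minimal sketch under the article's convention that the output probability belongs to the state being left. The function names, the list layout and the symbol indices (0 = α, 1 = β, 2 = γ) are assumptions made for illustration.

```python
from itertools import product

# Model A. a[i][j]: transition i -> j (state 3 is final); b[i][k]: state i outputs symbol k.
A = [[0.7, 0.3, 0.0, 0.0], [0.0, 0.6, 0.4, 0.0], [0.0, 0.0, 0.1, 0.9]]
B = [[0.7, 0.2, 0.1], [0.4, 0.3, 0.3], [0.1, 0.1, 0.8]]
OBS = [0, 0, 1, 2]  # alpha, alpha, beta, gamma


def likelihood_by_paths(a, b, obs):
    """Sum the probability of every state path from state 0 to the final state."""
    n = len(a)  # number of emitting states; the final state has index n
    total = 0.0
    for middle in product(range(n), repeat=len(obs) - 1):
        path = (0, *middle, n)
        p = 1.0
        for t, o in enumerate(obs):
            i, j = path[t], path[t + 1]
            p *= a[i][j] * b[i][o]  # impossible transitions contribute 0
        total += p
    return total


def forward(a, b, obs):
    """Return the forward grid alpha[t][j] for t = 0..T and j = 0..n."""
    n = len(a)
    alpha = [[0.0] * (n + 1) for _ in range(len(obs) + 1)]
    alpha[0][0] = 1.0
    for t, o in enumerate(obs):
        for i in range(n):  # the final state emits nothing and is never left
            for j in range(n + 1):
                alpha[t + 1][j] += alpha[t][i] * a[i][j] * b[i][o]
    return alpha


grid = forward(A, B, OBS)
print(likelihood_by_paths(A, B, OBS), grid[len(OBS)][len(A)])
```

Both calls should agree on about 0.01349. The enumeration visits every candidate path, whereas the forward grid needs only one pass over time. Entries `alpha[t][3]` for `t < 4` hold the probability of reaching the final state early; they are not part of the article's grid and do not enter the likelihood.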
Figure 6: forward grid

3.4
Consider the observation sequence $O_0 = \alpha$, $O_1 = \alpha$, $O_2 = \beta$, $O_3 = \gamma$ and 2 HMMs, the Model A HMM and the Model B HMM. The likelihood of the sequence under each HMM is

  Model A HMM: 0.01349
  Model B HMM: 0.00009

Model A gives a higher likelihood than Model B, so the sequence is recognized as Model A.

4. HMM

4.1
The training data is the observation sequence $O_0 = \alpha$, $O_1 = \alpha$, $O_2 = \beta$, $O_3 = \gamma$.

4.2
Under the constraints of Section 2.5, where each set of probabilities sums to 1.0, the following HMM parameters (Figure 8) give this sequence a likelihood of 0.25.

                 i = 0   i = 1   i = 2
  π_i             1.0     0.0     0.0
  a_{i,i}         0.5     0.0     0.0
  a_{i,i+1}       0.5     1.0     1.0
  b_i(α)          1.0     0.0     0.0
  b_i(β)          0.0     1.0     0.0
  b_i(γ)          0.0     0.0     1.0
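Recognition as described above amounts to scoring the sequence with each HMM and keeping the best one. The sketch below is an illustrative assumption about how this might be coded, not the article's implementation. The names `likelihood`, `MODELS` and `HAND_BUILT` are hypothetical, and symbols are indexed 0 = α, 1 = β, 2 = γ.

```python
def likelihood(a, b, obs):
    """Forward likelihood; only the current column of the grid is kept."""
    n = len(a)  # emitting states 0..n-1, final state n
    alpha = [1.0] + [0.0] * n
    for o in obs:
        nxt = [0.0] * (n + 1)
        for i in range(n):
            for j in range(n + 1):
                nxt[j] += alpha[i] * a[i][j] * b[i][o]
        alpha = nxt
    return alpha[n]


OBS = [0, 0, 1, 2]  # alpha, alpha, beta, gamma
MODELS = {
    "Model A": ([[0.7, 0.3, 0.0, 0.0], [0.0, 0.6, 0.4, 0.0], [0.0, 0.0, 0.1, 0.9]],
                [[0.7, 0.2, 0.1], [0.4, 0.3, 0.3], [0.1, 0.1, 0.8]]),
    "Model B": ([[0.5, 0.5, 0.0, 0.0], [0.0, 0.8, 0.2, 0.0], [0.0, 0.0, 0.7, 0.3]],
                [[0.1, 0.5, 0.4], [0.4, 0.2, 0.4], [0.1, 0.8, 0.1]]),
}
# Parameters of Section 4.2, chosen by hand for this one training sequence.
HAND_BUILT = ([[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]],
              [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])

scores = {name: likelihood(a, b, OBS) for name, (a, b) in MODELS.items()}
recognized = max(scores, key=scores.get)
print(scores, recognized, likelihood(*HAND_BUILT, OBS))
```

With these inputs the Model A score is about 0.01349, the Model B score about 0.00009, and the hand-built parameters give 0.25.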
5. Baum-Welch
Section 4.2 gave the parameters by inspection. Baum-Welch obtains them by re-estimation. It computes $\Gamma_t(i,j)$, the probability of the transition from state $i$ to state $j$ at time $t$, from the forward and backward grids.

5.1 Forward
The forward grid $\alpha_t(i)$ is computed as in Section 3.3 (Figure 9: forward grid):

$$\alpha_t(j) = \alpha_{t-1}(j)\,a_{j,j}\,b_j(O_{t-1}) + \alpha_{t-1}(j-1)\,a_{j-1,j}\,b_{j-1}(O_{t-1})$$

$\alpha_0(0) = 1.0$
$\alpha_1(0) = 0.49$
$\alpha_1(1) = 0.21$
$\alpha_2(1) = 0.1533$
$\alpha_2(2) = 0.0336$
$\alpha_3(2) = 0.01873$
$\alpha_4(3) = 0.01349$

likelihood $= \alpha_4(3) = 0.01349$

5.2 Backward
The backward grid $\beta_t(i)$ (Figure 10: backward grid) is computed in the opposite direction to the forward grid $\alpha_t(i)$, starting from the final state at the last time step:

$\beta_4(3) = 1.0$,  $\beta_4(2) = 0.0$,  $\beta_4(1) = 0.0$,  $\beta_4(0) = 0.0$

For a Left-to-Right HMM the recurrence is

$$\beta_t(j) = \beta_{t+1}(j+1)\,a_{j,j+1}\,b_j(O_t) + \beta_{t+1}(j)\,a_{j,j}\,b_j(O_t)$$

The grid is

$\beta_4(3) = 1.0$
$\beta_3(2) = \beta_4(3)\,a_{2,3}\,b_2(O_3) = \beta_4(3)\,a_{2,3}\,b_2(\gamma) = 1.0 \times 0.9 \times 0.8 = 0.72$
$\beta_2(2) = \beta_3(2)\,a_{2,2}\,b_2(O_2) = \beta_3(2)\,a_{2,2}\,b_2(\beta) = 0.72 \times 0.1 \times 0.1 = 0.0072$
$\beta_2(1) = \beta_3(2)\,a_{1,2}\,b_1(O_2) = \beta_3(2)\,a_{1,2}\,b_1(\beta) = 0.72 \times 0.4 \times 0.3 = 0.0864$
$\beta_1(1) = \beta_2(2)\,a_{1,2}\,b_1(O_1) + \beta_2(1)\,a_{1,1}\,b_1(O_1) = 0.0072 \times 0.4 \times 0.4 + 0.0864 \times 0.6 \times 0.4 = 0.021888 \;(\approx 0.02189)$
$\beta_1(0) = \beta_2(1)\,a_{0,1}\,b_0(O_1) = \beta_2(1)\,a_{0,1}\,b_0(\alpha) = 0.0864 \times 0.3 \times 0.7 = 0.018144 \;(\approx 0.01814)$
$\beta_0(0) = \beta_1(1)\,a_{0,1}\,b_0(O_0) + \beta_1(0)\,a_{0,0}\,b_0(O_0) = 0.021888 \times 0.3 \times 0.7 + 0.018144 \times 0.7 \times 0.7 = 0.01348704 \;(\approx 0.01349)$

$\beta_0(0)$ of the backward grid equals $\alpha_4(3)$ of the forward grid:

likelihood $= \beta_0(0) = \alpha_4(3) = 0.01349$
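The backward recurrence can be checked numerically in the same way. The sketch below assumes the Model A parameters and the symbol indices 0 = α, 1 = β, 2 = γ. The function name `backward` and the list-of-lists grid are illustrative choices, and the sum over `j` generalizes the two-term Left-to-Right recurrence (the extra terms are zero for Model A).

```python
# Model A. a[i][j]: transition i -> j (state 3 is final); b[i][k]: state i outputs symbol k.
A = [[0.7, 0.3, 0.0, 0.0], [0.0, 0.6, 0.4, 0.0], [0.0, 0.0, 0.1, 0.9]]
B = [[0.7, 0.2, 0.1], [0.4, 0.3, 0.3], [0.1, 0.1, 0.8]]
OBS = [0, 0, 1, 2]  # alpha, alpha, beta, gamma


def backward(a, b, obs):
    """Return the backward grid beta[t][i] for t = 0..T and i = 0..n."""
    n, T = len(a), len(obs)
    beta = [[0.0] * (n + 1) for _ in range(T + 1)]
    beta[T][n] = 1.0  # the sequence must end in the final state
    for t in range(T - 1, -1, -1):
        for i in range(n):
            beta[t][i] = sum(
                a[i][j] * b[i][obs[t]] * beta[t + 1][j] for j in range(n + 1)
            )
    return beta


beta = backward(A, B, OBS)
print(beta[3][2], beta[1][1], beta[1][0], beta[0][0])
```

The last printed value, `beta[0][0]`, should match the forward likelihood of about 0.01349.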
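5.3 $\Gamma_t(i,j)$
$\Gamma_t(i,j)$ is the probability of the transition from state $i$ to state $j$ at time $t$. Baum-Welch computes it from the forward and backward grids:

$$\Gamma_t(i,j) = \frac{\alpha_t(i)\,a_{i,j}\,b_i(O_t)\,\beta_{t+1}(j)}{\text{likelihood}}, \qquad \text{likelihood} = \alpha_4(3)$$

The values of $\Gamma_t(i,j)$ (Figure 11) are

$\Gamma_0(0,0) = \alpha_0(0)\,a_{0,0}\,b_0(\alpha)\,\beta_1(0) / \text{likelihood} = 1.0 \times 0.7 \times 0.7 \times 0.018144 / 0.01348704 = 0.65919$
$\Gamma_0(0,1) = \alpha_0(0)\,a_{0,1}\,b_0(\alpha)\,\beta_1(1) / \text{likelihood} = 1.0 \times 0.3 \times 0.7 \times 0.021888 / 0.01348704 = 0.34081$
$\Gamma_1(0,1) = \alpha_1(0)\,a_{0,1}\,b_0(\alpha)\,\beta_2(1) / \text{likelihood} = 0.49 \times 0.3 \times 0.7 \times 0.0864 / 0.01348704 = 0.65919$
$\Gamma_1(1,1) = \alpha_1(1)\,a_{1,1}\,b_1(\alpha)\,\beta_2(1) / \text{likelihood} = 0.21 \times 0.6 \times 0.4 \times 0.0864 / 0.01348704 = 0.32287$
$\Gamma_1(1,2) = \alpha_1(1)\,a_{1,2}\,b_1(\alpha)\,\beta_2(2) / \text{likelihood} = 0.21 \times 0.4 \times 0.4 \times 0.0072 / 0.01348704 = 0.01794$
$\Gamma_2(1,2) = \alpha_2(1)\,a_{1,2}\,b_1(\beta)\,\beta_3(2) / \text{likelihood} = 0.1533 \times 0.4 \times 0.3 \times 0.72 / 0.01348704 = 0.98206$
$\Gamma_2(2,2) = \alpha_2(2)\,a_{2,2}\,b_2(\beta)\,\beta_3(2) / \text{likelihood} = 0.0336 \times 0.1 \times 0.1 \times 0.72 / 0.01348704 = 0.01794$
$\Gamma_3(2,3) = \alpha_3(2)\,a_{2,3}\,b_2(\gamma)\,\beta_4(3) / \text{likelihood} = 0.018732 \times 0.9 \times 0.8 \times 1.0 / 0.01348704 = 1.0$

At each time $t$ the values of $\Gamma_t(i,j)$ sum to 1.0:

$\Gamma_0(0,0) + \Gamma_0(0,1) = 1.0$
$\Gamma_1(0,1) + \Gamma_1(1,1) + \Gamma_1(1,2) = 1.0$
$\Gamma_2(1,2) + \Gamma_2(2,2) = 1.0$
$\Gamma_3(2,3) = 1.0$

5.4 $a_{i,j}$
Baum-Welch re-estimates $a_{i,j}$ from $\Gamma_t(i,j)$:

$$a_{i,j} = \frac{\sum_t \Gamma_t(i,j)}{\sum_t \sum_j \Gamma_t(i,j)}$$

$a_{0,0} = \dfrac{\Gamma_0(0,0)}{\Gamma_0(0,0)+\Gamma_0(0,1)+\Gamma_1(0,1)} = 0.39730$
$a_{0,1} = \dfrac{\Gamma_0(0,1)+\Gamma_1(0,1)}{\Gamma_0(0,0)+\Gamma_0(0,1)+\Gamma_1(0,1)} = 0.60270$
$a_{1,1} = \dfrac{\Gamma_1(1,1)}{\Gamma_1(1,1)+\Gamma_1(1,2)+\Gamma_2(1,2)} = 0.2441$
$a_{1,2} = \dfrac{\Gamma_1(1,2)+\Gamma_2(1,2)}{\Gamma_1(1,1)+\Gamma_1(1,2)+\Gamma_2(1,2)} = 0.7559$
$a_{2,2} = \dfrac{\Gamma_2(2,2)}{\Gamma_2(2,2)+\Gamma_3(2,3)} = 0.01762$
$a_{2,3} = \dfrac{\Gamma_3(2,3)}{\Gamma_2(2,2)+\Gamma_3(2,3)} = 0.98238$

The re-estimated values satisfy the constraints:

$a_{0,0} + a_{0,1} = 1.0$,  $a_{1,1} + a_{1,2} = 1.0$,  $a_{2,2} + a_{2,3} = 1.0$

5.5 $b_j(O)$
$b_j(O)$ is re-estimated in the same way. The numerator sums only over the times $t$ at which the symbol $O$ is observed:

$$b_j(O) = \frac{\sum_{t:\,O_t = O} \sum_k \Gamma_t(j,k)}{\sum_t \sum_k \Gamma_t(j,k)}$$

$b_0(\alpha) = \dfrac{\Gamma_0(0,0)+\Gamma_0(0,1)+\Gamma_1(0,1)}{\Gamma_0(0,0)+\Gamma_0(0,1)+\Gamma_1(0,1)} = 1.0$
$b_0(\beta) = \dfrac{0}{\Gamma_0(0,0)+\Gamma_0(0,1)+\Gamma_1(0,1)} = 0.0$
$b_0(\gamma) = \dfrac{0}{\Gamma_0(0,0)+\Gamma_0(0,1)+\Gamma_1(0,1)} = 0.0$
$b_1(\alpha) = \dfrac{\Gamma_1(1,1)+\Gamma_1(1,2)}{\Gamma_1(1,1)+\Gamma_1(1,2)+\Gamma_2(1,2)} = 0.2576$
$b_1(\beta) = \dfrac{\Gamma_2(1,2)}{\Gamma_1(1,1)+\Gamma_1(1,2)+\Gamma_2(1,2)} = 0.7424$
$b_1(\gamma) = \dfrac{0}{\Gamma_1(1,1)+\Gamma_1(1,2)+\Gamma_2(1,2)} = 0.0$
$b_2(\alpha) = \dfrac{0}{\Gamma_2(2,2)+\Gamma_3(2,3)} = 0.0$
$b_2(\beta) = \dfrac{\Gamma_2(2,2)}{\Gamma_2(2,2)+\Gamma_3(2,3)} = 0.01762$
$b_2(\gamma) = \dfrac{\Gamma_3(2,3)}{\Gamma_2(2,2)+\Gamma_3(2,3)} = 0.98238$

The re-estimated values satisfy the constraints:

$b_0(\alpha) + b_0(\beta) + b_0(\gamma) = 1.0$
$b_1(\alpha) + b_1(\beta) + b_1(\gamma) = 1.0$
$b_2(\alpha) + b_2(\beta) + b_2(\gamma) = 1.0$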
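The Γ values and both re-estimation formulas can be reproduced from the grid values of Sections 5.1 and 5.2. In this sketch the grids are entered as literals rather than recomputed. The dictionary keys `(t, state)`, the function names and the symbol indices (0 = α, 1 = β, 2 = γ) are assumptions made for illustration.

```python
# Model A. a[i][j]: transition i -> j (state 3 is final); b[i][k]: state i outputs symbol k.
A = [[0.7, 0.3, 0.0, 0.0], [0.0, 0.6, 0.4, 0.0], [0.0, 0.0, 0.1, 0.9]]
B = [[0.7, 0.2, 0.1], [0.4, 0.3, 0.3], [0.1, 0.1, 0.8]]
OBS = [0, 0, 1, 2]  # alpha, alpha, beta, gamma

# Grid values quoted in Sections 5.1 and 5.2, keyed by (t, state); missing keys are 0.
ALPHA = {(0, 0): 1.0, (1, 0): 0.49, (1, 1): 0.21, (2, 1): 0.1533,
         (2, 2): 0.0336, (3, 2): 0.018732, (4, 3): 0.01348704}
BETA = {(4, 3): 1.0, (3, 2): 0.72, (2, 2): 0.0072, (2, 1): 0.0864,
        (1, 1): 0.021888, (1, 0): 0.018144, (0, 0): 0.01348704}


def transition_posteriors(a, b, obs, alpha, beta):
    """Gamma_t(i, j) = alpha_t(i) a_ij b_i(O_t) beta_{t+1}(j) / likelihood."""
    n = len(a)
    like = alpha[(len(obs), n)]
    gamma = {}
    for t, o in enumerate(obs):
        for i in range(n):
            for j in range(n + 1):
                g = alpha.get((t, i), 0.0) * a[i][j] * b[i][o] * beta.get((t + 1, j), 0.0)
                if g > 0.0:
                    gamma[(t, i, j)] = g / like
    return gamma


def reestimate(gamma, obs, n_states, n_symbols):
    """Re-estimate a and b; assumes every state has a non-zero expected count."""
    num_a = [[0.0] * (n_states + 1) for _ in range(n_states)]
    num_b = [[0.0] * n_symbols for _ in range(n_states)]
    for (t, i, j), g in gamma.items():
        num_a[i][j] += g       # expected count of transition i -> j
        num_b[i][obs[t]] += g  # expected count of state i outputting O_t
    new_a = [[x / sum(row) for x in row] for row in num_a]
    new_b = [[x / sum(row) for x in row] for row in num_b]
    return new_a, new_b


GAMMA = transition_posteriors(A, B, OBS, ALPHA, BETA)
NEW_A, NEW_B = reestimate(GAMMA, OBS, n_states=3, n_symbols=3)
print(NEW_A)
print(NEW_B)
```

Both numerators share the same denominator, the expected number of transitions out of state `i`, which is why each re-estimated row sums to 1.0.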
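5.6 Forward
The forward grid is computed again with the re-estimated parameters (Figure 12: forward grid):

likelihood $\approx 0.150$

The likelihood rises from 0.01349 in Section 5.1 to about 0.150.

5.7 Baum-Welch
Baum-Welch repeats the steps of Sections 5.1 to 5.5. Figure 13 shows the likelihood at each iteration. It approaches 0.25, the value obtained in Section 4.2.

6. Baum-Welch
Baum-Welch re-estimates $a_{i,j}$ from $\Gamma_t(i,j)$ and $b_j(O)$ from $\Gamma_t(i,j)$. It applies to Ergodic HMMs [7] as well as to Left-to-Right HMMs. Applications of Baum-Welch include [8] and the analysis of DNA [9].

7.
Baum-Welch estimates the HMM parameters $a_{i,j}$ and $b_j(O)$ from $\Gamma_t(i,j)$.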
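The iteration can be sketched end to end: compute the forward and backward grids, accumulate Γ, re-estimate, and repeat. This is a minimal illustration for a single training sequence under the article's convention (output probability tied to the state being left, final state 3). All function names, the fixed iteration count and the symbol indices (0 = α, 1 = β, 2 = γ) are assumptions. It omits the scaling a practical implementation would need for long sequences.

```python
def forward(a, b, obs):
    n = len(a)
    alpha = [[0.0] * (n + 1) for _ in range(len(obs) + 1)]
    alpha[0][0] = 1.0
    for t, o in enumerate(obs):
        for i in range(n):
            for j in range(n + 1):
                alpha[t + 1][j] += alpha[t][i] * a[i][j] * b[i][o]
    return alpha


def backward(a, b, obs):
    n, T = len(a), len(obs)
    beta = [[0.0] * (n + 1) for _ in range(T + 1)]
    beta[T][n] = 1.0
    for t in range(T - 1, -1, -1):
        for i in range(n):
            beta[t][i] = sum(a[i][j] * b[i][obs[t]] * beta[t + 1][j] for j in range(n + 1))
    return beta


def baum_welch_step(a, b, obs):
    """One re-estimation; returns (new_a, new_b, likelihood under the old parameters)."""
    n, T, K = len(a), len(obs), len(b[0])
    alpha, beta = forward(a, b, obs), backward(a, b, obs)
    like = alpha[T][n]
    num_a = [[0.0] * (n + 1) for _ in range(n)]
    num_b = [[0.0] * K for _ in range(n)]
    for t in range(T):
        for i in range(n):
            for j in range(n + 1):
                g = alpha[t][i] * a[i][j] * b[i][obs[t]] * beta[t + 1][j] / like
                num_a[i][j] += g
                num_b[i][obs[t]] += g
    new_a = [[x / sum(row) for x in row] for row in num_a]
    new_b = [[x / sum(row) for x in row] for row in num_b]
    return new_a, new_b, like


def train(a, b, obs, iterations=20):
    history = []  # likelihood before each update
    for _ in range(iterations):
        a, b, like = baum_welch_step(a, b, obs)
        history.append(like)
    return a, b, history


A = [[0.7, 0.3, 0.0, 0.0], [0.0, 0.6, 0.4, 0.0], [0.0, 0.0, 0.1, 0.9]]
B = [[0.7, 0.2, 0.1], [0.4, 0.3, 0.3], [0.1, 0.1, 0.8]]
OBS = [0, 0, 1, 2]  # alpha, alpha, beta, gamma
FINAL_A, FINAL_B, HISTORY = train(A, B, OBS)
print(HISTORY[:4], HISTORY[-1])
```

Starting from Model A, the likelihood history should begin at about 0.01349, rise to about 0.150 after one update, and level off near 0.25.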
References
[1] R. Sundberg, "Maximum likelihood theory for incomplete data from an exponential family," Scandinavian Journal of Statistics, vol.1, pp.49-58, 1974.
[2] L.E. Baum and T. Petrie, "Statistical inference for probabilistic functions of finite state Markov chains," The Annals of Mathematical Statistics, vol.37, no.6, pp.1554-1563, 1966.
[3] X.D. Huang, Y. Ariki, and M.A. Jack, Hidden Markov Models for Speech Recognition, ISBN 0-7486-0162-7, Edinburgh University Press, Edinburgh, 1990.
[4] S.J. Young, "The HTK Hidden Markov Model Toolkit: Design and Philosophy," Entropic Cambridge Research Laboratory, vol.2, pp.2-44, 1994. http://htk.eng.cam.ac.uk/.
[5] Kai-Fu Lee, Automatic Speech Recognition: The Development of the SPHINX Recognition System, ISBN 978-0898382969, Springer, New York, 1988.
[6] F. Jelinek, Statistical Methods for Speech Recognition, ISBN 0-262-10066-5, The MIT Press, Cambridge, Massachusetts, 1997.
[7] …, "… Ergodic HMM …," SIG-SLUD-9204-3, pp.17-24, 1993.
[8] …, "Baum-Welch …," SP94-97, pp.45-50, 1995.
[9] C. Burge and S. Karlin, "Prediction of complete gene structures in human genomic DNA," J. Mol. Biol., vol.268, pp.78-94, 1997.

Jin'ichi Murakami: 1984 … 1986 … NTT … 1991 … (ATR) … 1995 … NTT … 1998 …