ISSN 1000-9825, CODEN RUXUEW    E-mail: jos@iscas.ac.cn
Journal of Software, Vol.18, No.3, March 2007, pp.608-616    http://www.jos.org.cn
DOI: 10.1360/jos180608    Tel/Fax: +86-10-62562563
© 2007 by Journal of Software. All rights reserved.

A Robust Bootstrapping Algorithm of Speaker Model for On-Line Unsupervised Speaker Indexing

FU Zhong-Hua+, ZHANG Yan-Ning
(School of Computer Science, Northwestern Polytechnical University, Xi'an 710072, China)
+ Corresponding author: Phn: +86-29-88494848, E-mail: mailfzh@nwpu.edu.cn

Fu ZH, Zhang YN. A robust bootstrapping algorithm of speaker model for on-line unsupervised speaker indexing. Journal of Software, 2007,18(3):608-616. http://www.jos.org.cn/1000-9825/18/608.htm

Abstract: A robust bootstrapping framework is proposed in this paper. It employs a multi-eigenspace modeling technique based on regression class (RC-MES) to build speaker models from sparse data, and a short-segment clustering step to prevent overly short segments from influencing bootstrapping. On a real discussion archive with a total duration of 8 hours, the proposed method proves significantly robust: it not only improves speaker change detection performance but also outperforms conventional bootstrapping methods, even when the average bootstrapping segment duration is less than 5 seconds.

Key words: speaker indexing; speaker model; regression class; eigenvoice

CLC number: TP39    Document code: A

Supported by the Science & Technology Research and Development Plan of Shanxi Province of China under Grant No.2005k04-G23
Received 2006-07-28; Accepted 2006--3
[…] ([3]) […] ([2]) […] K-L distance (Kullback-Leibler divergence)[7], Bayesian information criterion (BIC)[8], generalized likelihood ratio (GLR)[9] […] universal background model (UBM), Gaussian mixture model (GMM)[10] […] sample speaker model (SSM)[11] […]

[…][11] […] (A) […] $\lambda_A$ […] $\lambda_A$, $\lambda_B$ […] bootstrapping […] 2 […] 0 75% […][11], [12] […] 3 […] GMM […][12] […] BIC […] off-line […]

[…] eigenvoice […] maximum likelihood linear regression (MLLR) […] multi-eigenspace modeling based on regression class (RC-MES)[13] […] GLR […] RC-MES […] 2 […] 2.1 […] GLR […] 2.2 […] 3 […] 4 […]

1 RC-MES […]

[…] Kuhn[14] […] eigenspace […]
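[…] (RC-MES) […] MLLR […] regression class tree […] RC-MES […]: offline […] online […]

Offline:
(1) […] speaker independent (SI) […] (GMM) […][7];
(2) […] A […] S […] GMM […] S […] SI […];
(3) […] R […] S […] speaker dependent (SD) […] SI […] SD […] A […] R*S […] SD […];
(4) […] A […] S […] $[e_i(0), e_i(1), \ldots, e_i(K)],\ i=1,\ldots,S$.

Online:
(1) […] A […] S […];
(2) […] S […] SI […];
(3) […] S […] $[w_i(1), w_i(2), \ldots, w_i(K)],\ i=1,\ldots,S$;
(4) […] S […] GMM;
(5) […] S […] GMM […] GMM […]

[…] GMM […] maximum likelihood eigen-decomposition (MLED) […] GMM […] SI […] EM […] S […]

$$e^{(s)}(j) = \left[ e_1^{(s)}(j)^{T},\ e_2^{(s)}(j)^{T},\ \ldots,\ e_m^{(s)}(j)^{T},\ \ldots,\ e_M^{(s)}(j)^{T} \right]^{T},\quad j = 0,1,\ldots,K \tag{1}$$

[…] $e^{(s)}(j)$ […] ([15]) […]

$$\sum_{t=1}^{T}\sum_{m=1}^{M} r_m^{(s)}(t)\,\left[e_m^{(s)}(i)\right]^{T} \left(C_m^{(s)}\right)^{-1} o_t = \sum_{t=1}^{T}\sum_{m=1}^{M} r_m^{(s)}(t) \left[\sum_{k=0}^{K} w^{(s)}(k)\, e_m^{(s)}(k)\right]^{T} \left(C_m^{(s)}\right)^{-1} e_m^{(s)}(i) \tag{2}$$

where […] $r_m^{(s)}(t)$ is the occupation probability of mixture component $m$ for the observation $o_t$:

$$r_m^{(s)}(t) = P(i_t = m \mid o_t, \lambda) = \frac{p_m\, b_m(o_t)}{\sum_{k=1}^{M} p_k\, b_k(o_t)} \tag{3}$$

Here $p_m$ and $b_m(\cdot)$ are the weight and the Gaussian density of the $m$-th mixture component […] S […] $\lambda$ […]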
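Read as an equation in the unknown weights, Eq.(2) is linear, so the weights can be obtained with one linear solve. The following is a minimal sketch of that idea for a single regression class, not the authors' implementation. It assumes diagonal covariances, fixes w(0) = 1 so that e(0) acts as the mean voice, and takes the responsibilities of Eq.(3) as given. The function name `mled_weights`, the array layout, and the toy data are all hypothetical.

```python
import numpy as np

def mled_weights(obs, resp, eig, inv_var):
    """Solve the MLED linear system for the eigenvoice weights w(1..K).

    obs     : (T, D) observation vectors o_t
    resp    : (T, M) responsibilities r_m(t), assumed already computed
    eig     : (K+1, M, D) eigenvectors split per mixture; eig[0] is the mean voice
    inv_var : (M, D) inverse diagonal covariances (assumption: diagonal C_m)
    """
    occ = resp.sum(axis=0)                          # sum_t r_m(t), shape (M,)
    # sum_t r_m(t) (o_t - e_m(0)): the k = 0 term moved to the left-hand side
    resid = resp.T @ obs - occ[:, None] * eig[0]
    basis = eig[1:]                                 # e_m(1..K), shape (K, M, D)
    rhs = np.einsum("imd,md,md->i", basis, inv_var, resid)
    lhs = np.einsum("imd,md,m,kmd->ik", basis, inv_var, occ, basis)
    return np.linalg.solve(lhs, rhs)

# Toy check: noiseless frames generated from known weights.
rng = np.random.default_rng(0)
K, M, D, T = 2, 3, 4, 30
eig = rng.normal(size=(K + 1, M, D))
inv_var = np.ones((M, D))
w_true = np.array([0.5, -1.2])
means = eig[0] + np.tensordot(w_true, eig[1:], axes=1)  # adapted means, (M, D)
labels = np.arange(T) % M                                # hard mixture assignment
obs = means[labels]
resp = np.eye(M)[labels]
w_hat = mled_weights(obs, resp, eig, inv_var)
print(w_hat)
```

With K unknowns per regression class, the system stays small even when only a few seconds of speech are available, which is the property that bootstrapping from sparse data relies on.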
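[…] (2) […] k+1 […] (S […]) […] k+1 […] offline […] GMM […] EM […]

2 […]

[…]

Fig.1 Flow diagram of the proposed unsupervised speaker indexing (blocks: speech signal; pre-processing; speaker change detection; "Model exists?" decision; model bootstrapping based on RC-MES; speaker model adaptation; short segment clustering; speaker model set)

[…] GLR […] RC-MES […]

2.1 […] GLR […]

[…] K-L, BIC, GLR […] GLR [6,7] [] […] Let $X=\{x_1,x_2,\ldots,x_N\}$ and $Y=\{y_1,y_2,\ldots,y_N\}$ be two segments of $N$ feature vectors each, and let $\lambda_X$ and $\lambda_Y$ be the models of X and Y obtained by maximum likelihood (ML) estimation. Let Z be the union of X and Y, with a GMM $\lambda_Z$ trained on Z by EM. The GLR distance $d_{GLR}$ is then defined through the log-likelihood $L(\cdot)$ as

$$d_{GLR} = \big(L(X \mid \lambda_X) + L(Y \mid \lambda_Y)\big) - L(Z \mid \lambda_Z) \tag{4}$$

$$L(Z \mid \lambda_Z) = \sum_{i=1}^{N} \log p(x_i \mid \lambda_Z) + \sum_{i=1}^{N} \log p(y_i \mid \lambda_Z) \tag{5}$$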
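As an illustration of Eq.(4), the sketch below computes the GLR distance between two segments. It is a simplified stand-in, not the paper's procedure: each model is a single full-covariance Gaussian fitted by ML instead of a GMM trained by EM, and the names `fit_gaussian`, `gaussian_loglik`, `glr_distance` and the synthetic data are hypothetical.

```python
import numpy as np

def fit_gaussian(data):
    """ML estimate of one full-covariance Gaussian (assumption: stands in for a GMM)."""
    mean = data.mean(axis=0)
    cov = np.atleast_2d(np.cov(data, rowvar=False, bias=True))
    return mean, cov

def gaussian_loglik(data, mean, cov):
    """Total log-likelihood of the rows of `data` under N(mean, cov)."""
    dim = data.shape[1]
    diff = data - mean
    _, logdet = np.linalg.slogdet(cov)
    # Mahalanobis term diff_i^T cov^{-1} diff_i for every frame
    maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
    return float(-0.5 * np.sum(maha + logdet + dim * np.log(2.0 * np.pi)))

def glr_distance(x, y):
    """d_GLR = (L(X|lambda_X) + L(Y|lambda_Y)) - L(Z|lambda_Z), with Z = X union Y."""
    z = np.vstack([x, y])
    l_x = gaussian_loglik(x, *fit_gaussian(x))
    l_y = gaussian_loglik(y, *fit_gaussian(y))
    l_z = gaussian_loglik(z, *fit_gaussian(z))
    return (l_x + l_y) - l_z

# Synthetic "segments": same source vs. shifted source.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(200, 3))
y_same = rng.normal(0.0, 1.0, size=(200, 3))
y_diff = rng.normal(3.0, 1.0, size=(200, 3))
d_same = glr_distance(x, y_same)
d_diff = glr_distance(x, y_diff)
print(d_same, d_diff)
```

In this single-Gaussian setting the distance is non-negative, because each segment's own ML model fits it at least as well as the pooled model. It grows as the two segments become less alike, which is what makes thresholding it against $\theta_{GLR}$ usable for change detection.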
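$$L(X \mid \lambda_X) = \sum_{i=1}^{N} \log p(x_i \mid \lambda_X),\qquad L(Y \mid \lambda_Y) = \sum_{i=1}^{N} \log p(y_i \mid \lambda_Y) \tag{6}$$

where $p(x_i \mid \lambda_X)$ is the likelihood of $x_i$ given $\lambda_X$, and $p(y_i \mid \lambda_Y)$ […] Z […] GMM […] $\lambda_X$, $\lambda_Y$ […] $\lambda_Z$ […] (4) […][16] […] X, Y […] X, Y […]

$0.0 < d_{GLR}$ […] $N\log 2.0$    (7)

$d_{GLR}$ […] $0$    (8)

[…] $d_{GLR}$ […] 0 […] $\theta_{GLR}$ […] localized search algorithm (LSA)[11] […] $d_{GLR}$ […] 2 […] 02 […]

2.2 […]

[…] 20 […] GMM […] GMM […] 80~100 […] ([…]) [18] […] (UBM[10], SSM[11]) […] RC-MES […]

[…] RC-MES […] $S=\{s_1, s_2, \ldots, s_K\}$ […] K […] $s_i$, $s_j$ […] GLR […] $d_{GLR}(s_i, s_j)$ […] $d_{GLR}(s_i, s_j) > \theta_{GLR}$ […] $\theta_{RC\text{-}MES}$ […] $s_{new}$ […] S […] GLR […] $d_{GLR}(s_{new}, s_i)\ (1 \le i \le K)$ […] $j\ (1 \le j \le K)$ […] $d_{GLR}(s_{new}, s_j) < \theta_{GLR}$ […] $s_{new}$ […] $s_j$ […] $s_{new}$ […] S […]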
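[…] S […] $\theta_{RC\text{-}MES}$ […] S […] RC-MES […]:
(1) […] GLR […] $s_{new}$ […]
(2) […] $s_{new}$ […]
(3) […] ($\lambda_i$) […] $\theta_{id}$ […] $s_{new}$ […] $\lambda_i$ […] (1); […] $s_{new}$ […] S […]
(4) […] S […]
(5) […] $s_j$ […] $\theta_{RC\text{-}MES}$ […] $s_j$ […] RC-MES […] $s_j$ […] S […]
(6) […] (1) […]

3 […]

[…] 8 […] 2 […] ~5 […] ~20 […] 0 75% […] 6kHz, 6 […] 0.97 […] 26 […] 2 […] Mel-frequency cepstrum coefficients (MFCC) […] MFCC […] 30 […] 0 […] RC-MES […] 6 […] GMM […] TIMIT[19] […] 00 […] (TIMIT […]) […]

[…] RC-MES […] GLR […] $\theta_{GLR}$ […] GLR […] $\theta_{GLR}$ […] 0 […] Performance is measured by precision (PRC) and recall (RCL) […]:

$$PRC = \frac{\text{number of correctly detected change points}}{\text{total number of detected change points}} \tag{9}$$

$$RCL = \frac{\text{number of correctly detected change points}}{\text{total number of actual change points}} \tag{10}$$

[…][16] […] F […]:

$$F = \frac{2.0 \times PRC \times RCL}{PRC + RCL} \tag{11}$$

[…] 2 […] RCL […] PRC […]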
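The F measure of Eq.(11) combines precision and recall into a single score. The snippet below is a small worked sketch using the standard count-based definitions of PRC and RCL, assumed here to match Eqs.(9) and (10). The function name and the counts are hypothetical and are not results from the paper.

```python
def change_detection_scores(num_correct, num_detected, num_actual):
    """Return (PRC, RCL, F) for speaker change detection.

    num_correct  : detected change points that match a true change point
    num_detected : all change points reported by the detector
    num_actual   : all true change points in the reference
    """
    prc = num_correct / num_detected if num_detected else 0.0
    rcl = num_correct / num_actual if num_actual else 0.0
    # Eq.(11): harmonic mean of precision and recall
    f = 2.0 * prc * rcl / (prc + rcl) if (prc + rcl) > 0 else 0.0
    return prc, rcl, f

# Hypothetical counts: 60 correct out of 80 detections, 100 true change points.
prc, rcl, f = change_detection_scores(60, 80, 100)
print(prc, rcl, f)
```

Because F is a harmonic mean, it is pulled toward the smaller of the two values, so a threshold that trades much precision for a little recall lowers F.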
[…] $\theta_{GLR}$ […] RCL […] PRC […] PRC […] $\theta_{GLR}$ […] 0 […] 0 […] RCL […] 0.69 […] 0.78 […] PRC […] 0.67 […] 0.55 […] PRC […] 0.74 […] $\theta_{GLR}$ = 0 […] F […] 0.74 […] $\theta_{GLR}$ = 0 […] 0.76 […] $\theta_{GLR}$ = 0 […]

Fig.2 Recall-Precision tradeoff of speaker change detection with different $\theta_{GLR}$ (x-axis: precision; y-axis: recall; curves: GLR based, After indexing)

3.2 […] UBM […]

[…] UBM […][4,5,] […][10] […][7,4] […][12] […] UBM […] UBM […] 52 […] 6 […] GMM […] MAP […] UBM […] 40 […] RC-MES […] $\theta_{RC\text{-}MES}$ […] 0 […] Fig.3 […] UBM […] 30 […] ~5 […] UBM […] 30 […] RC-MES […] RC-MES […] (<5 […]) […]

4 […]
[…] RC-MES […] GLR […] RC-MES […] 8 […] 0 75% […] NIST […]

Fig.3 Speaker indexing accuracy for various bootstrapping approaches with different bootstrapping segment durations (x-axis: average bootstrapping segment duration; y-axis: accuracy; curves: UBM adaptation, Common eigenvoice, RC-MES based)

References:
[1] Delacourt P, Wellekens CJ. DISTBIC: A speaker-based segmentation for audio data indexing. Speech Communication, 2000,32(1-2):111-126.
[2] Lu L, Zhang HJ. Unsupervised speaker segmentation and tracking in real-time audio content analysis. Multimedia Systems, 2005,10(4):332-343.
[3] Sancho SS, Ascensión GA, José MLM, Carlos BC. Offline speaker segmentation using genetic algorithms and mutual information. IEEE Trans. on Evolutionary Computation, 2006,10(2):175-186.
[4] Meignier S, Moraru D, Fredouille C, Bonastre JF, Besacier L. Step-By-Step and integrated approaches in broadcast news speaker diarization. Computer Speech and Language, 2006,20(1-2):303-330.
[5] Aronowitz H, Burshtein D, Amir A. Speaker indexing in audio archives using Gaussian mixture scoring simulation. In: Bengio S, Bourlard H, eds. Proc. of the 1st Int'l Workshop on Machine Learning for Multimodal Interaction. LNCS 3361, Heidelberg: Springer-Verlag, 2005. 243-252.
[6] Anguera X, Wooters C, Peskin B, Aguilo M. Robust speaker segmentation for meetings: The ICSI-SRI spring diarization system. In: Renals S, Bengio S, eds. Proc. of the 2nd Int'l Workshop on Machine Learning for Multimodal Interaction. LNCS 3869, Heidelberg: Springer-Verlag, 2005. 402-414.
[7] Campbell JP. Speaker recognition: A tutorial. Proc. of the IEEE, 1997,85(9):1437-1462.
[8] Chen SS, Gopalakrishnan PS. Clustering via the Bayesian information criterion with applications in speech recognition. In: Acero A, Hon HW, eds. Proc. of the 1998 IEEE Int'l Conf. on Acoustics, Speech and Signal Processing, vol.2. Seattle, Washington: IEEE, 1998. 645-648.
[9] Gish H, Schmidt N. Text-Independent speaker identification. IEEE Signal Processing Magazine, 1994,11(4):18-32.
[10] Reynolds DA, Quatieri TF, Dunn RB. Speaker verification using adapted Gaussian mixture models. Digital Signal Processing, 2000,10:19-41.
[11] Kwon S, Narayanan S. Unsupervised speaker indexing using generic models. IEEE Trans. on Speech and Audio Processing, 2005,13(5):1004-1013.
[12] Nishida M, Kawahara T. Speaker model selection based on the Bayesian information criterion applied to unsupervised speaker indexing. IEEE Trans. on Speech and Audio Processing, 2005,13(4):583-592.
[13] Fu ZH, Zhao RC. Speaker modeling technique based on regression class for speaker identification with sparse training. In: Li SZ, et al., eds. Proc. of the Sinobiometrics 2004. LNCS 3338, Heidelberg: Springer-Verlag, 2004. 610-616.
[14] Kuhn R, Junqua JC, Niedzielski NP. Rapid speaker adaptation in eigenvoice space. IEEE Trans. on Speech and Audio Processing, 2000,8(6):695-706.
[15] Fu ZH. Research on robustness of speaker recognition system [Ph.D. Thesis]. Xi'an: Northwestern Polytechnique University, 2004 (in Chinese with English abstract).
[16] Ajmera J, McCowan I, Bourlard H. Robust speaker change detection. IEEE Signal Processing Letters, 2004,11(8):649-651.
[17] Lu J, Mao B, Sun ZX, Zhang FY. An improved speaker based speech segmentation algorithm. Journal of Software, 2002,13(2):274-279 (in Chinese with English abstract). http://www.jos.org.cn/1000-9825/13/274.pdf
[18] Reynolds DA, Rose RC. Robust text-independent speaker identification using Gaussian mixture speaker models. IEEE Trans. on Speech and Audio Processing, 1995,3(1):72-83.
[19] Garofolo J, et al. DARPA TIMIT acoustic-phonetic continuous speech corpus CD-ROM. National Institute of Standards and Technology, 1993.