Total Variability Subspace Adaptation Based Speaker Recognition. Brief Paper, ACTA AUTOMATICA SINICA, Vol. 40, No. 8, August 2014.

DOI 10.3724/SP.J.1004.2014.01836

LI Zhi-Yi 1, ZHANG Wei-Qiang 1, HE Liang 1, LIU Jia 1

Abstract  In text-independent speaker recognition, the identity vector (i-vector) based modeling method has recently become the most popular and efficient approach. A key problem is to estimate the total variability subspace T efficiently and accurately. In this paper, two adaptation algorithms are proposed to improve the performance of i-vector based systems in practical environments. Experiments on the core dataset of the 2008 speaker recognition evaluation of the American National Institute of Standards and Technology (NIST) and on a self-collected speaker recognition evaluation dataset demonstrate that adapting the total variability subspace T with the proposed algorithms, using either the test dataset or the development dataset, improves performance. In addition, the combination of the two adaptation algorithms achieves almost the best performance using the development dataset rather than the test dataset.

Key words  i-vector, total variability subspace, adaptation, speaker recognition

Citation  Li Zhi-Yi, Zhang Wei-Qiang, He Liang, Liu Jia. Total variability subspace adaptation based speaker recognition.
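Acta Automatica Sinica, 2014, 40(8): 1836-1840

Manuscript received November 13, 2013; accepted November 23, 2013. Recommended by Associate Editor WU Xi-Hong.
Supported by National Natural Science Foundation of China (61370034, 61273268, 61005019, 90920302) and Beijing Natural Science Foundation (KZ201110005005).
1. Tsinghua National Laboratory for Information Science and Technology, Department of Electronic Engineering, Tsinghua University, Beijing 100084

Text-independent speaker recognition has been studied extensively [1]. In recent years, the i-vector approach [2-3] has become prominent in the speaker recognition evaluations organized by the American National Institute of Standards and Technology (NIST), alongside the Gaussian mixture model super vector-support vector machine (GSV-SVM) [4] and joint factor analysis (JFA) [5-6]. Like GSV-SVM and JFA, the i-vector approach builds on the Gaussian mixture model-universal background model (GMM-UBM) framework [7]. It models an utterance as in (1):

$$M = m + Tw \tag{1}$$

where $M$ is the GMM mean supervector of the utterance, $m$ is the UBM mean supervector, $T$ is the total variability subspace matrix, and $w$ is the total variability factor, namely the i-vector.

After the i-vectors are extracted, linear discriminant analysis (LDA) and within-class covariance normalization (WCCN) are applied to them. The LDA- and WCCN-compensated i-vectors can then be scored with cosine distance scoring (CDS) or with an SVM [8]; as in [2], CDS is applied to the i-vectors here. Because every i-vector is extracted through T, the estimate of T is central to the i-vector system, both on NIST data and in practical environments.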

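Related subspace-based adaptation work is reported in [9]. This paper studies adaptation of the total variability subspace T of the i-vector system. The rest of the paper is organized as follows: Section 1 describes the estimation of the total variability subspace T; Section 2 describes i-vector channel compensation and scoring; Section 3 presents the adaptation algorithms for T; Section 4 reports the experiments; Section 5 concludes.

1 Estimation of the total variability subspace T

1.1 Sufficient statistics

To estimate T, a UBM is first trained with the expectation maximization (EM) algorithm. As in maximum a posteriori (MAP) adaptation from the UBM, each utterance is then summarized by statistics collected against the UBM. For the $t$-th frame $x_{s,t}$ of utterance $s$, the zeroth-order, first-order and second-order statistics $N_{c,s}$, $F_{c,s}$ and $S_{c,s}$, centered on the UBM mean supervector $m$, are given by (2):

$$
\begin{aligned}
N_{c,s} &= \sum_t \gamma_{c,s,t} \\
F_{c,s} &= \sum_t \gamma_{c,s,t}\,(x_{s,t} - m_c) \\
S_{c,s} &= \mathrm{diag}\Big\{ \sum_t \gamma_{c,s,t}\,(x_{s,t} - m_c)(x_{s,t} - m_c)^{\mathrm T} \Big\}
\end{aligned}
\tag{2}
$$

where $m_c$ is the mean of the $c$-th Gaussian component of the UBM, i.e. the $c$-th sub-vector of $m$, $\gamma_{c,s,t}$ is the posterior probability of UBM component $c$ given frame $x_{s,t}$, and $\mathrm{diag}\{\cdot\}$ keeps only the diagonal of its argument. With feature dimension $F$ and $C$ mixture components, the supervector has dimension $FC$.

1.2 Estimation of T

T is estimated with the EM algorithm. Starting from an initial T, the posterior statistics of the hidden variable $w_s$ are computed for every utterance by (3):

$$
\begin{aligned}
L_s &= I + T^{\mathrm T}\Sigma^{-1} N_s T \\
E[w_s] &= L_s^{-1} T^{\mathrm T}\Sigma^{-1} F_s \\
E[w_s w_s^{\mathrm T}] &= E[w_s]\,E[w_s^{\mathrm T}] + L_s^{-1}
\end{aligned}
\tag{3}
$$

where $F_s$ is the $FC \times 1$ vector formed by concatenating the $F_{c,s}$, $N_s$ is the $FC \times FC$ diagonal matrix whose diagonal blocks are $N_{c,s} I$, $L_s$ is an intermediate variable, and $\Sigma$ is the UBM covariance matrix. T is then updated by solving (4) [10]:

$$\sum_s N_s\, T\, E[w_s w_s^{\mathrm T}] = \sum_s F_s\, E[w_s^{\mathrm T}] \tag{4}$$

and the covariance $\Sigma$ is updated by (5):

$$\Sigma = N^{-1} \sum_s S_s - N^{-1}\,\mathrm{diag}\Big\{ \sum_s F_s\, E[w_s^{\mathrm T}]\, T^{\mathrm T} \Big\} \tag{5}$$

where $S_s$ is the $FC \times FC$ diagonal matrix whose diagonal blocks are $S_{c,s}$, and $N = \sum_s N_s$. After 6 to 8 iterations, T and $\Sigma$ are obtained.

2 i-vector channel compensation and scoring

After the i-vectors are extracted [2], LDA and WCCN are applied for channel compensation, and the compensated i-vectors are then scored.

2.1 Linear discriminant analysis

Linear discriminant analysis (LDA) [11] projects the i-vectors onto directions that separate speakers. The LDA criterion is (6):

$$J(w) = \frac{w^{\mathrm T} S_B\, w}{w^{\mathrm T} S_W\, w} \tag{6}$$

where the between-class scatter $S_B$ and the within-class scatter $S_W$ are given by (7) and (8):

$$S_B = \sum_{s=1}^{S} (w_s - \bar{w})(w_s - \bar{w})^{\mathrm T} \tag{7}$$

$$S_W = \sum_{s=1}^{S} \frac{1}{n_s} \sum_{i=1}^{n_s} (w_i^s - w_s)(w_i^s - w_s)^{\mathrm T} \tag{8}$$

Here $w_s = (1/n_s)\sum_{i=1}^{n_s} w_i^s$ is the mean of the i-vectors of speaker $s$, $\bar{w}$ is the mean of all i-vectors, $S$ is the number of speakers, and $n_s$ is the number of i-vectors of speaker $s$. Maximizing (6) is equivalent to solving the generalized eigenvalue problem (9):

$$S_B\, w = \lambda\, S_W\, w \tag{9}$$

2.2 Within-class covariance normalization

Within-class covariance normalization (WCCN) [12] uses the within-class covariance matrix $W$ defined in (10).

$$W = \frac{1}{S} \sum_{s=1}^{S} \frac{1}{n_s} \sum_{i=1}^{n_s} (w_i^s - w_s)(w_i^s - w_s)^{\mathrm T} \tag{10}$$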
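The statistics of (2) and the posterior of (3) can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: it assumes a diagonal-covariance UBM, stores the diagonal of $\Sigma$ as a vector, and orders supervector rows component by component. The function names `baum_welch_stats` and `ivector_posterior` are hypothetical.

```python
import numpy as np

def baum_welch_stats(X, weights, means, variances):
    """Zeroth-order and centered first-order statistics of one utterance, Eq. (2).

    X: (n_frames, F) features; weights: (C,); means, variances: (C, F)
    for a diagonal-covariance UBM (an assumption of this sketch).
    """
    diff = X[:, None, :] - means[None, :, :]                  # (n_frames, C, F)
    log_gauss = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                        + np.sum(np.log(2 * np.pi * variances), axis=1))
    log_post = np.log(weights) + log_gauss                    # (n_frames, C)
    log_post -= log_post.max(axis=1, keepdims=True)           # numerical stability
    gamma = np.exp(log_post)
    gamma /= gamma.sum(axis=1, keepdims=True)                 # gamma_{c,s,t}
    N = gamma.sum(axis=0)                                     # N_{c,s}, shape (C,)
    F_stats = gamma.T @ X - N[:, None] * means                # F_{c,s}, shape (C, F)
    return N, F_stats

def ivector_posterior(N, F_stats, T, sigma):
    """Posterior mean E[w_s] and second moment E[w_s w_s^T], Eq. (3).

    T: (C*F, R) total variability matrix; sigma: (C*F,) diagonal of Sigma.
    """
    feat_dim = F_stats.shape[1]
    N_diag = np.repeat(N, feat_dim)                           # diagonal of N_s
    T_scaled = T / sigma[:, None]                             # Sigma^{-1} T
    L = np.eye(T.shape[1]) + T_scaled.T @ (N_diag[:, None] * T)
    L_inv = np.linalg.inv(L)
    w = L_inv @ (T_scaled.T @ F_stats.reshape(-1))            # E[w_s]
    ww = np.outer(w, w) + L_inv                               # E[w_s w_s^T]
    return w, ww
```

The posterior mean `w` returned by `ivector_posterior` is the i-vector of the utterance; the second moment `ww` is only needed when T is re-estimated through (4).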

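The scatter matrices of (7) and (8), the generalized eigenvalue problem (9) and the WCCN matrix (10) can be sketched as follows. The sketch makes two assumptions that are not stated in the text above: $\bar{w}$ is taken as the mean of all i-vectors, and WCCN is applied through a Cholesky factor $B$ with $W^{-1} = BB^{\mathrm T}$, which is common practice. All function names are hypothetical.

```python
import numpy as np

def scatter_matrices(ivectors, labels):
    """Between-class S_B (Eq. (7)) and within-class S_W (Eq. (8)) scatter."""
    labels = np.asarray(labels)
    dim = ivectors.shape[1]
    global_mean = ivectors.mean(axis=0)          # assumed definition of w-bar
    S_B = np.zeros((dim, dim))
    S_W = np.zeros((dim, dim))
    for s in np.unique(labels):
        w_s = ivectors[labels == s]
        mean_s = w_s.mean(axis=0)
        d = (mean_s - global_mean)[:, None]
        S_B += d @ d.T
        centered = w_s - mean_s
        S_W += centered.T @ centered / len(w_s)
    return S_B, S_W

def lda_projection(S_B, S_W, n_dims):
    """Leading solutions of S_B a = lambda S_W a (Eq. (9)), as columns."""
    C = np.linalg.cholesky(S_W)                  # S_W = C C^T
    C_inv = np.linalg.inv(C)
    vals, vecs = np.linalg.eigh(C_inv @ S_B @ C_inv.T)
    order = np.argsort(vals)[::-1][:n_dims]      # largest eigenvalues first
    return C_inv.T @ vecs[:, order]

def wccn_transform(ivectors, labels):
    """Cholesky factor B of W^{-1}, with W from Eq. (10); apply as ivectors @ B."""
    _, S_W = scatter_matrices(ivectors, labels)
    W = S_W / len(np.unique(labels))
    return np.linalg.cholesky(np.linalg.inv(W))
```

After the mapping `ivectors @ B`, the within-class covariance of the transformed i-vectors is the identity matrix, which is the sense in which WCCN normalizes it.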
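In (10), $w_s = (1/n_s)\sum_{i=1}^{n_s} w_i^s$ is the mean of the i-vectors of speaker $s$, $S$ is the number of speakers, and $n_s$ is the number of i-vectors of speaker $s$.

2.3 Cosine distance scoring

Following [2], the compensated i-vectors are scored directly with the cosine distance. Given the target-speaker i-vector $w_{tar}$ and the test i-vector $w_{tst}$, the score is compared with a decision threshold $\theta$ as in (11):

$$\mathrm{score}(w_{tar}, w_{tst}) = \frac{\langle w_{tar}, w_{tst} \rangle}{\lVert w_{tar} \rVert\, \lVert w_{tst} \rVert} \gtrless \theta \tag{11}$$

where $\langle \cdot, \cdot \rangle$ is the inner product and $\lVert w_{tar} \rVert$, $\lVert w_{tst} \rVert$ are the Euclidean norms of the two i-vectors.

3 Adaptation of the total variability subspace T

This section presents the proposed adaptation algorithms for the total variability subspace T of the i-vector system.

3.1 Iteration adaptation

The subspace estimated as in Section 1.2 is denoted $T_o$. As shown in Fig. 1, iteration adaptation starts from $T_o$ and continues the EM iterations, as summarized in Algorithm 1.

Algorithm 1. Iteration adaptation of T
1. Initialize with $T_o$ and the UBM covariance $\Sigma$;
2. With the current T, compute $L_s$ and the posterior statistics $E[w_s]$ and $E[w_s w_s^{\mathrm T}]$ of $w_s$;
3. Using the results of step 2, update T by (4);
4. Using the results of steps 2 and 3, update the UBM covariance $\Sigma$ by (5);
5. Return to step 2 and iterate.

Fig. 1  Diagram of the total variability subspace T iteration adaptation algorithm

3.2 Subspace combination adaptation

Subspace combination has been used in related factor-analysis systems [13-14]. As shown in Fig. 2, two subspaces $T_o$ and $T_n$ are each estimated with the method of Section 1.2 and then combined, as summarized in Algorithm 2.

Algorithm 2. Subspace combination adaptation of T
1. Estimate $T_o$;
2. Estimate $T_n$;
3. Combine $T_o$ and $T_n$ to obtain T.

Fig. 2  Diagram of the total variability subspace T combination adaptation algorithm

3.3 Integration of iteration adaptation and subspace combination adaptation

Fig. 3  Diagram of the total variability subspace T integration algorithm of iteration adaptation and subspace combination adaptation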
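The cosine distance scoring rule of (11) in Section 2.3 can be sketched in a few lines. The function names and the example threshold are hypothetical; in practice $\theta$ would be tuned on development trials.

```python
import numpy as np

def cosine_score(w_target, w_test):
    """Cosine distance score between two i-vectors, Eq. (11)."""
    w_target = np.asarray(w_target, dtype=float)
    w_test = np.asarray(w_test, dtype=float)
    return float(w_target @ w_test
                 / (np.linalg.norm(w_target) * np.linalg.norm(w_test)))

def accept(w_target, w_test, theta):
    """Accept the trial when the score reaches the decision threshold theta."""
    return cosine_score(w_target, w_test) >= theta
```

Because the score depends only on the angle between the two vectors, rescaling either i-vector leaves the decision unchanged.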

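The iteration adaptation of Section 3.1 and the subspace combination of Section 3.2 can be sketched as below, under loudly stated assumptions. First, $\Sigma$ is held fixed, so step 4 of Algorithm 1 (the update by (5)) is omitted for brevity. Second, the text does not say how $T_o$ and $T_n$ are combined; column-wise concatenation is assumed here purely for illustration. The function names are hypothetical.

```python
import numpy as np

def iteration_adaptation(stats, T_o, sigma, n_iter=6):
    """EM re-estimation of T on adaptation data, initialized from T_o.

    stats: list of (N, F_stats) per utterance, N: (C,), F_stats: (C, D).
    T_o: (C*D, R); sigma: (C*D,) diagonal of Sigma, kept fixed (simplification).
    """
    C, D = stats[0][1].shape
    R = T_o.shape[1]
    T = T_o.copy()                                   # step 1: start from T_o
    for _ in range(n_iter):                          # step 5: iterate
        A = np.zeros((C, R, R))                      # sum_s N_{c,s} E[w_s w_s^T]
        B = np.zeros((C * D, R))                     # sum_s F_s E[w_s^T]
        T_scaled = T / sigma[:, None]                # Sigma^{-1} T
        for N, F_stats in stats:                     # step 2: posterior, Eq. (3)
            N_diag = np.repeat(N, D)
            L_inv = np.linalg.inv(np.eye(R) + T_scaled.T @ (N_diag[:, None] * T))
            f = F_stats.reshape(-1)
            w = L_inv @ (T_scaled.T @ f)
            ww = np.outer(w, w) + L_inv
            A += N[:, None, None] * ww
            B += np.outer(f, w)
        for c in range(C):                           # step 3: solve Eq. (4) per component
            rows = slice(c * D, (c + 1) * D)
            T[rows] = np.linalg.solve(A[c], B[rows].T).T
    return T

def combine_subspaces(T_o, T_n):
    """Combine two subspaces; column concatenation is an assumption of this sketch."""
    return np.hstack([T_o, T_n])
```

Under the same assumptions, the integration of Section 3.3 amounts to chaining the two functions: `T_n = iteration_adaptation(new_stats, T_o, sigma)` followed by `T = combine_subspaces(T_o, T_n)`, where `new_stats` stands for statistics collected on the adaptation data.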
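As shown in Fig. 3, $T_o$ is first estimated with the method of Section 1.2, $T_n$ is then obtained with Algorithm 1, and the two are combined, as summarized in Algorithm 3.

Algorithm 3. Integration of iteration adaptation and subspace combination adaptation of T
1. Estimate $T_o$;
2. Apply Algorithm 1 to obtain $T_n$;
3. Combine $T_o$ and $T_n$ to obtain T.

4 Experiments

4.1 Experimental setup

Two datasets are used. The surviving figures for each are listed below.

NIST SRE 2008 core dataset: Switchboard I and II data; 20 000 for training the UBM, ZTnorm, LDA and WCCN; 12 922; 3 000.
Self-collected actual application dataset: 20 000 for training the UBM, ZTnorm, LDA and WCCN; 8 000; 2 000.

Mel-frequency cepstral coefficients (MFCC) are used as features. The front end uses G.723.1 voice activity detection (VAD), cepstral mean subtraction (CMS) and feature warping with a window of 3; further front-end parameters are 25 % and 0.95. The 13-dimensional basic features are extended to 39-dimensional MFCC. The UBM has 1 024 mixture components, the i-vector subspace T has dimension 400 and is trained with 6 iterations, and LDA reduces the dimension to 200.

4.2 Results

Performance is measured by the equal error rate (EER) and the minimum detection cost function (MinDCF). In each table the first row is the baseline T and rows (a) and (b) are the two adapted systems, listed in their original order.

Table 1  Results of the baseline T algorithm and the proposed iteration adaptation T algorithm on the NIST SRE 2008 core dataset

System            EER (%)   MinDCF
Baseline T        5.41      0.029
Adapted T (a)     4.92      0.026
Adapted T (b)     4.67      0.023

Table 2  Results of the baseline T algorithm and the proposed iteration adaptation T algorithm on the actual application dataset

System            EER (%)   MinDCF
Baseline T        3.00      0.014
Adapted T (a)     2.99      0.013
Adapted T (b)     2.00      0.011

Table 3  Results of the baseline T algorithm and the proposed integration algorithm of iteration adaptation and subspace combination adaptation on the NIST SRE 2008 core dataset

System            EER (%)   MinDCF
Baseline T        5.41      0.029
Adapted T (a)     4.01      0.021
Adapted T (b)     3.89      0.020

Tables 1 and 2 give the results of iteration adaptation; Tables 3 and 4 give the results of the integration algorithm of Section 3.3. In every table the adapted systems have a lower EER and a lower MinDCF than the baseline.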
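The two metrics of Section 4.2 can be computed from lists of target and non-target trial scores as sketched below. The cost parameters of `min_dcf` (`c_miss=10`, `c_fa=1`, `p_target=0.01`) are assumed defaults, not values taken from the text above, and the function names are hypothetical.

```python
import numpy as np

def error_rates(target_scores, nontarget_scores):
    """Miss and false-alarm rates at every observed score used as threshold."""
    tar = np.asarray(target_scores, dtype=float)
    non = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([tar, non]))
    p_miss = np.array([(tar < t).mean() for t in thresholds])   # targets rejected
    p_fa = np.array([(non >= t).mean() for t in thresholds])    # non-targets accepted
    return p_miss, p_fa

def equal_error_rate(target_scores, nontarget_scores):
    """EER: operating point where miss and false-alarm rates are closest."""
    p_miss, p_fa = error_rates(target_scores, nontarget_scores)
    i = np.argmin(np.abs(p_miss - p_fa))
    return 0.5 * (p_miss[i] + p_fa[i])

def min_dcf(target_scores, nontarget_scores, c_miss=10.0, c_fa=1.0, p_target=0.01):
    """Minimum of the detection cost over thresholds (assumed cost parameters)."""
    p_miss, p_fa = error_rates(target_scores, nontarget_scores)
    dcf = c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)
    return float(dcf.min())
```

The threshold sweep only visits observed scores, which is sufficient because the error rates change only at those points.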

Table 4  Results of the baseline T algorithm and the proposed integration algorithm of iteration adaptation and subspace combination adaptation on the actual application dataset

System            EER (%)   MinDCF
Baseline T        3.00      0.014
Adapted T (a)     1.99      0.012
Adapted T (b)     1.99      0.010

Rows are ordered as in Tables 1 to 3: the baseline T first, then the two adapted systems (a) and (b).

5 Conclusion

This paper addresses the estimation of the total variability subspace T in i-vector based speaker recognition and proposes adaptation algorithms for T to improve the performance of the i-vector system in practical environments. Experiments on the NIST SRE 2008 core dataset and on the actual application dataset show that adapting T with the proposed algorithms improves the performance of the i-vector system.

References

1 Kinnunen T, Li H Z. An overview of text-independent speaker recognition: from features to supervectors. Speech Communication, 2010, 52(1): 12-40
2 Dehak N, Kenny P, Ouellet P, Dumouchel P. Front-end factor analysis for speaker verification. IEEE Transactions on Audio, Speech and Language Processing, 2011, 19(4): 788-798
3 Li Zhi-Yi, He Liang, Zhang Wei-Qiang, Liu Jia. Speaker recognition based on discriminant i-vector local distance preserving projection. Journal of Tsinghua University (Science and Technology), 2012, 52(5): 598-601
4 Campbell W M, Campbell J P, Reynolds D A, Singer E, Torres-Carrasquillo P A. Support vector machines for speaker and language recognition. Computer Speech and Language, 2006, 20(2-3): 210-229
5 Kenny P, Boulianne G, Ouellet P, Dumouchel P. Speaker and session variability in GMM-based speaker verification. IEEE Transactions on Audio, Speech and Language Processing, 2007, 15(4): 1448-1460
6 Kenny P, Boulianne G, Ouellet P, Dumouchel P. Joint factor analysis versus eigenchannels in speaker recognition. IEEE Transactions on Audio, Speech and Language Processing, 2007, 15(4): 1435-1447
7 Reynolds D A, Quatieri T F, Dunn R B. Speaker verification using adapted Gaussian mixture models. Digital Signal Processing, 2000, 10(1-3): 19-41
8 Cortes C, Vapnik V. Support-vector networks. Machine Learning, 1995, 20(3): 273-297
9 Zhang Wen-Lin, Zhang Wei-Qiang, Liu Jia, Li Bi-Cheng, Qu Dan. A new subspace based speaker adaptation method. Acta Automatica Sinica, 2011, 37(12): 1495-1502
10 Kenny P, Boulianne G, Dumouchel P. Eigenvoice modeling with sparse training data. IEEE Transactions on Audio, Speech, and Language Processing, 2005, 13(3): 345-354
11 Bishop C M. Pattern Recognition and Machine Learning. Berlin: Springer, 2008
12 Hatch A O, Kajarekar S, Stolcke A. Within-class covariance normalization for SVM-based speaker recognition. In: Proceedings of the International Conference on Spoken Language Processing. Pittsburgh, PA, 2006. 1471-1474
13 He Liang, Shi Yong-Zhe, Liu Jia. Eigenchannel space combination method of joint factor analysis. Acta Automatica Sinica, 2011, 37(7): 849-856
14 Guo Wu, Li Yi-Jie, Dai Li-Rong, Wang Ren-Hua. Factor analysis and space assembling in speaker recognition. Acta Automatica Sinica, 2009, 35(9): 1193-1198

LI Zhi-Yi  Ph.D. candidate in the Department of Electronic Engineering, Tsinghua University. His research interest covers speaker recognition and language recognition. Corresponding author of this paper. E-mail: lizhiyi06@mail.tsinghua.edu.cn

ZHANG Wei-Qiang  Assistant professor in the Department of Electronic Engineering, Tsinghua University. His research interest covers speaker recognition and language recognition. E-mail: wqzhang@tsinghua.edu.cn

HE Liang  Assistant professor in the Department of Electronic Engineering, Tsinghua University. His research interest covers speaker recognition and language recognition. E-mail: heliang@tsinghua.edu.cn

LIU Jia  Professor in the Department of Electronic Engineering, Tsinghua University. His research interest covers speech recognition and signal processing. E-mail: liuj@tsinghua.edu.cn