Total Variability Subspace Adaptation Based Speaker Recognition (i-vector). Brief Paper, ACTA AUTOMATICA SINICA, Vol. 40, No. 8, August 2014.

Transcript:

Brief Paper. ACTA AUTOMATICA SINICA, Vol. 40, No. 8, August 2014

Total Variability Subspace Adaptation Based Speaker Recognition

LI Zhi-Yi 1, ZHANG Wei-Qiang 1, HE Liang 1, LIU Jia 1

Abstract  In text-independent speaker recognition, identity vector (i-vector) based modeling has recently proved to be the most popular and efficient method. A key problem is to estimate the total variability subspace T efficiently and accurately. In this paper, two adaptation algorithms are proposed to improve the performance of i-vector based systems in practical environments. Experiments on the core dataset of the 2008 speaker recognition evaluation of the American National Institute of Standards and Technology (NIST) and on a self-collected speaker recognition evaluation dataset demonstrate that using the proposed algorithms to adapt the total variability subspace T, with either the test dataset or the development dataset, effectively improves performance. In addition, the combination of the two adaptation algorithms achieves almost the best performance using the development dataset rather than the test dataset.

Key words  i-vector, total variability subspace, adaptation, speaker recognition

Citation  Li Zhi-Yi, Zhang Wei-Qiang, He Liang, Liu Jia. Total variability subspace adaptation based speaker recognition. Acta Automatica Sinica, 2014, 40(8): 1836-1840

DOI 10.3724/SP.J.1004.2014.01836

Manuscript received November 13, 2013; accepted November 23, 2013. Recommended by Associate Editor WU Xi-Hong.
Supported by National Natural Science Foundation of China (61370034, 61273268, 61005019, 90920302) and Beijing Natural Science Foundation (KZ201110005005).
1. Tsinghua National Laboratory for Information Science and Technology, Department of Electronic Engineering, Tsinghua University, Beijing 100084

Text-independent speaker recognition is surveyed in [1]. Among current modeling methods, the i-vector approach [2-3] is the one studied here; the reference evaluations are those of the American National Institute of Standards and Technology (NIST). Related systems are the Gaussian mixture model supervector-support vector machine (GSV-SVM) [4] and joint factor analysis (JFA) [5-6]. Like GSV-SVM and JFA, the i-vector method builds on the Gaussian mixture model-universal background model (GMM-UBM) framework [7], and models the GMM mean supervector as in Eq. (1):

$$M = m + Tw \tag{1}$$

where M is the GMM mean supervector, m is the UBM mean supervector, T is the total variability subspace matrix, and w is the total variability factor, i.e., the i-vector.

After extraction, i-vectors are compensated with linear discriminant analysis (LDA) and within-class covariance normalization (WCCN). The compensated i-vectors are then scored with cosine distance scoring (CDS) or an SVM [8]; CDS follows [2]. In every case the i-vector depends on T, so the estimation of T directly affects the i-vector system, both on NIST data and in practical applications.

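Subspace-based speaker adaptation is described in [9]. This paper studies adaptation of the total variability subspace T of the i-vector system. The paper is organized as follows: Section 1 reviews the estimation of T; Section 2 reviews i-vector compensation and scoring; Section 3 presents the proposed adaptation algorithms for T; Section 4 reports the experiments; Section 5 concludes.

1 Estimation of the total variability subspace T

1.1 Statistics

Before T is estimated, a UBM is trained with the expectation-maximization (EM) algorithm; in the GMM-UBM framework, speaker models are derived from the UBM by maximum a posteriori (MAP) adaptation. For the feature frames $x_{s,t}$ of speech segment s, the zeroth-order, first-order and second-order statistics $N_{c,s}$, $F_{c,s}$ and $S_{c,s}$, centered on the UBM means, are computed as in Eq. (2):

$$N_{c,s} = \sum_t \gamma_{c,s,t}, \qquad F_{c,s} = \sum_t \gamma_{c,s,t}\,(x_{s,t} - m_c), \qquad S_{c,s} = \mathrm{diag}\Big\{ \sum_t \gamma_{c,s,t}\,(x_{s,t} - m_c)(x_{s,t} - m_c)^{\mathrm T} \Big\} \tag{2}$$

where $m_c$ is the mean of the c-th UBM component, $\gamma_{c,s,t}$ is the posterior probability of the c-th UBM component for frame t of segment s, and $\mathrm{diag}\{\cdot\}$ keeps the diagonal. F denotes the feature dimension and C the number of UBM components.

1.2 Estimation of T

T is estimated with the expectation-maximization (EM) algorithm. Given the current T, the posterior moments of the hidden variable $w_s$ are computed as in Eq. (3). Here $F_s$ is the $FC \times 1$ vector formed by stacking the $F_{c,s}$, and $N_s$ is the $FC \times FC$ diagonal matrix formed from the $N_{c,s}$.

$$L_s = I + T^{\mathrm T} \Sigma^{-1} N_s T, \qquad E[w_s] = L_s^{-1} T^{\mathrm T} \Sigma^{-1} F_s, \qquad E[w_s w_s^{\mathrm T}] = E[w_s]\,E[w_s^{\mathrm T}] + L_s^{-1} \tag{3}$$

$L_s$ is an intermediate variable and $\Sigma$ is the UBM covariance matrix; both T and $\Sigma$ are updated during training. T is updated by solving Eq. (4) [10]:

$$\sum_s N_s\, T\, E[w_s w_s^{\mathrm T}] = \sum_s F_s\, E[w_s^{\mathrm T}] \tag{4}$$

The UBM covariance $\Sigma$ is updated as in Eq. (5):

$$\Sigma = N^{-1} \sum_s S_s - N^{-1}\, \mathrm{diag}\Big\{ \sum_s F_s\, E[w_s^{\mathrm T}]\, T^{\mathrm T} \Big\} \tag{5}$$

where $S_s$ is the $FC \times FC$ matrix formed from the $S_{c,s}$ and $N = \sum_s N_s$. T and $\Sigma$ are obtained after about 6 to 8 iterations.

2 i-vector compensation and scoring

Following [2], LDA and WCCN are applied to the extracted i-vectors, and the compensated i-vectors are then scored.

2.1 Linear discriminant analysis

Linear discriminant analysis (LDA) [11] projects the i-vectors so that between-speaker variability is maximized relative to within-speaker variability. The LDA criterion is given in Eq. (6), where w here denotes a projection direction:

$$J(w) = \frac{w^{\mathrm T} S_B\, w}{w^{\mathrm T} S_W\, w} \tag{6}$$

The between-class and within-class scatter matrices $S_B$ and $S_W$ are given by Eqs. (7) and (8):

$$S_B = \sum_{s=1}^{S} (\bar w_s - \bar w)(\bar w_s - \bar w)^{\mathrm T} \tag{7}$$

$$S_W = \sum_{s=1}^{S} \frac{1}{n_s} \sum_{i=1}^{n_s} (w_i^s - \bar w_s)(w_i^s - \bar w_s)^{\mathrm T} \tag{8}$$

where $\bar w_s = (1/n_s) \sum_{i=1}^{n_s} w_i^s$ is the mean i-vector of speaker s, $\bar w$ is the mean of all i-vectors, S is the number of speakers, and $n_s$ is the number of i-vectors of speaker s. Maximizing Eq. (6) reduces to the generalized eigenvalue problem of Eq. (9):

$$S_B\, w = \lambda\, S_W\, w \tag{9}$$

2.2 Within-class covariance normalization

Within-class covariance normalization (WCCN) [12] uses the within-class covariance matrix of Eq. (10):

$$W = \frac{1}{S} \sum_{s=1}^{S} \frac{1}{n_s} \sum_{i=1}^{n_s} (w_i^s - \bar w_s)(w_i^s - \bar w_s)^{\mathrm T} \tag{10}$$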
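As a minimal sketch of Section 1, the statistics of Eq. (2) and the posterior moments of Eq. (3) can be written in NumPy. The function names, the array layouts (component-major stacking of the supervector) and the storage of a diagonal Σ as a vector are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def baum_welch_stats(x, gamma, ubm_means):
    """Statistics of Eq. (2) for one speech segment.

    x: (n_frames, F) features; gamma: (n_frames, C) component posteriors;
    ubm_means: (C, F) UBM means.
    Returns N (C,), F_stat (C*F,), S_diag (C*F,), stacked component by component.
    """
    N = gamma.sum(axis=0)                         # zeroth order
    gx = gamma.T @ x                              # sum_t gamma * x, shape (C, F)
    F_stat = gx - N[:, None] * ubm_means          # centered first order
    # diagonal of the centered second order: sum_t gamma * (x - m)^2
    S_diag = gamma.T @ (x ** 2) - 2.0 * ubm_means * gx + N[:, None] * ubm_means ** 2
    return N, F_stat.ravel(), S_diag.ravel()

def ivector_posterior(N, F_stat, T, sigma_diag):
    """Posterior moments of Eq. (3). Assumes a diagonal Sigma given as a vector."""
    feat_dim = T.shape[0] // N.shape[0]
    N_diag = np.repeat(N, feat_dim)               # diagonal of N_s
    Tt_Sinv = (T / sigma_diag[:, None]).T         # T^T Sigma^{-1}
    L = np.eye(T.shape[1]) + Tt_Sinv @ (N_diag[:, None] * T)
    L_inv = np.linalg.inv(L)
    Ew = L_inv @ (Tt_Sinv @ F_stat)               # E[w_s], the i-vector
    Eww = np.outer(Ew, Ew) + L_inv                # E[w_s w_s^T]
    return Ew, Eww

# Toy usage with random data (sizes are arbitrary).
rng = np.random.default_rng(0)
n_frames, C, F, R = 50, 4, 3, 2
x = rng.normal(size=(n_frames, F))
logits = rng.normal(size=(n_frames, C))
gamma = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
ubm_means = rng.normal(size=(C, F))
N, F_stat, S_diag = baum_welch_stats(x, gamma, ubm_means)
T = 0.1 * rng.normal(size=(C * F, R))
Ew, Eww = ivector_posterior(N, F_stat, T, np.ones(C * F))
```

Because N_s and Σ are both diagonal in this sketch, the product T^T Σ^{-1} N_s T is computed by row scaling instead of building the full FC × FC matrices.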

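In Eq. (10), $\bar w_s = (1/n_s) \sum_{i=1}^{n_s} w_i^s$ is the mean i-vector of speaker s, S is the number of speakers, and $n_s$ is the number of i-vectors of speaker s.

2.3 Cosine distance scoring

Following [2], once the i-vectors have been extracted and compensated, the target i-vector $w_{tar}$ and the test i-vector $w_{tst}$ are compared directly. The cosine score is compared with a decision threshold $\theta$, as in Eq. (11):

$$\mathrm{score}(w_{tar}, w_{tst}) = \frac{\langle w_{tar}, w_{tst} \rangle}{\| w_{tar} \|\, \| w_{tst} \|} \ge \theta \tag{11}$$

where $w_{tar}$ and $w_{tst}$ are the i-vectors of the target speaker and of the test segment.

3 Adaptation of the total variability subspace T

As with the subspaces of JFA, the total variability subspace T determines the behavior of the i-vector system. This section presents the proposed adaptation algorithms for T.

3.1 Iteration adaptation

The original subspace $T_o$ is trained as in Section 1.2. Starting from $T_o$, further EM iterations are run on the adaptation data, as shown in Fig. 1 and Algorithm 1.

Fig. 1  Diagram of the total variability subspace T iteration adaptation algorithm

Algorithm 1. Iteration adaptation of T
1. Initialize with $T_o$ and the UBM covariance $\Sigma$;
2. Using T, compute $L_s$ and the moments $E[w_s]$ and $E[w_s w_s^{\mathrm T}]$ of $w_s$;
3. Using the results of step 2, update T by Eq. (4);
4. Using the results of steps 2 and 3, update the UBM covariance $\Sigma$ by Eq. (5);
5. Return to step 2.

3.2 Subspace combination adaptation

Subspace combination follows [13-14] and is applied here to the i-vector system, as shown in Fig. 2. In addition to the subspace $T_o$ trained as in Section 1.2, a new subspace $T_n$ is trained, and the two are combined as in Algorithm 2.

Fig. 2  Diagram of the total variability subspace T combination adaptation algorithm

Algorithm 2. Combination adaptation of T
1. Train $T_o$;
2. Train $T_n$;
3. Combine $T_o$ and $T_n$ to obtain T.

3.3 Integration of iteration adaptation and subspace combination adaptation

The two algorithms above can be integrated, as shown in Fig. 3.

Fig. 3  Diagram of the total variability subspace T integration algorithm of iteration adaptation and subspace combination adaptation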
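The iteration adaptation of Section 3.1 can be sketched as follows, under stated assumptions. Only steps 2, 3 and 5 of Algorithm 1 are shown: the Σ update of step 4 is omitted, so Σ stays fixed here. Eq. (4) is solved component by component, which is valid because $N_s$ is diagonal. The helper `combine_subspaces` assumes that "combining" $T_o$ and $T_n$ means concatenating their columns; the transcript does not state the operation, so this is a hypothetical choice, as are all function names.

```python
import numpy as np

def posterior(N, F_stat, T, sigma_diag):
    """E[w_s] and E[w_s w_s^T] of Eq. (3), for a diagonal Sigma given as a vector."""
    feat_dim = T.shape[0] // N.shape[0]
    N_diag = np.repeat(N, feat_dim)
    Tt_Sinv = (T / sigma_diag[:, None]).T
    L_inv = np.linalg.inv(np.eye(T.shape[1]) + Tt_Sinv @ (N_diag[:, None] * T))
    Ew = L_inv @ (Tt_Sinv @ F_stat)
    return Ew, np.outer(Ew, Ew) + L_inv

def adapt_T(T_o, sigma_diag, stats, n_iter=5):
    """Start from T_o and rerun the T update of Eq. (4) on adaptation statistics.

    stats: list of (N, F_stat) pairs, N of shape (C,), F_stat of shape (C*F,).
    Sigma is NOT updated (step 4 of Algorithm 1 is left out of this sketch).
    """
    T = T_o.copy()
    n_comp = stats[0][0].shape[0]
    feat_dim, rank = T.shape[0] // n_comp, T.shape[1]
    for _ in range(n_iter):
        A = np.zeros((n_comp, rank, rank))   # sum_s N_{c,s} E[w_s w_s^T], per component
        B = np.zeros_like(T)                 # sum_s F_s E[w_s^T]
        for N, F_stat in stats:
            Ew, Eww = posterior(N, F_stat, T, sigma_diag)
            A += N[:, None, None] * Eww
            B += np.outer(F_stat, Ew)
        for c in range(n_comp):
            rows = slice(c * feat_dim, (c + 1) * feat_dim)
            # Block c of Eq. (4): T_c A_c = B_c, with A_c symmetric
            T[rows] = np.linalg.solve(A[c], B[rows].T).T
    return T

def combine_subspaces(T_o, T_n):
    """Hypothetical combination: stack the columns of both subspaces."""
    return np.hstack([T_o, T_n])

# Toy usage with random statistics (sizes are arbitrary).
rng = np.random.default_rng(1)
C, F, R = 3, 2, 2
T_o = 0.1 * rng.normal(size=(C * F, R))
sigma_diag = np.ones(C * F)
stats = [(rng.uniform(1.0, 5.0, size=C), rng.normal(size=C * F)) for _ in range(20)]
T_n = adapt_T(T_o, sigma_diag, stats, n_iter=3)
T_comb = combine_subspaces(T_o, T_n)
```

Under the concatenation assumption, the combined subspace has twice the rank of $T_o$, so i-vectors extracted with it are twice as long.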

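As shown in Fig. 3, the subspace $T_o$ is first trained as in Section 1.2; Algorithm 1 is then used to obtain $T_n$, and the two are combined as in Algorithm 3.

Algorithm 3. Integration of iteration adaptation and subspace combination adaptation of T
1. Train $T_o$;
2. Use Algorithm 1 to obtain $T_n$;
3. Combine $T_o$ and $T_n$ to obtain T.

3.4

4 Experiments

The experiments evaluate the i-vector system on two datasets.

4.1 Experimental setup

NIST SRE 2008 core dataset: Switchboard I and II; 20 000; UBM, ZT-norm, LDA and WCCN; 12 922; 3 000.
Self-collected (actual application) dataset: 20 000; UBM, ZT-norm, LDA and WCCN; 8 000; 2 000.

The features are Mel-frequency cepstral coefficients (MFCC). Voice activity detection (VAD) follows G.723.1, cepstral mean subtraction (CMS) is applied, and feature warping uses a window of 3; the values 25 % and 0.95 are also part of the front-end configuration. The 13-dimensional basic features are extended to 39-dimensional MFCC. The UBM has 1 024 components, the total variability subspace T has dimension 400 (6 iterations), and LDA reduces the dimension to 200.

4.2 Results

Performance is measured by the equal error rate (EER) and the minimum detection cost function (MinDCF). Tables 1 and 2 compare the T algorithm with the proposed iteration adaptation T algorithm.

Table 1  T algorithm and the proposed iteration adaptation T algorithm on the NIST SRE 2008 core dataset

System    EER (%)    MinDCF
T         5.41       0.029
T         4.92       0.026
T         4.67       0.023

Table 2  T algorithm and the proposed iteration adaptation T algorithm on the actual application dataset

System    EER (%)    MinDCF
T         3.00       0.014
T         2.99       0.013
T         2.00       0.011

Table 3  T algorithm and the proposed integration algorithm of iteration adaptation and subspace combination adaptation on the NIST SRE 2008 core dataset

System    EER (%)    MinDCF
T         5.41       0.029
T         4.01       0.021
T         3.89       0.020

Beyond Tables 1 and 2, the results for the integration algorithm of Section 3.3 are given in Tables 3 and 4.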
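For readers who want to reproduce the evaluation metric, the cosine score of Eq. (11) and an EER estimate can be sketched as below. This is a simple threshold sweep over the observed scores, not the scoring tool used in the paper; the function names are illustrative, and MinDCF is not computed because the transcript does not give its cost parameters.

```python
import numpy as np

def cosine_score(w_tar, w_tst):
    """Cosine similarity between a target and a test i-vector, as in Eq. (11)."""
    w_tar = np.asarray(w_tar, dtype=float)
    w_tst = np.asarray(w_tst, dtype=float)
    return float(w_tar @ w_tst / (np.linalg.norm(w_tar) * np.linalg.norm(w_tst)))

def equal_error_rate(target_scores, nontarget_scores):
    """Estimate the EER by sweeping the threshold over all observed scores.

    A trial is accepted when score >= threshold. The returned value is the
    average of miss rate and false alarm rate at the threshold where the
    two rates are closest.
    """
    tar = np.asarray(target_scores, dtype=float)
    non = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([tar, non]))
    p_miss = np.array([(tar < th).mean() for th in thresholds])
    p_fa = np.array([(non >= th).mean() for th in thresholds])
    i = int(np.argmin(np.abs(p_miss - p_fa)))
    return 0.5 * (p_miss[i] + p_fa[i])

# Toy usage: two well separated score sets.
eer_separated = equal_error_rate([0.9, 0.8], [0.1, 0.2])
```

The sweep is quadratic in the number of trials, which is fine for a toy example; a sorted cumulative count would be preferable for trial lists the size of an SRE evaluation.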

Table 4  T algorithm and the proposed integration algorithm of iteration adaptation and subspace combination adaptation on the actual application dataset

System    EER (%)    MinDCF
T         3.00       0.014
T         1.99       0.012
T         1.99       0.010

5 Conclusion

For i-vector based speaker recognition, this paper proposed adaptation algorithms for the total variability subspace T. Experiments on the NIST SRE 2008 core dataset and on the actual application dataset show that adapting T improves the performance of the i-vector system.

References

1 Kinnunen T, Li H Z. An overview of text-independent speaker recognition: from features to supervectors. Speech Communication, 2010, 52(1): 12-40
2 Dehak N, Kenny P, Ouellet P, Dumouchel P. Front-end factor analysis for speaker verification. IEEE Transactions on Audio, Speech and Language Processing, 2011, 19(4): 788-798
3 Li Zhi-Yi, He Liang, Zhang Wei-Qiang, Liu Jia. Speaker recognition based on discriminant i-vector local distance preserving projection. Journal of Tsinghua University (Science and Technology), 2012, 52(5): 598-601
4 Campbell W M, Campbell J P, Reynolds D A, Singer E, Torres-Carrasquillo P A. Support vector machines for speaker and language recognition. Computer Speech and Language, 2006, 20(2-3): 210-229
5 Kenny P, Boulianne G, Ouellet P, Dumouchel P. Speaker and session variability in GMM-based speaker verification. IEEE Transactions on Audio, Speech and Language Processing, 2007, 15(4): 1448-1460
6 Kenny P, Boulianne G, Ouellet P, Dumouchel P. Joint factor analysis versus eigenchannels in speaker recognition. IEEE Transactions on Audio, Speech and Language Processing, 2007, 15(4): 1435-1447
7 Reynolds D A, Quatieri T F, Dunn R B. Speaker verification using adapted Gaussian mixture models. Digital Signal Processing, 2000, 10(1-3): 19-41
8 Cortes C, Vapnik V. Support vector networks. Machine Learning, 1995, 20(3): 273-297
9 Zhang Wen-Lin, Zhang Wei-Qiang, Liu Jia, Li Bi-Cheng, Qu Dan. A new subspace based speaker adaptation method. Acta Automatica Sinica, 2011, 37(12): 1495-1502
10 Kenny P, Boulianne G, Dumouchel P. Eigenvoice modeling with sparse training data. IEEE Transactions on Audio, Speech, and Language Processing, 2005, 13(3): 345-354
11 Bishop C M. Pattern Recognition and Machine Learning. Berlin: Springer, 2008
12 Hatch A O, Kajarekar S, Stolcke A. Within-class covariance normalization for SVM-based speaker recognition. In: Proceedings of the International Conference on Spoken Language Processing. Pittsburgh, PA, 2006. 1471-1474
13 He Liang, Shi Yong-Zhe, Liu Jia. Eigenchannel space combination method of joint factor analysis. Acta Automatica Sinica, 2011, 37(7): 849-856
14 Guo Wu, Li Yi-Jie, Dai Li-Rong, Wang Ren-Hua. Factor analysis and space assembling in speaker recognition. Acta Automatica Sinica, 2009, 35(9): 1193-1198

LI Zhi-Yi  Ph. D. candidate in the Department of Electronic Engineering, Tsinghua University. His research interest covers speaker recognition and language recognition. Corresponding author of this paper. E-mail: lizhiyi06@mail.tsinghua.edu.cn

ZHANG Wei-Qiang  Assistant professor in the Department of Electronic Engineering, Tsinghua University. His research interest covers speaker recognition and language recognition. E-mail: wqzhang@tsinghua.edu.cn

HE Liang  Assistant professor in the Department of Electronic Engineering, Tsinghua University. His research interest covers speaker recognition and language recognition. E-mail: heliang@tsinghua.edu.cn

LIU Jia  Professor in the Department of Electronic Engineering, Tsinghua University. His research interest covers speech recognition and signal processing. E-mail: liuj@tsinghua.edu.cn