A Robust Bootstrapping Algorithm of Speaker Models for On-Line Unsupervised Speaker Indexing


Transcript:

ISSN 1000-9825, CODEN RUXUEW. E-mail: jos@iscas.ac.cn
Journal of Software, Vol.18, No.3, March 2007, pp.608-616. http://www.jos.org.cn
DOI: 10.1360/jos180608. Tel/Fax: +86-10-62562563
© 2007 by Journal of Software. All rights reserved.

A Robust Bootstrapping Algorithm of Speaker Models for On-Line Unsupervised Speaker Indexing

FU Zhong-Hua+, ZHANG Yan-Ning
(School of Computer Science, Northwestern Polytechnical University, Xi'an 710072, China)
+ Corresponding author: Phn: +86-29-88494848, E-mail: mailfzh@nwpu.edu.cn

Fu ZH, Zhang YN. A robust bootstrapping algorithm of speaker models for on-line unsupervised speaker indexing. Journal of Software, 2007,18(3):608-616. http://www.jos.org.cn/1000-9825/18/608.htm

Abstract: This paper proposes a robust bootstrapping framework with two parts: a multi-eigenspace modeling technique based on regression class (RC-MES), which builds speaker models from sparse data, and a short-segment clustering step, which keeps overly short segments from influencing bootstrapping. On a real discussion archive with a total duration of 8 hours, the proposed method proves robust: it improves speaker change detection performance and outperforms conventional bootstrapping methods even when the average bootstrapping segment duration is less than 5 seconds.

Key words: speaker indexing; speaker model; regression class; eigenvoice

Supported by the Science & Technology Research and Development Plan of Shanxi Province of China under Grant No.2005k04-G23
Received 2006-07-28; Accepted 2006-11-13

[...] K-L divergence (Kullback-Leibler divergence) [7], the Bayesian information criterion (BIC) [8], the generalized likelihood ratio (GLR) [9] [...] universal background model (UBM) [...] Gaussian mixture model (GMM) [10] [...] sample speaker model (SSM) [11] [...] bootstrapping [...] 75% [...] GMM [...] BIC [...] off-line [...]

[...] eigenvoice [...] maximum likelihood linear regression (MLLR) [...] multi-eigenspace modeling based on regression class (RC-MES) [13] [...] GLR [...] RC-MES [...]

[...] Kuhn [14] [...] eigenspace [...]

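[...] (RC-MES) [...] MLLR [...] MLLR [...] regression class tree [...] RC-MES has an offline stage and an online stage.

Offline:
(1) [...] speaker-independent (SI) [...] (GMM) [...] [7];
(2) [...] S [...] GMM [...] S [...] SI [...];
(3) [...] R [...] S [...] speaker-dependent (SD) [...] SI [...] SD [...] R*S [...] SD [...];
(4) [...] S [...] [e_i(0), e_i(1), ..., e_i(k)], i = 1, ..., S.

Online:
(1) [...] S [...];
(2) [...] S [...] SI [...];
(3) [...] S [...] [w_i(1), w_i(2), ..., w_i(k)], i = 1, ..., S;
(4) [...] S [...] GMM;
(5) [...] S [...] GMM [...] GMM.

[...] GMM [...] GMM [...] maximum likelihood eigen-decomposition (MLED) [...] GMM [...] SI [...] EM [...]

$$e^{(s)}(j) = \left[ e_1^{(s)}(j)^T, e_2^{(s)}(j)^T, \ldots, e_m^{(s)}(j)^T, \ldots, e_M^{(s)}(j)^T \right]^T, \quad j = 0, 1, \ldots, K \tag{1}$$

[...] e(j) [...] K [...] ([15]) [...]

$$\sum_{t=1}^{T} \sum_{m=1}^{M} r_m^{(s)}(t) \left[ e_m^{(s)}(i) \right]^T \left( C_m^{(s)} \right)^{-1} o_t = \sum_{t=1}^{T} \sum_{m=1}^{M} r_m^{(s)}(t) \left[ \sum_{k=0}^{K} w^{(s)}(k)\, e_m^{(s)}(k) \right]^T \left( C_m^{(s)} \right)^{-1} e_m^{(s)}(i) \tag{2}$$

[...] t [...] S [...] r_m^{(s)}(t) [...] o_t [...]

$$r_m^{(s)}(t) = P(i_t = m \mid o_t, \lambda^{(s)}) = \frac{p_m^{(s)}\, b_m^{(s)}(o_t)}{\sum_{k=1}^{M} p_k^{(s)}\, b_k^{(s)}(o_t)} \tag{3}$$

[...] p, b(·) [...] S [...] λ [...] S [...]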
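To illustrate how the MLED equation (2) can be solved, the following minimal sketch treats it as a linear system in the weights w(1), ..., w(K). It rests on assumptions not stated in the surviving text: w(0) is fixed to 1 so that e(0) acts as the mean supervector (as in eigenvoice adaptation [14]), the covariances C_m are diagonal, and only one eigenspace is handled. The names `solve_linear` and `mled_weights`, the argument layout, and the variable `gamma` (standing for the occupation probabilities r_m(t)) are hypothetical.

```python
def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def mled_weights(obs, gamma, e0, eigvoices, inv_var):
    """Estimate eigenvoice weights w(1..K), assuming w(0) = 1.

    obs[t][d]          feature vectors o_t
    gamma[t][m]        occupation probabilities r_m(t)
    e0[m][d]           mean supervector e(0), split per Gaussian m
    eigvoices[k][m][d] eigenvoices e(1..K), split per Gaussian m
    inv_var[m][d]      inverse diagonal covariances (assumption)
    """
    M, D, K = len(e0), len(e0[0]), len(eigvoices)
    # Sufficient statistics: soft counts n_m and weighted sums f_m.
    n = [sum(g[m] for g in gamma) for m in range(M)]
    f = [[sum(g[m] * o[d] for g, o in zip(gamma, obs)) for d in range(D)]
         for m in range(M)]
    # A[i][k] = sum_m n_m * e_m(i)^T C_m^{-1} e_m(k)
    A = [[sum(n[m] * eigvoices[i][m][d] * inv_var[m][d] * eigvoices[k][m][d]
              for m in range(M) for d in range(D))
          for k in range(K)] for i in range(K)]
    # b[i] = sum_m e_m(i)^T C_m^{-1} (f_m - n_m * e_m(0))
    b = [sum(eigvoices[i][m][d] * inv_var[m][d] * (f[m][d] - n[m] * e0[m][d])
             for m in range(M) for d in range(D))
         for i in range(K)]
    return solve_linear(A, b)
```

Under these assumptions only K unknowns are estimated, which is why eigenvoice-style adaptation can work with very little speech; the adapted mean of Gaussian m is then e_m(0) plus the weighted sum of the eigenvoices.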

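[...] (2) [...] k+1 [...] (S) [...] k+1 [...] Offline [...] GMM [...] EM [...]

Fig.1 Flow diagram of the proposed unsupervised speaker indexing (blocks: speech signal; pre-processing; speaker change detection; "Model exists?" decision; short segment clustering; model bootstrapping based on RC-MES; speaker model adaptation; speaker model set)

[...] GLR [...] RC-MES [...]

[...] K-L [...] BIC [...] GLR [...] GLR [16,17] [...] [11] [...]

Let X = {x_1, x_2, ..., x_N} and Y = {y_1, y_2, ..., y_N} be two segments [...] of N [...] each. Models λ_X and λ_Y are estimated from X and Y by maximum likelihood (ML). Z is formed from X and Y, and a GMM λ_Z is trained on Z with EM. The GLR distance d_GLR is then defined with the log-likelihood L(·) as

$$d_{GLR} = \left( L(X \mid \lambda_X) + L(Y \mid \lambda_Y) \right) - L(Z \mid \lambda_Z) \tag{4}$$

$$L(Z \mid \lambda_Z) = \sum_{i=1}^{N} \log p(x_i \mid \lambda_Z) + \sum_{i=1}^{N} \log p(y_i \mid \lambda_Z) \tag{5}$$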

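$$L(X \mid \lambda_X) = \sum_{i=1}^{N} \log p(x_i \mid \lambda_X), \quad L(Y \mid \lambda_Y) = \sum_{i=1}^{N} \log p(y_i \mid \lambda_Y) \tag{6}$$

where p(x_i | λ_X) is the likelihood of x_i given λ_X, and p(y_i | λ_Y) [...] Z [...] GMM [...] λ_X, λ_Y, λ_Z [...] (4) [16] [...] X, Y [...] X, Y [...]

[...] < d_GLR [...] N log [...] (7)

[...] d_GLR [...] 0 (8)

[...] d_GLR [...] 0 [...] θ_GLR [...] localized search algorithm (LSA) [11] [...] d_GLR [...]

[...] GMM [...] GMM [...] [18] [...] (UBM [10], SSM [11]) [...] RC-MES [...]

[...] RC-MES [...] S = {s_1, s_2, ..., s_K} [...] K [...] i, j [...] GLR d_GLR(s_i, s_j) [...] d_GLR(s_i, s_j) > θ_GLR [...] θ_RC-MES [...] s_new [...] S [...] GLR d_GLR(s_new, s_i) (1 ≤ i ≤ K) [...] j (1 ≤ j ≤ K) [...] d_GLR(s_new, s_j) < θ_GLR [...] s_new [...] s_j [...] s_new [...] S [...]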
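The GLR distance of Eq. (4) and the short-segment rule built on it can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation. Each model is a single diagonal-covariance Gaussian fitted by maximum likelihood, whereas the paper trains GMMs with EM. The surviving inequalities are read as: merge s_new into a cluster s_j when d_GLR(s_new, s_j) < θ_GLR, otherwise add s_new to S. Picking the closest cluster among candidates is an extra assumption. All function names are hypothetical.

```python
import math


def fit_gaussian(frames):
    """ML fit of a diagonal Gaussian: per-dimension mean and variance."""
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    var = [max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, 1e-6)
           for d in range(dim)]  # small floor avoids log(0)
    return mean, var


def log_likelihood(frames, model):
    """Total log-likelihood L(frames | model) under a diagonal Gaussian."""
    mean, var = model
    total = 0.0
    for f in frames:
        for x, m, v in zip(f, mean, var):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total


def glr_distance(x, y):
    """d_GLR = (L(X|lam_X) + L(Y|lam_Y)) - L(Z|lam_Z), Z pooling X and Y."""
    z = x + y
    return (log_likelihood(x, fit_gaussian(x))
            + log_likelihood(y, fit_gaussian(y))
            - log_likelihood(z, fit_gaussian(z)))


def assign_short_segment(new_seg, clusters, theta_glr):
    """Merge new_seg into the closest cluster if d_GLR < theta_glr,
    otherwise append it as a new cluster. Returns the cluster index."""
    best_j, best_d = None, float("inf")
    for j, cluster in enumerate(clusters):
        d = glr_distance(new_seg, cluster)
        if d < best_d:
            best_j, best_d = j, d
    if best_j is not None and best_d < theta_glr:
        clusters[best_j] = clusters[best_j] + new_seg
        return best_j
    clusters.append(list(new_seg))
    return len(clusters) - 1
```

Because the separately fitted models can never explain their own data worse than the pooled model, this distance is non-negative and grows as the two segments diverge, which is what makes a single threshold usable for both change detection and clustering.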

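[...] S [...] θ_RC-MES [...] S [...] RC-MES [...]:
(1) [...] GLR [...] s_new [...];
(2) [...] s_new [...];
(3) [...] (λ_i) [...] θ_id [...] s_new [...] λ_i [...] (1); [...] s_new [...] S;
(4) [...] S [...];
(5) [...] s_j [...] θ_RC-MES [...] s_j [...] RC-MES [...] s_j [...] S;
(6) [...] (1) [...] S.

[...] 75% [...] kHz [...] Mel-frequency cepstrum coefficients (MFCC) [...] MFCC [...] RC-MES [...] GMM [...] TIMIT [19] [...] (TIMIT [...]) [...]

[...] RC-MES [...] GLR [...] θ_GLR [...] GLR [...] θ_GLR [...] Performance is measured by PRC (precision) and RCL (recall):

$$PRC = \frac{\text{number of correctly detected speaker changes}}{\text{total number of detected speaker changes}} \tag{9}$$

$$RCL = \frac{\text{number of correctly detected speaker changes}}{\text{total number of actual speaker changes}} \tag{10}$$

and by the F measure [16]:

$$F = \frac{2.0 \times PRC \times RCL}{PRC + RCL} \tag{11}$$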
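As a small worked example of the evaluation measures, the sketch below computes PRC, RCL and the F measure of Eq. (11) from three counts. The function name and the zero-division handling are assumptions added for illustration; how a detected change is judged "correct" (for example, the time tolerance used) is not recoverable from the text and is left to the caller.

```python
def precision_recall_f(num_correct, num_detected, num_reference):
    """Return (PRC, RCL, F) for speaker change detection.

    num_correct   detected changes that match a true change
    num_detected  all changes reported by the detector
    num_reference all true changes in the reference annotation
    """
    prc = num_correct / num_detected if num_detected else 0.0
    rcl = num_correct / num_reference if num_reference else 0.0
    f = 2.0 * prc * rcl / (prc + rcl) if (prc + rcl) else 0.0
    return prc, rcl, f
```

For instance, 6 correct detections out of 8 reported, against 10 true changes, give PRC = 0.75 and RCL = 0.6; F is their harmonic mean, so it stays low unless both values are high.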

[...] θ_GLR [...] RCL [...] PRC [...] PRC [...] θ_GLR [...] RCL 0.69 [...] 0.78 [...] PRC 0.67 [...] 0.55 [...] PRC 0.74 [...] θ_GLR [...] F 0.74 [...] θ_GLR [...] 0.76 [...] θ_GLR [...]

Fig.2 Recall-Precision tradeoff of speaker change detection with different θ_GLR (curves: GLR based; After indexing. Axes: Precision, Recall)

[...] UBM [...] UBM [...] [10] [...] UBM [...] UBM [...] GMM [...] MAP [...] UBM [...] RC-MES [...] θ_RC-MES [...] UBM [...] UBM [...] RC-MES [...] RC-MES [...]

[...] RC-MES [...] GLR [...] RC-MES [...] 75% [...] NIST [...]

Fig.3 Speaker indexing accuracy for various bootstrapping approaches with different bootstrapping segment durations (curves: UBM adaptation; Common eigenvoice; RC-MES based. Axes: Average bootstrapping segment duration, Accuracy)

References:
[1] Delacourt P, Wellekens CJ. DISTBIC: A speaker-based segmentation for audio data indexing. Speech Communication, 2000,32(1-2):111-126.
[2] Lu L, Zhang HJ. Unsupervised speaker segmentation and tracking in real-time audio content analysis. Multimedia Systems, 2005,10(4):332-343.
[3] Sancho SS, Ascensión GA, José MLM, Carlos BC. Offline speaker segmentation using genetic algorithms and mutual information. IEEE Trans. on Evolutionary Computation, 2006,10(2):175-186.
[4] Meignier S, Moraru D, Fredouille C, Bonastre JF, Besacier L. Step-by-step and integrated approaches in broadcast news speaker diarization. Computer Speech and Language, 2006,20(1-2):303-330.
[5] Aronowitz H, Burshtein D, Amir A. Speaker indexing in audio archives using Gaussian mixture scoring simulation. In: Bengio S, Bourlard H, eds. Proc. of the 1st Int'l Workshop on Machine Learning for Multimodal Interaction. LNCS 3361, Heidelberg: Springer-Verlag, 2005. 243-252.
[6] Anguera X, Wooters C, Peskin B, Aguilo M. Robust speaker segmentation for meetings: The ICSI-SRI spring diarization system. In: Renals S, Bengio S, eds. Proc. of the 2nd Int'l Workshop on Machine Learning for Multimodal Interaction. LNCS 3869, Heidelberg: Springer-Verlag, 2005. 402-414.
[7] Campbell JP. Speaker recognition: A tutorial. Proc. of the IEEE, 1997,85(9):1437-1462.
[8] Chen SS, Gopalakrishnan PS. Clustering via the Bayesian information criterion with applications in speech recognition. In: Acero A, Hon HW, eds. Proc. of the 1998 IEEE Int'l Conf. on Acoustics, Speech and Signal Processing, Vol.2. Seattle, Washington: IEEE, 1998. 645-648.

[9] Gish H, Schmidt N. Text-Independent speaker identification. IEEE Signal Processing Magazine, 1994,11(4):18-32.
[10] Reynolds DA, Quatieri TF, Dunn RB. Speaker verification using adapted Gaussian mixture models. Digital Signal Processing, 2000,10:19-41.
[11] Kwon S, Narayanan S. Unsupervised speaker indexing using generic models. IEEE Trans. on Speech and Audio Processing, 2005,13(5):1004-1013.
[12] Nishida M, Kawahara T. Speaker model selection based on the Bayesian information criterion applied to unsupervised speaker indexing. IEEE Trans. on Speech and Audio Processing, 2005,13(4):583-592.
[13] Fu ZH, Zhao RC. Speaker modeling technique based on regression class for speaker identification with sparse training. In: Li SZ, et al., eds. Proc. of the Sinobiometrics 2004. LNCS 3338, Heidelberg: Springer-Verlag, 2004. 610-616.
[14] Kuhn R, Junqua JC, Niedzielski NP. Rapid speaker adaptation in eigenvoice space. IEEE Trans. on Speech and Audio Processing, 2000,8(6):695-706.
[15] Fu ZH. Research on robustness of speaker recognition system [Ph.D. Thesis]. Xi'an: Northwestern Polytechnical University, 2004 (in Chinese with English abstract).
[16] Ajmera J, McCowan I, Bourlard H. Robust speaker change detection. IEEE Signal Processing Letters, 2004,11(8):649-651.
[17] Lu J, Mao B, Sun ZX, Zhang FY. An improved speaker based speech segmentation algorithm. Journal of Software, 2002,13(2):274-279 (in Chinese with English abstract). http://www.jos.org.cn/1000-9825/13/274.pdf
[18] Reynolds DA, Rose RC. Robust text-independent speaker identification using Gaussian mixture speaker models. IEEE Trans. on Speech and Audio Processing, 1995,3(1):72-83.
[19] Garofolo J, et al. DARPA TIMIT acoustic-phonetic continuous speech corpus CD-ROM. National Institute of Standards and Technology, 1993.