Discriminative Language Modeling Based on Risk Minimization Training


Kobayashi Akio 1,a)   Oku Takahiro 1   Fujita Yuya 1   Sato Shoei 1   Nakagawa Seiichi 2

Abstract: This paper describes discriminative language models (LMs) that reflect information about word errors in automatic speech recognition (ASR). The discriminative LMs are implemented as a set of penalty scores employing linguistic features and their weighting factors. The models are estimated on the basis of minimization of expected risks that are closely associated with word errors. In transcribing Japanese broadcast programs, the semi-supervised discriminative LM achieved the best results in word error rate compared with the supervised and unsupervised LMs and with conventional discriminative LMs based on maximization of conditional log-likelihoods.

Keywords: Bayes risk, discriminative training, multi-objective programming, semi-supervised training

1. Introduction

[The Japanese text of this section was not recovered from the source; it cites [1] and [2].]

1  NHK Science and Technology Research Laboratories, Setagaya, Tokyo 157-8510, Japan
2  Toyohashi University of Technology, Toyohashi, Aichi 441-8580, Japan
a) kobayashi.a-fs@nhk.or.jp

2. Discriminative Language Models Based on Risk Minimization

[The Japanese prose opening this section was not recovered; it cites prior work on discriminative n-gram language modeling [3], [4], lattice-based risk minimization for unsupervised adaptation [5], the authors' semi-supervised formulations [6], [7], minimum Bayes-risk decoding [8], and multi-objective optimization programming (MOP) [9]. λ denotes a feature weight and Λ the set of all weights.]

2.1 Log-linear model
For an input speech signal x and a word-sequence hypothesis w, the posterior probability is modeled log-linearly as

    P(w | x; Λ) ∝ exp{ f_am(x, w) + λ_lm f_lm(w) + Σ_i λ_i f_i(w) }                     (1)

where f_am(x, w) is the acoustic model score, f_lm(w) is the baseline language model score, and the f_i(w) are linguistic feature functions with weights λ_i.

2.2 Bayes-risk decoding
Following [8], the recognition result is obtained from the N-best hypotheses by minimizing the expected risk:

    ŵ = arg min_w Σ_{w'} R(w, w') P(w' | x)                                             (2)

where P(w' | x) is the posterior probability of hypothesis w' over the N-best list, computed with the log-linear model (1) parameterized by Λ, and R(w, w') is the risk between two hypotheses, defined as the Levenshtein (edit) distance between them.

2.3 Risk minimization training
Following [4], [5], let x_m^(u) (m = 1, ..., M) be the unlabeled utterances and w_{m,k} the k-th hypothesis for the m-th utterance. The unsupervised objective is the expected risk

    U(Λ) = (1/M) Σ_m Σ_k P(w_{m,k} | x_m^(u); Λ) χ(w_{m,k})                             (3)

where P(w_{m,k} | x_m^(u); Λ) is the posterior of w_{m,k} under model (1), and χ(w_{m,k}) is the expected risk of hypothesis w_{m,k} against the competing hypotheses:

    χ(w_{m,k}) = Σ_{k'} R(w_{m,k}, w_{m,k'}) P(w_{m,k'} | x_m^(u); Λ)                   (4)

For labeled utterances x_n^(l) (n = 1, ..., N) with reference transcripts w_n^ref, the supervised objective is the expected risk against the reference:

    L(Λ) = (1/N) Σ_n Σ_k P(w_{n,k} | x_n^(l); Λ) R(w_n^ref, w_{n,k})                    (5)

Training adjusts Λ so as to minimize the expected risks (3) and (5).
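As a concrete illustration of Eqs. (1)-(5), the following Python sketch (our own illustration, not the authors' implementation) computes the N-best posteriors under the log-linear model and the supervised and unsupervised expected risks for a single utterance. The hypothesis structure (a word list plus precomputed acoustic/LM scores and feature counts) is an assumption made only for this example.

```python
# Illustrative sketch of Eqs. (1)-(5) on an N-best list (not the authors' code).
import math
from typing import Dict, List

def levenshtein(a: List[str], b: List[str]) -> int:
    """Word-level edit distance, used as the risk R(w, w')."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        cur = [i]
        for j, wb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (wa != wb)))
        prev = cur
    return prev[-1]

def posteriors(nbest, lam: Dict[str, float], lam_lm: float) -> List[float]:
    """Eq. (1): P(w|x; Λ) ∝ exp{f_am + λ_lm f_lm + Σ_i λ_i f_i(w)}, normalized over the N-best list."""
    scores = [h["f_am"] + lam_lm * h["f_lm"] +
              sum(lam.get(f, 0.0) * v for f, v in h["feats"].items())
              for h in nbest]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]      # subtract the max for numerical stability
    z = sum(exps)
    return [e / z for e in exps]

def expected_risk_unsup(nbest, lam, lam_lm) -> float:
    """Eqs. (3)-(4): expected risk of one unlabeled utterance (no reference needed)."""
    p = posteriors(nbest, lam, lam_lm)
    risk = 0.0
    for k, hk in enumerate(nbest):
        chi = sum(p[j] * levenshtein(hk["words"], hj["words"])
                  for j, hj in enumerate(nbest))  # χ(w_{m,k})
        risk += p[k] * chi
    return risk

def expected_risk_sup(nbest, ref: List[str], lam, lam_lm) -> float:
    """Eq. (5): expected risk of one labeled utterance against its reference."""
    p = posteriors(nbest, lam, lam_lm)
    return sum(p[k] * levenshtein(h["words"], ref) for k, h in enumerate(nbest))
```

In practice the posteriors and risks are computed on word lattices rather than explicit N-best lists, as described next.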

The expected risks are computed on word lattices following the Minimum Phone Error (MPE) framework [10]. The Bayes risk based on the Levenshtein distance is approximated locally on the lattice: for utterance x_m with lattice L_m and two edges (arcs) e and e', the local loss l(e, e') is defined as in [5] by

    l(e, e') = 0 if label(e) = label(e'), and 1 otherwise

where label(e) is the word label of edge e. The approximate risk of edge e is then

    ζ(e) = Σ_{e' ∈ overlap(e)} l(e, e') p(e')                                           (6)

where overlap(e) is the set of edges overlapping e in time and p(e') is the posterior probability of edge e'. The edge posterior is obtained by the forward-backward algorithm:

    p(e) = (1/ᾱ) α(σ(e)) s(e) β(τ(e))                                                   (7)

where σ(e) and τ(e) are the start and end nodes of e, α(σ(e)) is the forward score of σ(e), β(τ(e)) is the backward score of τ(e), ᾱ is the total score of the lattice, and s(e) is the edge score

    s(e) = exp{ λ_am φ_am(e) + λ_lm φ_lm(e) + Σ_i λ_i φ_i(e) }                          (8)

with acoustic and language model weights λ_am and λ_lm; φ_i(e) is the value of feature f_i on edge e (1 or 0). Using ζ(e) and p(e), let γ(e) denote the average risk of the paths passing through edge e and γ_m the average risk of utterance m. The gradient of the unsupervised objective with respect to λ_i is accumulated from edge-level terms [4]:

    δ_{i,e,m} = p(e) (γ(e) − γ_m) φ_i(e)                                                (9)

    ∂U^(u)/∂λ_i = (1/M) Σ_m Σ_{e ∈ L_m} δ_{i,e,m}                                       (10)

The gradient ∂L^(l)/∂λ_i of the supervised objective (5) is computed in the same manner on the lattices of the labeled utterances, using the local loss (6).

2.4 Conventional objectives
The conventional discriminative language model [3] is trained by maximizing the conditional log-likelihood of the reference transcripts,

    L(Λ) = (1/N) Σ_n log P(w_n^ref | x_n^(l); Λ)                                        (11)

For unlabeled utterances x_m^(u), the entropy minimization of [11] is applied to the conditional entropy over the competing hypotheses:

    U(Λ) = −(1/M) Σ_m Σ_k P(w_{m,k} | x_m^(u); Λ) log P(w_{m,k} | x_m^(u); Λ)           (12)

Minimizing (12) sharpens the posterior distribution over the hypotheses of each unlabeled utterance.
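The following sketch (again an illustration under assumed data structures, not the authors' code) spells out the forward-backward computation of the edge posteriors in Eqs. (7) and (8). A lattice is represented as a list of edge dictionaries over topologically ordered nodes, with node 0 as the start node and the last node as the end node.

```python
# Sketch of lattice edge posteriors, Eqs. (7)-(8): p(e) = α(σ(e)) s(e) β(τ(e)) / ᾱ.
import math
from collections import defaultdict

def edge_score(edge, lam_am, lam_lm, lam):
    """Eq. (8) in the log domain: λ_am φ_am(e) + λ_lm φ_lm(e) + Σ_i λ_i φ_i(e)."""
    return (lam_am * edge["phi_am"] + lam_lm * edge["phi_lm"] +
            sum(lam.get(f, 0.0) * v for f, v in edge["feats"].items()))

def edge_posteriors(edges, n_nodes, lam_am, lam_lm, lam):
    """Forward-backward over the lattice; returns p(e) for every edge."""
    logsum = lambda xs: max(xs) + math.log(sum(math.exp(x - max(xs)) for x in xs)) if xs else -math.inf
    out_edges, in_edges = defaultdict(list), defaultdict(list)
    for e in edges:
        out_edges[e["start"]].append(e)
        in_edges[e["end"]].append(e)

    alpha = [-math.inf] * n_nodes          # forward log scores α(·)
    beta = [-math.inf] * n_nodes           # backward log scores β(·)
    alpha[0] = 0.0
    beta[n_nodes - 1] = 0.0
    for v in range(1, n_nodes):
        alpha[v] = logsum([alpha[e["start"]] + edge_score(e, lam_am, lam_lm, lam)
                           for e in in_edges[v]])
    for v in range(n_nodes - 2, -1, -1):
        beta[v] = logsum([edge_score(e, lam_am, lam_lm, lam) + beta[e["end"]]
                          for e in out_edges[v]])

    log_total = alpha[n_nodes - 1]         # ᾱ: total log score of all lattice paths
    return [math.exp(alpha[e["start"]] + edge_score(e, lam_am, lam_lm, lam)
                     + beta[e["end"]] - log_total) for e in edges]
```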

2.5 Semi-supervised training by multi-objective optimization
In the semi-supervised setting [6], [7], the weights Λ must account for both the supervised objective L(Λ) and the unsupervised objective U(Λ). The two objectives are combined with the ε-constraint method [12], a standard technique in multi-objective optimization [9]: U(Λ) is turned into a constraint and L(Λ) is minimized subject to it,

    Λ* = arg min_Λ L(Λ)   subject to   U(Λ) ≤ Ū                                         (13)

The bound Ū is set relative to the unsupervised risk at Λ = 0,

    Ū = α U(0)                                                                           (14)

with α < 1.0; the constraint thus requires a reduction of roughly 5% to 20% of the initial risk. The constrained problem (13) is solved by minimizing the penalty function [13]

    F(Λ) = L(Λ) + ρ ⌊κ/(2ρ) + U(Λ) − Ū⌋²                                                 (15)

where ⌊x⌋ = max{x, 0} and κ and ρ are the two penalty parameters. Its gradient with respect to λ_i combines the gradients of the two objectives:

    ∂F(Λ)/∂λ_i = ∂L^(l)(Λ)/∂λ_i + 2ρ ⌊κ/(2ρ) + U(Λ) − Ū⌋ ∂U^(u)(Λ)/∂λ_i                  (16)

The penalty parameters κ and ρ are set following [13], and the constraint level is controlled by α.

2.6 Feature functions
The linguistic features are n-gram count features. For a word trigram (u_1, u_2, u_3), the associated feature function is

    f_i = h_{(u_1, u_2, u_3)}(w) = c_{u_1, u_2, u_3}(w)                                  (17)

i.e., the number of times the trigram (u_1, u_2, u_3) occurs in hypothesis w. In addition to word n-gram features, n-gram features over other units (e.g., part-of-speech tags) are used following [4], [14].

3. Experiments
3.1 Experimental setup
The acoustic features were 12 MFCCs with their first and second derivatives (39 dimensions in total), and the HMM acoustic models were trained with MPE. Decoding used a bigram language model, and 200-best hypotheses were rescored with a trigram language model trained on about 239M words with a 100k-word vocabulary. The evaluation data were taken from NHK broadcast programs; Table 1 lists, for each set, the perplexity (PP), out-of-vocabulary rate (OOV), and word error rate (WER) of the baseline trigram model.
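Below is a minimal sketch of the penalty formulation of Eqs. (13)-(16) in Sec. 2.5, assuming generic objective and gradient callables. The toy quadratic objectives and the values of κ, ρ, and α are placeholders rather than settings from the paper, and scipy's L-BFGS-B stands in for the L-BFGS of [15].

```python
import numpy as np
from scipy.optimize import minimize

def make_penalized_objective(L, grad_L, U, grad_U, U_bar, kappa=1.0, rho=10.0):
    """Return F(Λ) of Eq. (15) and its gradient of Eq. (16)."""
    def F(lam):
        slack = max(kappa / (2.0 * rho) + U(lam) - U_bar, 0.0)   # ⌊κ/(2ρ) + U(Λ) − Ū⌋
        return L(lam) + rho * slack ** 2
    def grad_F(lam):
        slack = max(kappa / (2.0 * rho) + U(lam) - U_bar, 0.0)
        return grad_L(lam) + 2.0 * rho * slack * grad_U(lam)
    return F, grad_F

# Toy example: quadratic stand-ins for the supervised objective L(Λ) and the
# unsupervised risk U(Λ); the bound Ū = α U(0) follows Eq. (14).
L = lambda lam: float(np.sum((lam - 1.0) ** 2))
grad_L = lambda lam: 2.0 * (lam - 1.0)
U = lambda lam: float(np.sum((lam + 1.0) ** 2))
grad_U = lambda lam: 2.0 * (lam + 1.0)
lam0 = np.zeros(5)                           # Λ = 0 initialization
alpha = 0.9                                  # constraint level of Eq. (14)
F, grad_F = make_penalized_objective(L, grad_L, U, grad_U, alpha * U(lam0))
result = minimize(F, lam0, jac=grad_F, method="L-BFGS-B")
```

The penalty form is convenient because the same gradient machinery of Eqs. (9) and (10) can be reused: whenever the constraint is violated, the unsupervised gradient is simply added with a weight that grows with the violation.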

Table 1  Evaluation data for discriminative language modeling
            Sentences   Words   PP      OOV (%)   WER (%)
  Set 1     245         3.5k    125.7   1.5       23.0
  Set 2     551         7.0k    139.4   1.3       22.3

Table 2  Training data for discriminative language modeling
               Hours    Sentences   Words
  Labeled      58.6     26k         697.5k
  Unlabeled    344.1    218.6k      2.84M

Table 3  Perplexities and word error rates for the training data
               PP       OOV (%)   WER (%)   GER (%)
  Labeled      64.0     2.03      22.3      13.2
  Unlabeled    163.2    3.07      30.0      16.9

Table 4  Feature functions for discriminative language modeling
  (numbers of distinct n-gram features; the row labels distinguishing the two feature types were not recovered)
  2-gram   1.3k      3-gram   12.9k
  2-gram   731.9k    3-gram   1859.6k

The labeled and unlabeled training data are summarized in Table 2, and Table 3 gives their perplexity, OOV rate, WER, and graph error rate (GER). [A further part of the Japanese data description was not recovered; it cites [2] and mentions about 4.5 hours / 2.17k sentences (47.2k words).] The feature weights Λ were optimized with L-BFGS [15], and the constraint level α of Eq. (14) was varied between 0.80 and 0.95. The feature set consisted of the 2-gram and 3-gram features listed in Table 4.

3.2 Results
Table 5 summarizes the word error rates (see also [7]). On the evaluation set whose baseline WER was 22.3%, risk-based supervised training reduced the WER to 21.5% (a 3.6% relative reduction), and risk-based semi-supervised training reduced it further to 20.9% (a 6.3% relative reduction, and 2.8% relative over the supervised model); the improvement was statistically significant at the 5% level.

4. Discussion
4.1 Effect of the amount of unlabeled data
The amount of unlabeled training data was varied over 10, 30, 50, and 100% (Tables 6 and 7).
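The relative reductions quoted in Sec. 3.2 follow directly from the absolute WERs; the small helper below (ours, added for clarity) reproduces the 3.6%, 6.3%, and 2.8% figures from the values in Table 5.

```python
# Relative WER reduction in percent, as used throughout Secs. 3.2 and 4.
def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    return 100.0 * (baseline_wer - new_wer) / baseline_wer

print(round(relative_reduction(22.3, 21.5), 1))   # supervised risk-based: 3.6
print(round(relative_reduction(22.3, 20.9), 1))   # semi-supervised risk-based: 6.3
print(round(relative_reduction(21.5, 20.9), 1))   # semi-supervised vs. supervised: 2.8
```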

Table 5  Experimental results for discriminative language modeling (WER, %)
                                          Set 1   Set 2
  Baseline (trigram)                      23.0    22.3
  —                                       22.9    22.1
  —                                       22.8    22.3
  —                                       22.7    22.2
  Risk minimization (supervised)          22.3    21.5
  —                                       22.5    22.0
  Risk minimization (semi-supervised)     21.9    20.9
  (the Japanese labels of the remaining rows, which cover the conventional supervised,
   unsupervised, and semi-supervised models, were not recovered)

Table 6  Semi-supervised discriminative language modeling with various amounts of
         unlabeled training data (WER, %)
  Unlabeled data   Risk minimization   Conventional
  10 %             21.3                22.3
  30 %             21.2                22.3
  50 %             21.1                22.3
  100 %            20.9                22.0

Table 7  Semi-supervised discriminative language modeling with various amounts of
         unlabeled training data (risk minimization, WER %)
  Unlabeled data   non-VTR   VTR
  10 %             15.8      27.8
  30 %             15.9      27.4
  50 %             15.8      27.2
  100 %            15.7      27.0
  —                16.4      29.2
  —                16.0      29.6
  — (100 %)        16.2      27.7
  (the Japanese labels of the last three rows were not recovered; the row 16.4 / 29.2
   corresponds to the baseline referred to in the text)

4.2 Analysis by program segment
The second evaluation set was divided into two parts, non-VTR segments (about 3.8k words) and VTR (pre-recorded report) segments (about 3.3k words), and Table 7 gives the WER of each part; the rows from 10% to 100% are the risk-based semi-supervised model trained with varying amounts of unlabeled data. The VTR segments are much harder, with a baseline WER of 29.2% against 16.4% for the non-VTR segments. On the non-VTR segments the proposed model gave a 4.3% relative reduction (from 16.4% to 15.7%), while one of the comparison configurations in Table 7 reduced the VTR WER from 29.2% to 27.7%, a 5.1% relative reduction.

With all of the unlabeled data, the proposed risk-based semi-supervised model reduced the WER of the VTR segments from 29.2% to 27.0%, a relative reduction of 7.5%.

Table 8  Comparison of feature functions (WER, %)
  (row and column labels were not recovered; the values, in the order extracted, are
   22.2, 22.6, 22.3, 21.5, 22.4, 21.5, 21.5, 21.8, 20.9)

4.3 Comparison of feature functions
Table 8 compares the feature functions of Sec. 2.6. [Most of the Japanese discussion was not recovered; the recoverable figures are WERs of 21.5% and 22.4%, compared with 22.3%.]

5. Conclusion
[The Japanese text of the conclusion was not recovered; it cites [16].]

References
[1] (Japanese reference; authors and title not recovered), Vol. 63, No. 3, pp. 331-338 (2008).
[2] (Japanese reference; authors and title not recovered), Vol. J93-D, No. 10, pp. 2085-2095 (2010).
[3] Roark, B., Saraclar, M. and Collins, M.: Discriminative n-gram language modeling, Computer Speech and Language, Vol. 21, pp. 373-392 (2007).
[4] (Japanese reference; authors and title not recovered), Vol. J93-D, No. 5, pp. 598-609 (2010).
[5] Kobayashi, A., Oku, T., Homma, S., Imai, T. and Nakagawa, S.: Lattice-based risk minimization training for unsupervised language model adaptation, Proc. Interspeech, pp. 1453-1456 (2011).
[6] Kobayashi, A., Oku, T., Imai, T. and Nakagawa, S.: Multi-objective optimization for semi-supervised discriminative language modeling, Proc. IEEE ICASSP, pp. 4997-5000 (2012).
[7] Kobayashi, A., Oku, T., Imai, T. and Nakagawa, S.: Risk-based semi-supervised discriminative language modeling for broadcast transcription, IEICE Trans. Inf. & Syst., Vol. E95-D, No. 11 (2012, in press).
[8] Goel, V. and Byrne, W.: Minimum Bayes-risk automatic speech recognition, Computer Speech and Language, Vol. 14, pp. 115-135 (2000).
[9] Marler, R. T. and Arora, J. S.: Survey of multi-objective optimization methods for engineering, Structural and Multidisciplinary Optimization, Vol. 26, pp. 369-395 (2004).
[10] Povey, D. and Woodland, P. C.: Minimum phone error and I-smoothing for improved discriminative training, Proc. ICASSP, pp. I-105-108 (2002).
[11] Grandvalet, Y. and Bengio, Y.: Semi-supervised learning by entropy minimization, Advances in Neural Information Processing Systems, pp. 529-536 (2005).
[12] Miettinen, K.: Nonlinear Multiobjective Optimization, Springer (1999).
[13] Snyman, J.: Practical Mathematical Optimization, Springer (2005).
[14] (Japanese reference; authors and title not recovered), No. 2-P-35(a) (2011).
[15] Liu, D. and Nocedal, J.: On the limited memory BFGS method for large scale optimization, Mathematical Programming, Vol. 45, No. 3, pp. 503-528 (1989).
[16] Lehr, M. and Shafran, I.: Discriminatively estimated joint acoustic, duration and language model for speech recognition, Proc. ICASSP, pp. 5542-5545 (2010).

(c) 2012 Information Processing Society of Japan