
FUNDAMENTALS OF SPEECH SYNTHESIS BASED ON HMM

Keiichi Tokuda
Department of Computer Science, Nagoya Institute of Technology
Gokiso-cho, Shouwa-ku, Nagoya, 466-8555 Japan

Abstract: The increasing availability of large speech databases makes it possible to construct speech synthesis systems, referred to as corpus-based systems, by applying unit-selection and statistical learning algorithms. In constructing such systems, hidden Markov models (HMMs) have come into wide use. This paper describes these approaches and relates them to an approach in which synthetic speech is generated from the HMMs themselves.

Keywords: speech synthesis, text-to-speech conversion, hidden Markov model (HMM), corpus

1. INTRODUCTION

Hidden Markov models (HMMs) have been used with great success in speech recognition [1]-[3], where techniques such as parameter tying and context clustering ([4]) and speaker adaptation ([5], [6]) are well established. The same machinery makes HMMs attractive for building speaker-driven, trainable speech synthesis systems. HMM-based approaches to speech synthesis can be grouped roughly as follows:

(1) segmentation of a speech database with HMMs for concatenative unit selection ([7]);
(2) construction of a unit inventory from trained HMMs ([8], [9]);
(3) unit selection in which HMMs or their states serve as the unit instances ([10], [11]);
(4) generation of speech parameters directly from HMMs ([12]-[14]).

Approaches (1)-(3) concatenate waveform segments, typically with PSOLA-type processing [15], whereas approach (4) produces vocoded speech from the HMM parameters themselves. This paper reviews these approaches, with emphasis on (4). Section 2 summarizes the fundamentals of HMMs, Section 3 describes speech synthesis by HMM-based unit selection, Section 4 describes speech parameter generation from HMMs, and Section 5 concludes the paper.

2. HIDDEN MARKOV MODELS (HMM)

2.1 Definition

An HMM is characterized by state output probability distributions b_i(o_t) and state transition probabilities a_{ij} = P(q_t = j | q_{t-1} = i), where q_t is the state at time t and o_t is the observation (feature) vector, e.g., MFCC [16] or LPC-derived coefficients. For a continuous-density HMM with single-Gaussian output distributions,

b_i(o) = N(o | \mu_i, U_i) = \frac{1}{\sqrt{(2\pi)^N |U_i|}} \exp\left\{ -\frac{1}{2} (o - \mu_i)^\top U_i^{-1} (o - \mu_i) \right\},    (1)

where \mu_i is the mean vector, U_i the covariance matrix, and N the dimensionality of o.

[Figure 1: An example of a 3-state left-to-right HMM, showing the transition probabilities a_{11}, a_{12}, a_{22}, a_{23}, a_{33}, the output distributions b_1(o_t), b_2(o_t), b_3(o_t), an observation sequence O = (o_1, o_2, ..., o_T) and a corresponding state sequence Q.]
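As a concrete illustration of eq. (1), the following is a minimal sketch (my own addition, not part of the original paper) of evaluating the log of a single-Gaussian state output density; the argument names simply mirror the symbols above.

```python
import numpy as np

def log_gaussian_output_prob(o, mu, U):
    """log b_i(o) for a single full-covariance Gaussian state, eq. (1).

    o  : observation vector, shape (N,)
    mu : state mean vector mu_i, shape (N,)
    U  : state covariance matrix U_i, shape (N, N)
    """
    N = o.shape[0]
    diff = o - mu
    sign, logdet = np.linalg.slogdet(U)          # log |U_i|
    if sign <= 0:
        raise ValueError("covariance must be positive definite")
    maha = diff @ np.linalg.solve(U, diff)       # (o - mu)' U^{-1} (o - mu)
    return -0.5 * (N * np.log(2.0 * np.pi) + logdet + maha)

# Example: a 2-dimensional state evaluated at its own mean
mu = np.array([1.0, -0.5])
U = np.array([[1.0, 0.2],
              [0.2, 0.5]])
print(log_gaussian_output_prob(mu, mu, U))
```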

[Figure 2: The trellis of possible state sequences for a 4-state left-to-right HMM and an observation sequence of length T = 8.]

An N-state HMM \lambda is specified by the initial state probabilities \pi = \{\pi_i\}_{i=1}^N, the transition probabilities A = \{a_{ij}\}_{i,j=1}^N and the output distributions B = \{b_i(\cdot)\}_{i=1}^N, written compactly as \lambda = (A, B, \pi). For a state sequence Q = \{q_1, q_2, ..., q_T\} and an observation sequence O = (o_1, o_2, ..., o_T), the joint probability is

P(O, Q | \lambda) = \prod_{t=1}^{T} a_{q_{t-1} q_t} b_{q_t}(o_t),    (2)

where a_{q_0 i} = \pi_i. The probability of O given \lambda is obtained by summing over all possible state sequences:

P(O | \lambda) = \sum_{\mathrm{all}\,Q} P(O, Q | \lambda) = \sum_{\mathrm{all}\,Q} \prod_{t=1}^{T} a_{q_{t-1} q_t} b_{q_t}(o_t),    (3)

which corresponds to summing the path probabilities over the trellis of Figure 2 and can be computed efficiently by the forward algorithm.

2.2 Training

Given training data O, the model parameters \lambda are estimated so as to maximize P(O | \lambda):

\lambda_{\max} = \arg\max_\lambda P(O | \lambda).    (4)

This maximization is carried out with the EM (Baum-Welch) algorithm. In practice the model is trained from a set of utterances \{O^{(1)}, O^{(2)}, ..., O^{(m)}\}; for continuous speech, the HMMs corresponding to the transcription of each utterance are concatenated and re-estimated jointly (embedded training), with initial parameters obtained by, e.g., k-means segmentation.

2.3 Recognition

Given I HMMs \lambda_1, \lambda_2, ..., \lambda_I and an observation sequence O, recognition selects

i_{\max} = \arg\max_i P(\lambda_i | O) = \arg\max_i \frac{P(O | \lambda_i) P(\lambda_i)}{P(O)},    (5)

which reduces to \arg\max_i P(O | \lambda_i) when the priors P(\lambda_i) are equal. The likelihood is usually approximated by the probability of the single best state sequence,

P(O | \lambda_i) = \sum_Q P(O, Q | \lambda_i) \approx \max_Q P(O, Q | \lambda_i),    (6)

which is computed efficiently by the Viterbi algorithm; the maximizing state sequence Q also gives the time alignment between O and the model.
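As an illustration of the Viterbi approximation in eq. (6), the sketch below (my own addition, not from the original text) computes max_Q log P(O, Q | \lambda) in the log domain; log_b is assumed to hold precomputed values of log b_i(o_t) for every frame and state.

```python
import numpy as np

def viterbi_log_likelihood(log_pi, log_A, log_b):
    """max_Q log P(O, Q | lambda) and the best state sequence, eq. (6).

    log_pi : (N,)    log initial state probabilities
    log_A  : (N, N)  log transition probabilities log a_ij
    log_b  : (T, N)  log output probabilities log b_i(o_t) per frame
    """
    T, N = log_b.shape
    delta = log_pi + log_b[0]          # best partial path score ending in each state
    psi = np.zeros((T, N), dtype=int)  # back-pointers
    for t in range(1, T):
        scores = delta[:, None] + log_A        # scores[i, j] = delta_i + log a_ij
        psi[t] = np.argmax(scores, axis=0)
        delta = scores[psi[t], np.arange(N)] + log_b[t]
    # Back-track the single best state sequence Q
    q = np.zeros(T, dtype=int)
    q[-1] = int(np.argmax(delta))
    for t in range(T - 1, 0, -1):
        q[t - 1] = psi[t, q[t]]
    return float(np.max(delta)), q
```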

Because the acoustic realization of a phoneme depends strongly on its phonetic context, context-dependent models such as triphones are used. For example, the phoneme sequence /g, e, N, j, i, ts, u/ (the word "genjitsu") embedded in its context is modeled by the triphones

u-g+e, g-e+n, e-n+j, N-j+i, j-i+ts, i-ts+u, ts-u+o.

With such models the number of distinct contexts becomes enormous and many contexts never occur in the training data, so the model parameters are clustered and tied, typically with phonetic decision trees [4]: as illustrated in Figure 3, contexts such as k-a+n, t-a+n and i-a+t are split by a sequence of yes/no questions about the neighboring phonemes, and the states that reach the same leaf share their parameters (a sketch of the likelihood computation behind one such split is given below). Further details of HMM-based speech recognition can be found in the textbooks [17]-[23], and the HTK toolkit [24] provides a widely used implementation.
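The following sketch (my own illustration under simplifying assumptions, not code from the paper) shows the core computation behind one node split in decision-tree state clustering: the approximate log-likelihood of modelling a set of single-Gaussian states with one pooled Gaussian is compared before and after splitting by a yes/no context question, assuming diagonal covariances and known state occupancy counts.

```python
import numpy as np

def pooled_log_likelihood(gammas, means, variances):
    """Approximate log-likelihood of modelling the data of several
    single-Gaussian (diagonal-covariance) states with one pooled Gaussian,
    computed from occupancy counts and sufficient statistics only."""
    g = np.asarray(gammas, dtype=float)        # (S,)   occupancy counts
    mu = np.asarray(means, dtype=float)        # (S, d) state means
    var = np.asarray(variances, dtype=float)   # (S, d) state variances
    total = g.sum()
    pooled_mu = (g[:, None] * mu).sum(axis=0) / total
    second_moment = (g[:, None] * (var + mu ** 2)).sum(axis=0) / total
    pooled_var = second_moment - pooled_mu ** 2
    d = mu.shape[1]
    return -0.5 * total * (d * (1.0 + np.log(2.0 * np.pi))
                           + np.log(pooled_var).sum())

def split_gain(answers_yes, gammas, means, variances):
    """Log-likelihood gain of splitting a node by one yes/no context question."""
    yes = np.asarray(answers_yes, dtype=bool)
    g, mu, var = (np.asarray(x, dtype=float) for x in (gammas, means, variances))
    before = pooled_log_likelihood(g, mu, var)
    after = (pooled_log_likelihood(g[yes], mu[yes], var[yes]) +
             pooled_log_likelihood(g[~yes], mu[~yes], var[~yes]))
    return after - before

# Example: three states of /a/ split by "is the left context a plosive?"
gain = split_gain([True, True, False],
                  gammas=[120.0, 80.0, 100.0],
                  means=[[1.0, 0.2], [0.9, 0.1], [0.2, -0.4]],
                  variances=[[0.5, 0.3], [0.6, 0.4], [0.5, 0.3]])
print(gain)
```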

3. SPEECH SYNTHESIS BY HMM-BASED UNIT SELECTION

3.1 Automatic segmentation

In the approach of [7], HMMs trained as described in Section 2 are used to segment the speech database automatically: Viterbi alignment of each utterance O against the corresponding HMM sequence (Section 2.3) yields the state sequence Q and hence the unit boundaries. The resulting units, e.g., diphones or phones [25], [26] or half-phones [11], are then used for concatenative synthesis.

3.2 Unit selection

When a large inventory of units is available, several instances of each unit exist, and the synthesizer must select, for each target segment, the instance that best matches the required context, including contexts unseen in the database [27] (see also [28]). Two kinds of cost are used: (a) a target cost (context matching score) measuring how well a candidate unit matches the target specification, and (b) a concatenation cost (also called discontinuity or continuity cost) measuring how smoothly adjacent candidates join ([29], [28]). Two broad strategies can be distinguished: (i) selecting at synthesis time from the actual stored units ([30], [28], [26]), and (ii) clustering the units in advance and selecting representatives of the clusters ([9], [11], [31], [32]). In (i), the costs (a) and (b) are combined and the best sequence of candidates is found by dynamic programming (DP) over the candidate sets; in (ii), the clustering (e.g., by decision trees) absorbs unseen contexts, the cluster representatives play the role of unit instances, and DP with a concatenation cost can again be applied. Approach (ii) is closely related to the use of HMMs as unit models ([9], [11]). Overviews are given in [33], [34].
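As a minimal sketch of the DP search described above (my own illustration; the cost values themselves are placeholders to be supplied by target and concatenation cost functions), the following selects one candidate per target position so that the sum of target costs (a) and concatenation costs (b) is minimized.

```python
import numpy as np

def select_units(target_costs, concat_costs):
    """Dynamic-programming unit selection.

    target_costs : list of length n; target_costs[t] is an array of target
                   costs (a) for the candidate units of target position t.
    concat_costs : list of length n-1; concat_costs[t][i, j] is the
                   concatenation cost (b) of joining candidate i at position t
                   with candidate j at position t+1.
    Returns the minimum total cost and the chosen candidate index per position.
    """
    n = len(target_costs)
    best = np.asarray(target_costs[0], dtype=float)   # best path cost per candidate
    back = []                                         # back-pointers per transition
    for t in range(1, n):
        total = best[:, None] + np.asarray(concat_costs[t - 1], dtype=float)
        back.append(np.argmin(total, axis=0))
        best = total.min(axis=0) + np.asarray(target_costs[t], dtype=float)
    # Back-track the selected unit sequence
    idx = [int(np.argmin(best))]
    for t in range(n - 2, -1, -1):
        idx.append(int(back[t][idx[-1]]))
    return float(best.min()), idx[::-1]

# Example: two target positions with 3 and 2 candidates
cost, chosen = select_units([[1.0, 0.4, 0.9], [0.2, 0.8]],
                            [np.array([[0.1, 0.9],
                                       [0.5, 0.2],
                                       [0.3, 0.3]])])
print(cost, chosen)
```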

4. SPEECH PARAMETER GENERATION FROM HMM

4.1 Problem

Given an HMM \lambda and a length T, the approach of [12]-[14] generates a speech parameter sequence directly from the model by determining

O_{\max} = \arg\max_O P(O | \lambda, T).    (7)

As in eq. (6), this is approximated using the single best state sequence:

O_{\max} = \arg\max_O P(O | \lambda, T) \approx \arg\max_O \max_Q P(O, Q | \lambda, T),    (8)

P(O, Q | \lambda, T) = P(O | Q, \lambda, T)\, P(Q | \lambda, T).    (9)

Since P(Q | \lambda, T) does not depend on O, the maximization in (8) is carried out in two steps:

Q_{\max} = \arg\max_Q P(Q | \lambda, T),    (10)

O_{\max} = \arg\max_O P(O | Q_{\max}, \lambda).    (11)

4.2 Parameter generation considering dynamic features

Consider first eq. (11). Given the state sequence Q,

\log P(O | Q, \lambda) = \log \prod_{t=1}^{T} b_{q_t}(o_t) = -\frac{1}{2}(O - M)^\top U^{-1} (O - M) - \frac{1}{2}\log |U| + \mathrm{Const},    (12)

where

O = [o_1^\top, o_2^\top, ..., o_T^\top]^\top,    (13)
M = [\mu_{q_1}^\top, \mu_{q_2}^\top, ..., \mu_{q_T}^\top]^\top,    (14)
U = \mathrm{diag}[U_{q_1}, U_{q_2}, ..., U_{q_T}],    (15)

and \mu_{q_t}, U_{q_t} are the mean vector and covariance matrix of state q_t. If o_t consists of static features only, P(O | Q, \lambda) is maximized simply by O = M, i.e., by a piecewise-constant sequence of state mean vectors that jumps at every state boundary (cf. Figure 4(a)). To avoid this, each observation vector is formed from the static coefficients c_t together with their dynamic features, o_t = [c_t^\top, \Delta c_t^\top, \Delta^2 c_t^\top]^\top, where

\Delta c_t = \sum_{\tau = -L_-^{(1)}}^{L_+^{(1)}} w^{(1)}(\tau)\, c_{t+\tau},    (16)

\Delta^2 c_t = \sum_{\tau = -L_-^{(2)}}^{L_+^{(2)}} w^{(2)}(\tau)\, c_{t+\tau},    (17)

with window coefficients w^{(1)}(\tau), w^{(2)}(\tau). Relations (16) and (17) can be written in matrix form as

O = WC,    (18)

C = [c_1^\top, c_2^\top, ..., c_T^\top]^\top,    (19)

where, if c_t is an M-dimensional vector, C is a TM-dimensional vector, O is a 3TM-dimensional vector, and W is a 3TM x TM matrix whose elements are determined by w^{(1)}(\tau), w^{(2)}(\tau), 1 and 0. Under the constraint (18), P(O | Q, \lambda) is maximized with respect to C by setting

\frac{\partial \log P(WC | Q, \lambda)}{\partial C} = 0,    (20)

which yields the set of linear equations

W^\top U^{-1} W\, C = W^\top U^{-1} M.    (21)

Since W^\top U^{-1} W is a TM x TM matrix, solving (21) directly requires O(T^3 M^3) operations (footnote 4); by exploiting the band structure of W^\top U^{-1} W, e.g., with a Cholesky or QR decomposition, the solution is obtained with O(TM^3 L^2) operations (footnote 5), where L = \max\{L_-^{(1)}, L_+^{(1)}, L_-^{(2)}, L_+^{(2)}\}. The derivation and solution of (21) are given in [36], [37] and applied to vector quantization of spectral parameters in [38]; generation algorithms for the cases corresponding to (7) and (8), where the state sequence is not fixed in advance, are described in [39].

[Figure 4: Spectra generated from HMMs for the phoneme sequence /sil, a, i, sil/ [36]: (a) without dynamic features, (b) with dynamic features.]

(Footnote 4: O(T^3 M) when the covariance matrices U_{q_t} are diagonal. Footnote 5: O(TML^2) when the U_{q_t} are diagonal; in the special case L_-^{(1)} = 1, L_+^{(1)} = 0 and w^{(2)}(\tau) \equiv 0, the solution can be obtained recursively with O(TM) operations [35].)
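Below is a minimal sketch of solving eq. (21) for a single feature dimension (my own illustration, with assumed delta windows and a dense solver for clarity; a real implementation would exploit the band structure noted above). The rows of W are ordered here by stream (all static rows, then all delta rows, then all delta-delta rows), which is a permutation of the per-frame ordering of eqs (13)-(15) and does not change the solution.

```python
import numpy as np

def build_window_matrix(T, windows):
    """Stack the window matrices of eqs (16)-(18): each block maps the static
    sequence C to one stream (static, delta, delta-delta)."""
    blocks = []
    for w in windows:                      # each w maps offset tau -> weight
        Wd = np.zeros((T, T))
        for t in range(T):
            for tau, weight in w.items():
                if 0 <= t + tau < T:
                    Wd[t, t + tau] = weight
        blocks.append(Wd)
    return np.vstack(blocks)               # (3T, T) for three windows

def generate_parameters(means, variances, windows):
    """Solve W' U^{-1} W C = W' U^{-1} M (eq. (21)) for one feature dimension.

    means, variances : (3, T) per-frame means/variances of the aligned states,
                       rows ordered as static, delta, delta-delta.
    """
    T = means.shape[1]
    W = build_window_matrix(T, windows)
    M = means.reshape(-1)                  # mean vector M, stream-ordered
    Uinv = 1.0 / variances.reshape(-1)     # diagonal U^{-1}
    A = W.T @ (Uinv[:, None] * W)          # W' U^{-1} W (banded in practice)
    b = W.T @ (Uinv * M)                   # W' U^{-1} M
    return np.linalg.solve(A, b)           # static parameter sequence C

# Assumed windows: identity, first difference, second difference
windows = [{0: 1.0}, {-1: -0.5, 1: 0.5}, {-1: 1.0, 0: -2.0, 1: 1.0}]

# Example: 5 frames, unit variances, static means rising from 0 to 1
T = 5
means = np.vstack([np.linspace(0.0, 1.0, T), np.zeros(T), np.zeros(T)])
variances = np.ones((3, T))
print(generate_parameters(means, variances, windows))
```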

Next consider eq. (10). Let the state sequence Q be expressed by the K states visited and their durations d_1, ..., d_K, with \sum_{i=1}^{K} d_i = T. Then

\log P(Q | \lambda, T) = \sum_{i=1}^{K} \log p_i(d_i),    (22)

where p_i(d) is the duration distribution of state i. For a standard HMM the duration distribution implied by the self-transition probability is geometric,

p_i(d) = a_{ii}^{d-1} (1 - a_{ii}),    (23)

which decreases exponentially with d and is a poor model of actual segment durations. Explicit (e.g., Gaussian) state duration densities p_i(d) are therefore used; with these, the durations \{d_i\} that maximize (22), and hence Q_{\max} of (10), are obtained as

d_i = m_i + \rho\, \sigma_i^2,    (24)

\rho = \left( T - \sum_{k=1}^{K} m_k \right) \Big/ \sum_{k=1}^{K} \sigma_k^2,    (25)

where m_i and \sigma_i^2 are the mean and variance of the duration density of state i [40]. Instead of specifying the total length T, the speaking rate can be controlled directly through \rho in (25); \rho = 0 gives the average rate, in which each d_i equals its mean m_i.

4.3 F0 modeling

The fundamental frequency (F0) is not defined in unvoiced regions, so its observation sequence is a mixture of one-dimensional continuous values (voiced frames) and a discrete symbol (unvoiced frames) and cannot be modeled directly by a conventional HMM. The HMM based on multi-space probability distributions (MSD-HMM) [41] handles such observations and allows the spectrum and F0 to be modeled in a unified HMM framework [42].

4.4 Context-dependent modeling and duration prediction

By combining the techniques of Sections 4.1-4.3 with the context-dependent modeling of Section 2.4, a complete HMM-based synthesis system can be constructed [27]: context-dependent HMMs of spectrum and F0 are trained, clustered, and used to generate parameters for arbitrary input text. Phoneme and state durations can also be predicted from contextual factors by statistical models such as regression trees [43], [44] or constrained tree regression (CTR) [45]. This framework is closely related to the unit-selection approach (ii) of Section 3 ([9], [11]), in which clustered HMMs serve as the unit instances (see also [46]).

5. CONCLUSION

This paper has reviewed speech synthesis techniques based on HMMs: the fundamentals of HMMs, their use for automatic segmentation and unit selection, and the generation of speech parameters, including F0 and durations, directly from the HMMs themselves.

REFERENCES

[1] S. Schwartz, Y.-L. Chow, O. Kimball, S. Roucos, M. Krasner and J. Makhoul, "Context-dependent modeling for acoustic-phonetic recognition of continuous speech," Proc. ICASSP, pp.1205-1208, 1985.
[2] S. Furui, "Speaker independent isolated word recognition using dynamic features of speech spectrum," IEEE Trans. Acoust., Speech, Signal Processing, vol.34, pp.52-59, 1986.
[3] B.-H. Juang, "Maximum-likelihood estimation for mixture multivariate stochastic observations of Markov chains," AT&T Technical Journal, vol.64, no.6, pp.1235-1249, 1985.
[4] J. J. Odell, "The use of context in large vocabulary speech recognition," PhD dissertation, Cambridge University, 1995.
[5] C.-H. Lee, C.-H. Lin and B.-H. Juang, "A study on speaker adaptation of the parameters of continuous density hidden Markov models," IEEE Trans. Acoust., Speech, Signal Processing, vol.39, no.4, pp.806-814, Apr. 1991.
[6] M. J. F. Gales and P. C. Woodland, "Mean and variance adaptation within the MLLR framework," Computer Speech and Language, vol.10, no.4, pp.249-264, Apr. 1996.
[7] A. Ljolje, J. Hirschberg and J. P. H. van Santen, "Automatic speech segmentation for concatenative inventory selection," in Progress in Speech Synthesis, ed. J. P. H. van Santen, R. W. Sproat, J. P. Olive and J. Hirschberg, Springer-Verlag, New York, 1997.
[8] R. E. Donovan and P. C. Woodland, "Automatic speech synthesiser parameter estimation using HMMs," Proc. ICASSP, pp.640-643, 1995.
[9] X. Huang, A. Acero, H. Hon, Y. Ju, J. Liu, S. Meredith and M. Plumpe, "Recent improvements on Microsoft's trainable text-to-speech system - Whistler," Proc. ICASSP, 1997.
[10] H. Hon, A. Acero, X. Huang, J. Liu and M. Plumpe, "Automatic generation of synthesis units for trainable text-to-speech synthesis," Proc. ICASSP, 1998.
[11] R. E. Donovan and E. M. Eide, "The IBM trainable speech synthesis system," Proc. ICSLP, vol.5, pp.1703-1706, 1998.
[12] A. Falaschi, M. Giustiniani and M. Verola, "A hidden Markov model approach to speech synthesis," Proc. EUROSPEECH, pp.87-90, 1989.
[13] M. Giustiniani and P. Pierucci, "Phonetic ergodic HMM for speech synthesis," Proc. EUROSPEECH, pp.349-352, 1991.
[14] (in Japanese), IEICE Trans. D-II, vol.J79-D-II, no.12, pp.284-290, Dec. 1996.
[15] E. Moulines and F. Charpentier, "Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones," Speech Communication, vol.9, pp.453-467, 1990.
[16] S. B. Davis and P. Mermelstein, "Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences," IEEE Trans. Acoust., Speech, Signal Processing, vol.ASSP-28, pp.357-366, Aug. 1980.
[17] (in Japanese), 1988.
[18] (in Japanese), 1995.
[19] L. Rabiner and B.-H. Juang, Fundamentals of Speech Recognition (Japanese translation), NTT Advanced Technology, 1995.
[20] (in Japanese), 1996.
[21] (in Japanese), 1997.
[22] (in Japanese), 1998.
[23] X. D. Huang, Y. Ariki and M. A. Jack, Hidden Markov Models for Speech Recognition, Edinburgh University Press, Edinburgh, 1990.
[24] http://htk.eng.cam.ac.uk/
[25] Y. Sagisaka, N. Kaiki, N. Iwahashi and K. Mimura, "ATR ν-talk speech synthesis system," Proc. ICSLP, pp.483-486, 1992.
[26] M. Beutnagel, A. Conkie, J. Schroeter, Y. Stylianou and A. Syrdal, "The AT&T Next-Gen TTS system," Proc. Joint Meeting of ASA, EAA and DAGA, pp.5-9, Mar. 1999.
[27] (in Japanese), IEICE Trans. D-II, vol.J83-D-II, no.11, Nov. 2000.
[28] A. W. Black and N. Campbell, "Optimising selection of units from speech databases for concatenative synthesis," Proc. EUROSPEECH, pp.581-584, Sep. 1995.
[29] (in Japanese), IEICE Technical Report, SP91-5, 1991.
[30] A. G. Hauptmann, "SPEAKEZ: A first experiment in concatenation synthesis from a large corpus," Proc. EUROSPEECH, pp.1701-1704, 1993.
[31] A. W. Black and P. Taylor, "Automatically clustering similar units for unit selection in speech synthesis," Proc. EUROSPEECH, pp.601-604, Sep. 1997.
[32] M. W. Macon, A. E. Cronk and J. Wouters, "Generalization and discrimination in tree-structured unit selection," Proc. ESCA/COCOSDA Workshop on Speech Synthesis, Nov. 1998.
[33] (in Japanese), Journal of Signal Processing, vol.2, no.6, Nov. 1998.
[34] (in Japanese), IPSJ Magazine, vol.41, no.3, Mar. 2000.
[35] A. Acero, "Formant analysis and synthesis using hidden Markov models," Proc. EUROSPEECH, Budapest, Hungary, pp.1047-1050, 1999.
[36] K. Tokuda, T. Kobayashi and S. Imai, "Speech parameter generation from HMM using dynamic features," Proc. ICASSP-95, pp.660-663, 1995.
[37] K. Tokuda, T. Masuko, T. Yamada, T. Kobayashi and S. Imai, "An algorithm for speech parameter generation from continuous mixture HMMs with dynamic features," Proc. EUROSPEECH-95, pp.757-760, 1995.
[38] K. Koishida, K. Tokuda, T. Masuko and T. Kobayashi, "Vector quantization of speech spectral parameters using statistics of dynamic features," Proc. ICSP, vol.1, pp.247-252, Aug. 1997.
[39] K. Tokuda, T. Yoshimura, T. Masuko, T. Kobayashi and T. Kitamura, "Speech parameter generation algorithms for HMM-based speech synthesis," Proc. ICASSP, Turkey, June 2000.
[40] (in Japanese), J. Acoust. Soc. Jpn., vol.53, no.3, pp.192-200, Mar. 1997.
[41] (in Japanese), IEICE Technical Report, SP98-1, pp.19-26, Apr. 1998.
[42] (in Japanese), IEICE Technical Report, SP98-2, pp.27-34, Apr. 1998.
[43] (in Japanese), J. Acoust. Soc. Jpn., vol.49, no.10, pp.682-690, Oct. 1993.
[44] M. Riley, "Tree-based modelling of segmental duration," in Talking Machines: Theories, Models, and Designs, Elsevier Science Publishers, pp.265-273, 1992.
[45] N. Iwahashi and Y. Sagisaka, "Statistical modelling of speech segment duration by constrained tree regression," IEICE Trans., vol.E83-D, no.7, pp.550-559, July 2000.
[46] (in Japanese), IEICE Technical Report, SP99-6, pp.48-54, Aug. 1999.