Sinsy: An HMM-based singing voice synthesis system which can realize your wish "I want this person to sing my song"


Keiichiro Oura, Ayami Mase, Tomohiko Yamada, Keiichi Tokuda (Nagoya Institute of Technology) and Masataka Goto (National Institute of Advanced Industrial Science and Technology, AIST)

Abstract: A statistical parametric approach to singing voice synthesis based on hidden Markov models (HMMs) has grown over the last few years. In this approach, the spectrum, excitation, and duration of singing voices are simultaneously modeled by context-dependent HMMs, and waveforms are generated from the HMMs themselves. In December 2009 we started a free on-line service named Sinsy. By uploading musical scores represented in MusicXML to the Sinsy website, users can obtain synthesized singing voices. However, a high recording cost may be required to train a new singer's model, because the speaker-dependent model used in Sinsy is trained on 70 songs. The present paper describes the recent developments of Sinsy and a speaker adaptation technique to generate waveforms from a small amount of adaptation data.

1. Introduction
[Japanese text not recovered. The surviving fragments review HMM-based speech synthesis 1)-4) and HMM-based singing voice synthesis 5); introduce the free on-line service Sinsy 6), opened in December 2009, and the software it builds on (HTS 7), hts_engine API 8), SPTK 9), STRAIGHT 10), CrestMuseXML Toolkit 11), MusicXML 12)); and motivate speaker adaptation based on Constrained Maximum-Likelihood Linear Regression (CMLLR) 13) as a way to reduce the recording cost compared with the 70 songs used to train the current Sinsy voice. Section 2 describes Sinsy, Section 3 the singing voice synthesis techniques it uses, Section 4 speaker adaptation, Section 5 experiments, and Section 6 concluding remarks.]

2. Sinsy
2.1 HMM-based singing voice synthesis

[Japanese text of Section 2.1 not recovered. The surviving fragments contrast the HMM-based approach with the commercial sample-concatenation synthesizer Vocaloid 14) and with VocaListener 15), list characteristics of the statistical approach, and walk through the system shown in Fig. 1, in which spectra are represented by mel-cepstra 16), parameters are generated from the trained HMMs 17), and waveforms are synthesized with the MLSA filter 18).]

Fig. 1 The overview of the HMM-based singing voice synthesis system. In the training part, spectral parameters (mel-cepstrum) and excitation parameters (F0) are extracted from a singing voice database and, together with labels, used to train context-dependent HMMs and duration models. In the synthesis part, the musical score is converted to labels, spectral and excitation parameters are generated from the HMMs, an excitation signal is generated from the F0 sequence, and the synthesized singing voice is obtained by driving the MLSA filter with that excitation under the generated mel-cepstra (a sketch of the excitation step is given after the software descriptions in 2.2 below).

2.2 Software used in Sinsy
[Japanese descriptions only partially recovered.] Sinsy is built on the following freely available software.

Speech Signal Processing Toolkit (SPTK), release SPTK-3.3 9): a set of UNIX-style command-line tools for speech signal processing, distributed under a BSD-style license 19), which Sinsy 6) uses together with STRAIGHT for parameter extraction and waveform generation.

STRAIGHT, release STRAIGHT V40 10),20): a speech analysis, modification and synthesis system used for the spectral analysis in Sinsy.

HMM-based Speech Synthesis System (HTS), release HTS-2.1.1 7),21): used to train the context-dependent HMMs of Sinsy. HTS is distributed as a patch to the Hidden Markov Model Toolkit (HTK) 22); the code added by HTS is released under a BSD-style license 19), whereas HTK itself is distributed under its own license.

CrestMuseXML Toolkit (CMX), release CMX-0.50 11),23): a toolkit for handling MusicXML-2.0 12), used as the front-end program of Sinsy that reads the uploaded MusicXML scores (Fig. 2).

HMM-based Speech Synthesis Engine (hts_engine API), release hts_engine API-1.03 8): a run-time engine, hosted on SourceForge and released under a BSD-style license 19), that synthesizes waveforms from HMMs trained with HTK/HTS; it is used as the back-end program of Sinsy.
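The "excitation generation" step shown in Fig. 1 turns the generated F0 trajectory into a source signal (a pulse train in voiced frames, noise in unvoiced frames) that the MLSA filter then shapes according to the generated mel-cepstrum. The following Python sketch illustrates only that step under simple assumptions (48-kHz sampling, a 5-ms frame shift, F0 = 0 marking unvoiced frames); it is an illustration of the general idea, not Sinsy's or SPTK's implementation, and the MLSA filtering itself is omitted.

    # Sketch of the "excitation generation" block of Fig. 1: build a pulse-train /
    # white-noise excitation from a per-frame F0 contour. The frame shift and the
    # unvoiced convention (F0 == 0) are assumptions for this example only.
    import numpy as np

    SAMPLE_RATE = 48000          # Hz (48-kHz recordings, as in the experiments)
    FRAME_SHIFT = 0.005          # 5 ms per frame

    def generate_excitation(f0, rng=None):
        """Return an excitation signal for the given per-frame F0 values [Hz]."""
        rng = np.random.default_rng() if rng is None else rng
        hop = int(SAMPLE_RATE * FRAME_SHIFT)
        excitation = np.zeros(len(f0) * hop)
        phase = 0.0
        for frame, f in enumerate(f0):
            start = frame * hop
            if f <= 0.0:                       # unvoiced frame: white noise
                excitation[start:start + hop] = rng.standard_normal(hop)
                phase = 0.0
                continue
            period = SAMPLE_RATE / f           # samples per pitch period
            for n in range(hop):               # voiced frame: pitch-synchronous pulses
                phase += 1.0
                if phase >= period:
                    excitation[start + n] = np.sqrt(period)   # roughly unit average power
                    phase -= period
        return excitation

    # Example: 0.5 s of a 440-Hz tone followed by 0.5 s of unvoiced frames.
    f0 = np.concatenate([np.full(100, 440.0), np.zeros(100)])
    e = generate_excitation(f0)
    print(e.shape)               # (48000,): one second at 48 kHz

In the real system this excitation would then be passed through the MLSA filter 18) driven by the generated mel-cepstral coefficients.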

2.3 The Sinsy on-line service
[Japanese text not recovered. The fragments indicate that this subsection describes the on-line service (Fig. 3): users upload a musical score in MusicXML and obtain the synthesized singing voice, and statistics of the uploaded MusicXML files are reported (figures of 70% and 30% appear, but what they refer to could not be recovered).]

3. Singing voice synthesis techniques in Sinsy
3.1 Contexts
[Japanese text not recovered.] The context-dependent labels used in Sinsy are derived not only from phonetic contexts, as in HMM-based speech synthesis 24), but also from the musical information carried by the score; the full context design is described in 25). Fig. 2 shows an example of the MusicXML input from which the labels are generated.

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE score-partwise PUBLIC "-//Recordare//DTD MusicXML 2.0 Partwise//EN"
        "http://www.musicxml.org/dtds/partwise.dtd">
    ...
      <measure number="1">
        <attributes>
          <divisions>1</divisions>
          <key>
            <fifths>0</fifths>
            <mode>major</mode>
          </key>
          <time>
            <beats>4</beats>
            <beat-type>4</beat-type>
          </time>
          ...
        </attributes>
        <sound tempo="120"/>
        <note>
          <pitch>
            <step>E</step>
            <octave>5</octave>
          </pitch>
          <duration>2</duration>
          <type>half</type>
          <lyric>
            <text> </text>
          </lyric>
        </note>
        <note>
          <pitch>
            <step>C</step>
            <octave>5</octave>
          </pitch>
          <duration>2</duration>
          <type>half</type>
          <lyric>
            <text> </text>
          </lyric>
        </note>
      </measure>
    </part>
    </score-partwise>

Fig. 2 An example of MusicXML. (The Japanese lyric text inside the <text> elements was lost in extraction.)
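Before context-dependent labels can be built, the front end has to pull the note pitch, duration, tempo, and lyric out of MusicXML like the example above. As a rough illustration of that first step only (not the CMX front end, and with an invented file name and helper), the following Python sketch reads the fields shown in Fig. 2 with the standard xml.etree library.

    # Minimal sketch: extract per-note information (pitch, duration, lyric)
    # from a MusicXML "score-partwise" file. Element names follow MusicXML 2.0;
    # the conversion to context-dependent labels is not shown.
    import xml.etree.ElementTree as ET

    def read_notes(path):
        tree = ET.parse(path)
        root = tree.getroot()                      # <score-partwise>
        notes = []
        divisions = 1                              # quarter-note resolution
        tempo = 120.0                              # default tempo (bpm), assumption
        for measure in root.iter("measure"):
            div = measure.find("attributes/divisions")
            if div is not None:
                divisions = int(div.text)
            sound = measure.find("sound")
            if sound is not None and "tempo" in sound.attrib:
                tempo = float(sound.attrib["tempo"])
            for note in measure.findall("note"):
                pitch = note.find("pitch")
                if pitch is None:                  # a <rest/> has no pitch
                    step, octave = "rest", None
                else:
                    step = pitch.findtext("step")
                    octave = int(pitch.findtext("octave"))
                beats = int(note.findtext("duration")) / divisions
                seconds = beats * 60.0 / tempo     # score duration in seconds
                lyric = note.findtext("lyric/text", default="")
                notes.append((step, octave, seconds, lyric))
        return notes

    if __name__ == "__main__":
        for n in read_notes("score.xml"):          # "score.xml" is a placeholder path
            print(n)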

Fig. 3 HMM-based Speech Synthesis System Sinsy. [Only the caption of this figure survives.]

3.2 Vibrato modeling
[Japanese text only partially recovered 26),27).] In singing voices, vibrato appears as a quasi-periodic modulation of the F0 contour (Fig. 4). Sinsy models this component explicitly 28): for frame index t = 0, 1, ..., T + 1 and note index i = 0, 1, ..., M, the vibrato component ν is expressed as

    ν(m_a(t), m_f(t), i) = m_a(t) sin( (2π m_f(t) / f_s) (t - t_0^(i)) ),    (1)

where f_s is the frame rate of the F0 sequence, t_0^(i) is the start frame of the i-th note, and m_a(t) and m_f(t) are the vibrato amplitude and vibrato frequency at frame t 27). In Fig. 5, c = log 2 / 1200 denotes one cent on the log F0 axis.

Fig. 4 An example of vibrato parts in a log F0 sequence (waveform and log F0 contour).

The observation vector o_t therefore consists of three streams: a spectrum stream o_t^(spec), an F0 stream o_t^(F0), and a vibrato stream o_t^(vib). The output probability of state s is defined with stream weights as

    b_s(o_t) = p_s(o_t^(spec))^γ_spec · p_s(o_t^(F0))^γ_F0 · p_s(o_t^(vib))^γ_vib,    (2)

where γ_spec, γ_F0, and γ_vib are the weights of the three streams.
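To make Eq. (1) concrete, the sketch below superimposes a vibrato component on a flat log F0 contour. It assumes that t is a frame index, that f_s is the frame rate of the F0 sequence, and that m_a(t) is given in cents and scaled by c = log 2 / 1200 before being added to log F0; the parameter values are made up for the example and are not taken from the paper.

    # Minimal sketch of the vibrato model of Eq. (1): a sinusoidal modulation of
    # the log F0 contour with time-varying amplitude m_a(t) [cent] and rate
    # m_f(t) [Hz]. All values below are illustrative.
    import numpy as np

    FRAME_RATE = 200.0                 # frames per second (5-ms frame shift)
    CENT = np.log(2.0) / 1200.0        # one cent on the log-frequency axis

    def add_vibrato(log_f0, m_a, m_f, note_start_frame):
        """Return log_f0 plus the vibrato component of Eq. (1).

        log_f0, m_a, m_f are arrays indexed by frame t; m_a is in cents,
        m_f in Hz; note_start_frame is t_0^(i) of the note being modulated."""
        t = np.arange(len(log_f0))
        phase = 2.0 * np.pi * m_f / FRAME_RATE * (t - note_start_frame)
        return log_f0 + CENT * m_a * np.sin(phase)

    # Example: a 2-second note at A4 (440 Hz) with a 100-cent, 6-Hz vibrato.
    n_frames = int(2.0 * FRAME_RATE)
    log_f0 = np.full(n_frames, np.log(440.0))
    m_a = np.full(n_frames, 100.0)     # vibrato amplitude in cents
    m_f = np.full(n_frames, 6.0)       # vibrato frequency in Hz
    out = add_vibrato(log_f0, m_a, m_f, note_start_frame=0)
    print(np.exp(out).min(), np.exp(out).max())   # roughly 415 Hz to 466 Hz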

Fig. 5 The vibrato parameter analysis. [Only the caption and annotations of this figure survive: log F0 plotted against frame index, with local vibrato amplitudes 2c·m_a(t_k) and frequencies m_f(t_k) marked at analysis points t_0, ..., t_6, and a legend "Before pitch-shift / After pitch-shift".]

3.3 Pitch-shifted pseudo training data
[Japanese text only partially recovered.] The pitches covered by the training songs are limited and unevenly distributed (Fig. 6). To obtain models for note contexts that are rare or absent in the recordings, Sinsy uses pseudo training data generated by pitch-shifting the original recordings 30).

Fig. 6 The distribution of pitch in the training data (number of notes per pitch, from G3 to F5).

Context clustering of the HMMs is performed with decision trees under the Minimum Description Length (MDL) criterion 31): when a node S_q of the tree is split into S_{q+} and S_{q-}, the change in description length is

    Δ_q = L(S_q) - { L(S_{q+}) + L(S_{q-}) } + α (N / 2) log Γ(S_0),    (3)

where L(·) is the log likelihood of the training data assigned to a node, S_0 is the root node, Γ(·) is the state occupancy of a node, N is the number of free parameters added by the split, and α is a weight that controls the penalty term and hence the size of the clustered model. A split is adopted only when it decreases the description length, i.e., when Δ_q < 0.
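As a small illustration of how the criterion of Eq. (3) is applied during decision-tree clustering, the sketch below evaluates Δ_q for one candidate split under single-Gaussian, diagonal-covariance assumptions. The data, the helper names, and the parameter count per split are invented for the example; the actual clustering in HTS involves many more details.

    # Sketch of the MDL split criterion of Eq. (3). Each node is summarized by
    # the frames assigned to it, and a diagonal Gaussian is fitted per node.
    import numpy as np

    def node_log_likelihood(frames):
        """Log likelihood L(S) of frames under their ML diagonal Gaussian."""
        gamma, dim = frames.shape                  # occupancy, dimensionality
        var = frames.var(axis=0) + 1e-8
        return -0.5 * gamma * np.sum(np.log(2.0 * np.pi * var) + 1.0)

    def delta_mdl(parent, left, right, root_occupancy, alpha=1.0):
        """Change in description length, Eq. (3), for splitting parent -> left/right."""
        n_params = 2 * parent.shape[1]             # one mean and one variance per dimension
        penalty = alpha * 0.5 * n_params * np.log(root_occupancy)
        return (node_log_likelihood(parent)
                - (node_log_likelihood(left) + node_log_likelihood(right))
                + penalty)

    # Toy example: 1-D "spectral" frames from two contexts with different means.
    rng = np.random.default_rng(0)
    left = rng.normal(-1.0, 0.5, size=(300, 1))
    right = rng.normal(+1.0, 0.5, size=(300, 1))
    parent = np.vstack([left, right])
    d = delta_mdl(parent, left, right, root_occupancy=parent.shape[0])
    print(d, "-> split" if d < 0 else "-> do not split")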

3.4 Pruning of the training trellis using note boundaries
[Japanese text only partially recovered 32).] In embedded EM training of the HMMs, the forward and backward probabilities α_i(t) and β_i(t) would normally have to be computed for every frame t = 0, ..., T + 1 and every state i = 1, ..., M, i.e., over the full M × (T + 1) trellis. Because the musical score indicates roughly when each note is sung, Sinsy prunes this search space (Fig. 7): for the models belonging to the k-th note, the computation is restricted to frames t satisfying

    t ≥ Note_start(k) - Margin_start(k),    (4)
    t ≤ Note_end(k) + Margin_end(k),    (5)

where Note_start(k) and Note_end(k) are the note boundaries given by the score, and Margin_start(k) and Margin_end(k) are margins that absorb the deviation between the note timing and the actual voice timing (Fig. 8).

Fig. 7 The pruning approach using note boundaries: the trellis of model index versus frame number for the lyric "sa i ta sa i ta", showing the search space before and after pruning and the margins Margin_start(k), Margin_end(k) around Note_start(k), Note_end(k).

Fig. 8 An example of timings: the note timing and the actual voice timing of the phoneme sequence "sil y a n e y o r i", shown with the waveform and spectrogram of natural singing voice.

3.5 Time-lag modeling
[Japanese text only partially recovered; the fragments cite 5), 33), 34).] A singer does not start each note exactly at the timing written in the score; the actual voice timing deviates from the note timing (Fig. 8). Sinsy therefore models this time-lag: with T_k the length of the k-th note given by the score and g_k the time-lag at the beginning of note k, the modeled note duration is

    T̂_k = T_k - g_k + g_{k+1},    (6)

i.e., the score duration minus the lag at the start of note k plus the lag at the start of note k+1.

4. Speaker adaptation
[Japanese text only partially recovered.] The voice currently used in Sinsy is a speaker-dependent model trained on 70 songs; to build a new singer's voice from far less data, speaker adaptation based on Constrained Maximum-Likelihood Linear Regression (CMLLR) 13) is applied. For Gaussian component i with mean vector µ_i and covariance matrix U_i, the adapted mean µ̂_i and covariance Û_i are obtained with a linear transform (H_i, b_i):

    µ̂_i = H_i µ_i + b_i,    (7)
    Û_i = H_i U_i H_i^T.    (8)

The transform parameters H_i and b_i are estimated from the adaptation data with the EM algorithm, and a single transform is shared by groups of distributions defined using context clustering decision trees 35).
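As a purely illustrative sketch of Eqs. (7)-(8), the following Python snippet applies a shared linear transform (H, b) to the means and covariances of a set of Gaussians. It does not show the EM estimation of the transform from adaptation data; the transform and the toy model below are invented for the example.

    # Sketch of applying a CMLLR-style transform, Eqs. (7)-(8): each Gaussian's
    # mean and covariance are mapped with a shared (H, b).
    import numpy as np

    def adapt_gaussians(means, covariances, H, b):
        """Return (H @ mu + b, H @ U @ H.T) for every Gaussian."""
        adapted_means = [H @ mu + b for mu in means]
        adapted_covs = [H @ U @ H.T for U in covariances]
        return adapted_means, adapted_covs

    dim = 3
    rng = np.random.default_rng(1)
    means = [rng.normal(size=dim) for _ in range(4)]                 # 4 toy Gaussians
    covariances = [np.diag(rng.uniform(0.5, 1.5, size=dim)) for _ in range(4)]
    H = np.eye(dim) + 0.1 * rng.normal(size=(dim, dim))              # invented transform
    b = 0.5 * rng.normal(size=dim)
    new_means, new_covs = adapt_gaussians(means, covariances, H, b)
    print(new_means[0])
    print(new_covs[0])

Sharing one transform across many distributions is what allows a few minutes of adaptation data (cf. Table 2) to move the whole model toward the target singer.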

5. Experiments
5.1 Experimental conditions
[Japanese text only partially recovered; a fragment "50, 2, 2" describing the analysis settings could not be reconstructed.] The training database consisted of 70 songs by the singer "f00", recorded at 48 kHz with 16-bit quantization; spectral envelopes were extracted with STRAIGHT 20) and parameterized as mel-cepstra 16). The vibrato parameters are the amplitude (in cent) and the frequency (in Hz); singing vibrato typically has an extent of around 30-50 cent and a rate of 5-8 Hz 36),37). The acoustic models were 5-state left-to-right hidden semi-Markov models (HSMMs) 34) with a 5-ms frame shift, clustered with the MDL criterion 31) using a weight α in Eq. (3) of 5.0; the resulting HMM set amounted to about 2.6 MBytes.

5.2 Subjective evaluation
[Japanese text only partially recovered.] In addition to the speaker-dependent f00 model used in Sinsy, speaker-adapted models were built from small amounts of data for two voices: "Miku Hatsune" (Vocaloid2 character CV01) 14) and "Rin" from the RWC Music Database (Popular Music; RWC-MDB-P-2001) 38). Table 2 lists the adaptation data, Table 1 the file sizes of the Sinsy components, and Fig. 9 the subjective evaluation results in terms of the singer recognition rate.

Table 1 The total file sizes of Sinsy (KBytes).
    Front-end program (CMX)               456
    Phoneme table                           3
    Back-end program (hts_engine API)     677
    Acoustic model                        652
    Total file size of Sinsy             2588

Table 2 Adaptation data.
    Singer                  Model                     Total song length for adaptation
    f00 (Sinsy)             Speaker-dependent model   -
    Miku Hatsune (CV01)     Speaker-adapted model     9m28s (5 songs)
    Rin (RWC-MDB-P-2001)    Speaker-adapted model     7m59s (7 songs)

Fig. 9 Subjective evaluation results: singer recognition rate (%) for f00 (Sinsy), Miku Hatsune (CV01), and Rin (RWC-MDB-P-2001).

6. Conclusion
[Japanese text not recovered. The conclusion summarizes the HMM-based singing voice synthesis system Sinsy, its MusicXML-based on-line interface, and the speaker adaptation technique; the surviving fragments also mention consumer-generated media (CGM) and the SCOPE program.]

References

1) T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi and T. Kitamura, "Simultaneous Modeling of Spectrum, Pitch and Duration in HMM-based Speech Synthesis," Proc. of Eurospeech, pp.2347-2350, 1999.
2) J. Yamagishi, "Average-Voice-based Speech Synthesis," Ph.D. thesis, Tokyo Institute of Technology, 2006.
3) T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi and T. Kitamura, "Speaker Interpolation in HMM-based Speech Synthesis System," Proc. of Eurospeech, pp.2523-2526, 1997.
4) K. Shichiri, A. Sawabe, K. Tokuda, T. Masuko, T. Kobayashi and T. Kitamura, "Eigenvoices for HMM-based Speech Synthesis," Proc. of ICSLP, pp.1269-1272, 2002.
5) K. Saino, H. Zen, Y. Nankaku, A. Lee and K. Tokuda, "An HMM-based Singing Voice Synthesis System," Proc. of ICSLP, pp.1141-1144, 2006.
6) Sinsy: HMM-based singing voice synthesis system, http://www.sinsy.jp/.
7) HMM-based Speech Synthesis System (HTS), http://hts.sp.nitech.ac.jp/.
8) HMM-based Speech Synthesis Engine (hts_engine API), http://hts-engine.sourceforge.net/.
9) Speech Signal Processing Toolkit (SPTK), http://sp-tk.sourceforge.net/.
10) A Speech Analysis, Modification and Synthesis System (STRAIGHT), http://www.wakayama-u.ac.jp/~kawahara/straightadv/index_e.html.
11) CrestMuseXML Toolkit (CMX), http://cmx.sourceforge.jp/.
12) MusicXML Definition, http://musicxml.org/.
13) M. J. F. Gales, "Maximum Likelihood Linear Transformations for HMM-based Speech Recognition," Computer Speech & Language, vol.12, no.2, pp.75-98, 1998.
14) H. Kenmochi and H. Ohshita, "VOCALOID - Commercial Singing Synthesizer based on Sample Concatenation," Proc. of Interspeech, 2007.
15) T. Nakano and M. Goto, "VocaListener" (in Japanese), IPSJ SIG Technical Report, vol.2008-MUS-75, no.50, pp.49-56, 2008.
16) K. Tokuda, T. Kobayashi, T. Chiba and S. Imai, "Spectral Estimation of Speech by Mel-Generalized Cepstral Analysis," IEICE Trans., vol.J75-A, no.7, pp.1124-1134, 1992.
17) K. Tokuda, T. Yoshimura, T. Masuko, T. Kobayashi and T. Kitamura, "Speech Parameter Generation Algorithms for HMM-based Speech Synthesis," Proc. of ICASSP, pp.1315-1318, 2000.
18) S. Imai, "Cepstral Analysis Synthesis on the Mel Frequency Scale," Proc. of ICASSP, pp.93-96, 1983.
19) A New and Simplified BSD License, http://www.opensource.org/licenses/bsd-license.php.
20) H. Kawahara, I. Masuda-Katsuse and A. de Cheveigné, "Restructuring Speech Representations Using a Pitch-Adaptive Time-Frequency Smoothing and an Instantaneous-Frequency-based F0 Extraction: Possible Role of a Repetitive Structure in Sounds," Speech Communication, vol.27, pp.187-207, 1999.
21) H. Zen, K. Oura, T. Nose, J. Yamagishi, S. Sako, T. Toda, T. Masuko, A. W. Black and K. Tokuda, "Recent Development of the HMM-based Speech Synthesis System (HTS)," Proc. of APSIPA ASC, pp.121-130, 2009.
22) The Hidden Markov Model Toolkit (HTK), http://htk.eng.cam.ac.uk/.
23) (in Japanese; authors not recovered), "CrestMuseXML (CMX) Toolkit ver.0.40," IPSJ SIG Technical Report, vol.2008-MUS-75, no.7, pp.95-100, 2009.
24) J. Odell, "The Use of Context in Large Vocabulary Speech Recognition," Ph.D. thesis, Cambridge University, 1995.
25) K. Oura, A. Mase, T. Yamada, S. Muto, Y. Nankaku and K. Tokuda, "Recent Development of the HMM-based Singing Voice Synthesis System (Sinsy)," Proc. of SSW7, 2010 (to be published).
26) (in Japanese; authors and title not recovered), vol.64, no.5, pp.267-277, 2008.
27) T. Nakano, M. Goto and Y. Hiraga, "An Automatic Singing Skill Evaluation Method for Unknown Melodies Using Pitch Interval Accuracy and Vibrato Features," Proc. of Interspeech, pp.1706-1709, 2006.
28) (in Japanese; authors and title not recovered), IPSJ SIG Technical Report, vol.2009-MUS-80, no.5, pp.309-312, 2009.
29) A. Kurematsu, K. Takeda, Y. Sagisaka, S. Katagiri, H. Kuwabara and K. Shikano, "ATR Japanese Speech Database as a Tool of Speech Recognition and Synthesis," Speech Communication, vol.9, pp.357-363, 1990.
30) A. Mase, K. Oura, Y. Nankaku and K. Tokuda, "HMM-based Singing Voice Synthesis System Using Pitch-Shifted Pseudo Training Data," Proc. of Interspeech, 2010 (to be published).
31) K. Shinoda and T. Watanabe, "MDL-based Context-Dependent Subword Modeling for Speech Recognition," J. Acoust. Soc. Jpn. (E), vol.21, no.2, pp.79-86, 2000.
32) (in Japanese; authors and title not recovered), vol.I, 2-7-8, pp.347-348, 2010.
33) K. Oura, H. Zen, Y. Nankaku, A. Lee and K. Tokuda, "A Fully Consistent Hidden Semi-Markov Model-based Speech Recognition System," IEICE Trans. Inf. & Syst., vol.E91-D, no.11, pp.2693-2700, 2008.
34) H. Zen, T. Masuko, K. Tokuda, T. Kobayashi and T. Kitamura, "A Hidden Semi-Markov Model-based Speech Synthesis System," IEICE Trans. Inf. & Syst., vol.E90-D, no.5, pp.825-834, 2007.
35) J. Yamagishi, M. Tachibana, T. Masuko and T. Kobayashi, "Speaking Style Adaptation Using Context Clustering Decision Tree for HMM-based Speech Synthesis," Proc. of ICASSP, pp.5-8, 2004.
36) J. Sundberg, The Science of the Singing Voice, Northern Illinois University Press, 1987.
37) C. E. Seashore, "A Musical Ornament, the Vibrato," in Psychology of Music, McGraw-Hill, pp.33-52, 1938.
38) M. Goto, H. Hashiguchi, T. Nishimura and R. Oka, "RWC Music Database" (in Japanese), IPSJ Journal, vol.45, no.3, pp.728-738, 2004.

© 2010 Information Processing Society of Japan