Vol.4-DCC-8 No.8 Vol.4-MUS-5 No.8 4// 3 3 Hanning (T ) 3 Hanning 3T (y(t)w(t)) dt =.5 T y (t)dt. () STRAIGHT F 3 TANDEM-STRAIGHT[] 3 F F 3 [] F []. :

Σχετικά έγγραφα
SNR F0 [2], [3], [4] F0 F0 F0 F0 F0 TUSK F0 TUSK F0 6 TUSK 6 F0 2. F0 F0 [5] [6] [7] p[8] Cepstrum [9], [10] [11] [12] [13] F0 [14] F0 [15] DIO[16] [1

VOCODER VOCODER Vocal

: TANDEM-STRAIGHT. Make singing voice tangible: TANDEM-STRAIGHT and temporally variable morphing as substrate. Hideki Kawahara 1 and Masanori Morise 2

Signal processing for handling singing voice texture

Fourier transform, STFT 5. Continuous wavelet transform, CWT STFT STFT STFT STFT [1] CWT CWT CWT STFT [2 5] CWT STFT STFT CWT CWT. Griffin [8] CWT CWT

LSP (Line Spectrum Pair, LSF: Line Spectrum Frequencies) [19] [20] LSP CODE [21], [22] VOCODER [23] 2. 3 STRAIGHT VOCODER STRAIGHT [24] STRAIGHT [25]

MIDI [8] MIDI. [9] Hsu [1], [2] [10] Salamon [11] [5] Song [6] Sony, Minato, Tokyo , Japan a) b)

1,a) 1,b) 2 3 Sakriani Sakti 1 Graham Neubig 1 1. A Study on HMM-Based Speech Synthesis Using Rich Context Models

An Automatic Modulation Classifier using a Frequency Discriminator for Intelligent Software Defined Radio

GUI

3: A convolution-pooling layer in PS-CNN 1: Partially Shared Deep Neural Network 2.2 Partially Shared Convolutional Neural Network 2: A hidden layer o

Ψηφιακή Επεξεργασία Φωνής

Ψηφιακή Επεξεργασία Φωνής

Ψηφιακή Επεξεργασία Φωνής

Ψηφιακή Επεξεργασία Φωνής

Non-negative Matrix Factorization, NMF [5] NMF. [1 3] Bregman [4] Harmonic-Temporal Clustering, HTC [2,3] 1,2,b) NTT

Buried Markov Model Pairwise

F0 Estimation of Melody and Bass Lines in Real-world Musical Audio Signals

Re-Pair n. Re-Pair. Re-Pair. Re-Pair. Re-Pair. (Re-Merge) Re-Merge. Sekine [4, 5, 8] (highly repetitive text) [2] Re-Pair. Blocked-Repair-VF [7]

Voice Conversion based on Non-negative Matrix Factorization with Segment Features in Noisy Environments

(hidden Markov model: HMM) FUNDAMENTALS OF SPEECH SYNTHESIS BASED ON HMM. Keiichi Tokuda. Department of Computer Science

Sinsy: HMM. Sinsy An HMM-based singing voice synthesis system which can realize your wish I want this person to sing my song

Estimation, Evaluation and Guarantee of the Reverberant Speech Recognition Performance based on Room Acoustic Parameters

Retrieval of Seismic Data Recorded on Open-reel-type Magnetic Tapes (MT) by Using Existing Devices

Ψηφιακή Επεξεργασία Φωνής

[5] F 16.1% MFCC NMF D-CASE 17 [5] NMF NMF 3. [5] 1 NMF Deep Neural Network(DNN) FUSION 3.1 NMF NMF [12] S W H 1 Fig. 1 Our aoustic event detect

Sampling Basics (1B) Young Won Lim 9/21/13

v.connect 2 v.connect : A Singing Synthesis System Enabling Users to Control Vocal Tones Makoto Ogawa, 1 Syunji Yazaki 1 and Kôki Abe 1 VOCALOID

CSJ. Speaker clustering based on non-negative matrix factorization using i-vector-based speaker similarity

Speech Recognition using Phase Information based on Long-Term Analysis

Research on mode-locked optical fiber laser

Ψηφιακή Επεξεργασία Φωνής

[2] REVERB 8 [3], [4] [5] [20] [6], [7], [8], [9], [10] [11] REVERB 8 *1 [9] LDA *2 MLLT (SAT) [8] (basis fmllr) [12] (DNN) [10] DNN [11] [13] [14] Ka

ΠΡΟΗΓΜΕΝΑ ΣΥΣΤΗΜΑΤΑ ΜΕΤΑΦΟΡΩΝ

Singing Information Processing: Music Information Processing for Singing Voices

BCI On Feature Extraction from Multi-Channel Brain Waves Used for Brain Computer Interface

A study of geometric dependency of cepstrum on vocal tract length

1181 (real-timespeechdriven) 1 1 ( ) D FAP FAP (voiceactivationdetectionvad) D FaceGen 3- D XfaceEd MPEG-4 1 FAP 66 FAP ( ) FAP 84

( ) (Harmonic-Temporal Clustering; HTC) [1], [2] ( ) ( ) [4] HTC. (Non-negative Matrix Factorization; NMF) [3] [5], [6] [7], [8]

Probability and Random Processes (Part II)

Spectrum Representation (5A) Young Won Lim 11/3/16

BandPass (4A) Young Won Lim 1/11/14

το προφίλ της γάστρας, η ίσαλος σχεδίασης, η καμπύλη εμβαδών εγκαρσίων τομών και η κατανομή του κέντρου βάρους των εγκαρσίων τομών κατά μήκος του

Higher-Order Correlation Analysis of Pitch Fluctuations in Sustained Normal Vowels by the Method of Surrogate Data

Συνδυασμένη Οπτική-Ακουστική Ανάλυση Ομιλίας

Περιεχόµενα. ΕΠΛ 422: Συστήµατα Πολυµέσων. Μέθοδοι συµπίεσης ηχητικών. Βιβλιογραφία. Κωδικοποίηση µε βάση την αντίληψη.

Query by Phrase (QBP) (Music Information Retrieval, MIR) QBH QBP / [1, 2] [3, 4] Query-by-Humming (QBH) QBP MIDI [5, 6] [8 10] [7]

Ερευνητική+Ομάδα+Τεχνολογιών+ Διαδικτύου+

ΣΧΟΛΗ Σχολή Τεχνολογικών Εφαρμογών ΤΜΗΜΑ Ηλεκτρονικών Μηχανικών Τ.Ε. ΕΠΙΠΕΔΟ ΣΠΟΥΔΩΝ Προπτυχιακό ΚΩΔΙΚΟΣ ΜΑΘΗΜΑΤΟΣ ΕΞΑΜΗΝΟ ΣΠΟΥΔΩΝ 5

Stabilization of stock price prediction by cross entropy optimization

Assignment 1 Solutions Complex Sinusoids

Feasible Regions Defined by Stability Constraints Based on the Argument Principle

Μουσική Πληροφορική. Αλέξανδρος Ελευθεριάδης Αναπ. Καθηγητής Τµήµα Πληροφορικής και Τηλεπικοινωνιών Εθνικό και Καποδιστριακό Πανεπιστήµιο Αθηνών

Nov Journal of Zhengzhou University Engineering Science Vol. 36 No FCM. A doi /j. issn

Fundamentals of Signals, Systems and Filtering

Scrub Nurse Robot: SNR. C++ SNR Uppaal TA SNR SNR. Vain SNR. Uppaal TA. TA state Uppaal TA location. Uppaal

1 (forward modeling) 2 (data-driven modeling) e- Quest EnergyPlus DeST 1.1. {X t } ARMA. S.Sp. Pappas [4]

Εξάλειψη αντήχησης από ηχητικά σήματα με υποκειμενικά / ψυχοακουστικά κριτήρια

Detection and Recognition of Traffic Signal Using Machine Learning

Speeding up the Detection of Scale-Space Extrema in SIFT Based on the Complex First Order System

A Method of Trajectory Tracking Control for Nonminimum Phase Continuous Time Systems

ΔΙΠΛΩΜΑΤΙΚΕΣ ΕΡΓΑΣΙΕΣ

Ένα µοντέλο Ισοδύναµης Χωρητικότητας για IEEE Ασύρµατα Δίκτυα. Εµµανουήλ Καφετζάκης

Main source: "Discrete-time systems and computer control" by Α. ΣΚΟΔΡΑΣ ΨΗΦΙΑΚΟΣ ΕΛΕΓΧΟΣ ΔΙΑΛΕΞΗ 4 ΔΙΑΦΑΝΕΙΑ 1

Vocal Dynamics Controller:

Acoustic Signal Adjustment by Considering Musical Expressive Intention Using a Performance Intension Function

ΑΝΤΩΝΙΑΔΗΣ ΝΙΚΟΛΑΟΣ & 1991 Υποτροφία του Κοινωφελούς Ιδρύματος "Αλ. Σ. Ωνάσης" για μεταπτυχιακές σπουδές.

Ψηφιακή Επεξεργασία Φωνής

ΣΤΟΧΑΣΤΙΚΑ ΣΥΣΤΗΜΑΤΑ & ΕΠΙΚΟΙΝΩΝΙΕΣ 1o Τμήμα (Α - Κ): Αμφιθέατρο 3, Νέα Κτίρια ΣΗΜΜΥ Ανάλυση Επικοινωνιακών Σημάτων κατά Fourier

Digital Signal Octave Codes (0B)

ΔΙΠΛΩΜΑΤΙΚΗ ΕΡΓΑΣΙΑ. του Φοιτητή του Τμήματος Ηλεκτρολόγων Μηχανικών και Τεχνολογίας Υπολογιστών της Πολυτεχνικής Σχολής του Πανεπιστημίου Πατρών

Optimization, PSO) DE [1, 2, 3, 4] PSO [5, 6, 7, 8, 9, 10, 11] (P)

EM Baum-Welch. Step by Step the Baum-Welch Algorithm and its Application 2. HMM Baum-Welch. Baum-Welch. Baum-Welch Baum-Welch.

Supplementary Appendix

Analysis of prosodic features in native and non-native Japanese using generation process model of fundamental frequency contours

Summary of the model specified

Διάλεξη 6. Fourier Ανάλυση Σημάτων. (Επανάληψη Κεφ Κεφ. 10.3, ) Ανάλυση σημάτων. Τι πρέπει να προσέξουμε

070-A

Anomaly Detection with Neighborhood Preservation Principle

Παραμετρική ανάλυση του συντελεστή ανάκλασης από στρωματοποιημένο πυθμένα δύο στρωμάτων με επικλινή διεπιφάνεια 1

ΕΥΡΕΣΗ ΤΟΥ ΔΙΑΝΥΣΜΑΤΟΣ ΘΕΣΗΣ ΚΙΝΟΥΜΕΝΟΥ ΡΟΜΠΟΤ ΜΕ ΜΟΝΟΦΘΑΛΜΟ ΣΥΣΤΗΜΑ ΟΡΑΣΗΣ

Area Location and Recognition of Video Text Based on Depth Learning Method

1 n-gram n-gram n-gram [11], [15] n-best [16] n-gram. n-gram. 1,a) Graham Neubig 1,b) Sakriani Sakti 1,c) 1,d) 1,e)


Elements of Information Theory

Q L -BFGS. Method of Q through full waveform inversion based on L -BFGS algorithm. SUN Hui-qiu HAN Li-guo XU Yang-yang GAO Han ZHOU Yan ZHANG Pan

HDD. Point to Point. Fig. 1: Difference of time response by location of zero.

{takasu, Conditional Random Field

Άσκηση 10, σελ Για τη μεταβλητή x (άτυπος όγκος) έχουμε: x censored_x 1 F 3 F 3 F 4 F 10 F 13 F 13 F 16 F 16 F 24 F 26 F 27 F 28 F

Χαρακτηρισµός Νεοπλασµάτων στη Μαστογραφία από το Σχήµα της Παρυφής µε χρήση Νευρωνικών ικτύων

Vol. 31,No JOURNAL OF CHINA UNIVERSITY OF SCIENCE AND TECHNOLOGY Feb

CorV CVAC. CorV TU317. 1

Maxima SCORM. Algebraic Manipulations and Visualizing Graphs in SCORM contents by Maxima and Mashup Approach. Jia Yunpeng, 1 Takayuki Nagai, 2, 1

Τιμοκατάλογος λιανικής

Echo path identification for stereophonic acoustic echo cancellation without pre-processing

Bundle Adjustment for 3-D Reconstruction: Implementation and Evaluation

7AAC63-90 SINGLE-PHASE MOTORS PERMANENT CAPACITOR 7AACC SINGLE-PHASE MOTORS CAP START - CAP RUN. squirrel cage induction motors 7AACC63-100

6.003: Signals and Systems. Modulation

JOURNAL OF APPLIED SCIENCES Electronics and Information Engineering. Cyclic MUSIC DOA TN (2012)

: Monte Carlo EM 313, Louis (1982) EM, EM Newton-Raphson, /. EM, 2 Monte Carlo EM Newton-Raphson, Monte Carlo EM, Monte Carlo EM, /. 3, Monte Carlo EM

Transcript:

Vol.4-DCC-8 No.8 Vol.4-MUS-5 No.8 4//,a) Vocoder (F) F F. PSOLA [] sinusoidal model [] phase vocoder Vocoder [3] (F) F 3 [4], [5], [6], [7], [8], [9] [], [], [], [3], [4] [5], [6] [7], [8], University of Yamanashi, Yamanashi 4 85, Japan a) mmorise@yamanashi.ac.jp [9] STRAIGHT[] [] TANDEM-STRAIGHT[] WORLD[3] * [] [] F F [] SNR F F. * http://ml.cs.yamanashi.ac.jp/world/ c 4 Information Processing Society of Japan

Vol.4-DCC-8 No.8 Vol.4-MUS-5 No.8 4// 3 3 Hanning (T ) 3 Hanning 3T (y(t)w(t)) dt =.5 T y (t)dt. () STRAIGHT F 3 TANDEM-STRAIGHT[] 3 F F 3 [] F []. : F [3] Hanning nω (ω = π/t n ) nω Hz nω Hz. : F 3 3 3 P s (ω) = 3 ω 3 P (ω + λ)dλ, () ω ω 3 P (ω) ω /3 ω.3 3: c 4 Information Processing Society of Japan

3 [] nt nt sinc sinc 3 sinc consistent sampling [4] fs Hz fs / Hz * consistent sampling DA/AD DA/AD nω Hz AD nω Hz Hanning -3 db F ±ω Hz q P l (ω) = exp ( q log(p (ω)) + q log (P (ω + ω )P (ω ω ))), Hz ±ω * (3) Hz P l (ω) P l (ω) = exp (F [l s (τ)l q (τ)p s (τ)]), (4) l s (τ) = sin(πf τ), (5) πf τ ( ) πτ l q (τ) = q + q cos, (6) T p s (τ) = F [log (P s (ω))], (7) l s (τ) l q (τ) q q F[] F [] q.8 q.9 3. TANDEM-STRAIGHT F 3. E f σ f (n) = = N K N n= K k= σ f (n), (8) ( P e (k, n) K Vol.4-DCC-8 No.8 Vol.4-MUS-5 No.8 4// K l= P e (l, n)), (9) P e (k, n) = log (P l (k, n)) log (P t (k)), () P l (k, n) k n P t (k) c 4 Information Processing Society of Japan 3

K FFT N E f E t = K σ t (k) = N K k= N n= σ t (k), () ( P e (k, n) N N m= P e (k, m)). () E t E f E t 3. TANDEM-STRAIGHT [3] 3 F (Standard F) P t (k) 48 khz s ms N, FFT TANDEM-STRAIGHT F 4 Hz 3 4,96 K,48 F Standard F Hz Hz 3.3 : SNR db 6 db. db SNR Standard F: Hz 3 3 3 4 5 6 SNR (db) SNR Standard F Hz 3 SNR E f E t SNR Standard F TANDEM-STRAIGHT SNR 45 db TANDEM-STRAIGHT 3.4 : F Vol.4-DCC-8 No.8 Vol.4-MUS-5 No.8 4// F %.% SNR 6 db 4 5 STRAIGHT F ±3% TANDEM- TANDEM- STRAIGHT F TANDEM-STRAIGHT c 4 Information Processing Society of Japan 4

Vol.4-DCC-8 No.8 Vol.4-MUS-5 No.8 4// Standard F: Hz.5.5 Standard F: Hz 3 3 3 4 5 6 SNR (db) 3 SNR Standard F Hz.5.5 F estimation error (%) 4 F Standard F Hz *3 3.5 SNR 45 db TANDEM-STRAIGHT F SNR 45 db TANDEM-STRAIGHT 4. Cheap- *3 TANDEM-STRAIGHT F Trick F F TANDEM-STRAIGHT F STRAIGHT [] JSPS 4373 c 4 Information Processing Society of Japan 5

.5.5.5.5 Standard F: Hz F estimation error (%) 5 F Standard F Hz 65487 H5/A8 [] Moulines, E. and Charpentier, F.: Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones, Speech Communication, Vol. 9, No. 5-6, pp. 453 467 (99). [] McAulay, R. and Quatieri, T.: Speech analysis/synthesis based on a sinusoidal representation, IEEE Trans. Acoustics, Speech, and Signal Processing), Vol. 34, No. 4, pp. 744 755 (986). [3] Dudley, H.: Remaking speech, J. Acoust. Soc. Am., Vol., No., pp. 69 77 (939). [4] Noll, A. M.: Short-time spectrum and cepstrum techniques for vocal pitch detection, J. Acoust. Soc. Am., Vol. 36, No., pp. 96 3 (964). [5] Oppenheim, A. V.: Speech analysis-synthesis system based on homomorphic filtering, J. Acoust. Soc. Am., Vol. 45, No., pp. 458 465 (969). [6] Atal, B. S. and Hanauer, S. L.: Speech analysis and synthesis by linear prediction of the speech wave, J. Acoust. Soc. Am., Vol. 5, No. B, pp. 637 655 (969). [7] El-Jaroudi, A. and Makhoul, J.: Discrete all-pole modeling, IEEE Trans. on Signal Processing, Vol. 39, No., pp. 4 43 (99). [8] Badeau, R. and David, B.: Weighted maximum likelihood autoregressive and moving average spectrum modeling, in Proc. ICASSP 8, pp. 376 3764 (9). Vol.4-DCC-8 No.8 Vol.4-MUS-5 No.8 4// [9] Campedel-Oudot, M., Cappé, O. and Moulines, E.: Estimation of the spectral envelope of voiced sounds using a penalized likelihood approach, IEEE Trans. on Speech and Audio Processing, Vol. 9, No. 5, pp. 469 48 (). [] Kawahara, H., Masuda-Katsuse, I. and de Cheveigné, A.: Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F extraction, Speech Communication, Speech Communication, Vol. 7, pp. 87 7 (999). [] Kawahara, H., Morise, M., Takahashi, T., Nisimura, R., Irino, T. and Banno, H.: TANDEM-STRAIGHT: A temporally stable power spectral representation for periodic signals and applications to interference-free spectrum, f, and aperiodicity estimation, in Proc. ICASSP 8, pp. 3933 3936 (8). [] Kawahara, H. and Morise, M.: Technical foundations of TANDEM-STRAIGHT, a speech analysis, modification and synthesis framework, SADHANA - Academy Proceedings in Engineering Sciences, Vol. 36, pp. 73 78 (). [3] Morise, M.: An attempt to develop a singing synthesizer by collaborative creation, in Proc. SMAC 3, pp. 87 9 (3). [4] Nakano, T. and Goto, M.: A spectral envelope estimation method based on F-adaptive multi-frame integration analysis, in Proc. SAPA-SCALE, pp. 6 (). [5] Banno, H., Hata, H., Morise, M., Takahashi, T., Irino, T. and Kawahara, H.: Implementation of realtime STRAIGHT speech manipulation system, Acoust. Sci. & Tech., Vol. 8, No. 3, pp. 4 46 (7). [6] Morise, M., Onishi, M., Kawahara, H. and Katayose, H.: v.morish 9: A morphing-based singing design interface for vocal melodies, Lecture Notes in Computer Science, Vol. LNCS 579, pp. 85 9 (9). [7] Zen, H., Tokuda, K. and Black, A. W.: Statistical parametric speech synthesis, Speech Communication, Vol. 5, No., pp. 39 64 (9). [8] Zen, H., Senior, A. and Schuster, M.: Statistical parametric speech synthesis using deep neural networks, in Proc. ICASSP 3, pp. 796 7966 (3). [9] Toda, T. and Tokuda, K.: Statistical approach to vocal tract transfer function estimation based on factor analyzed trajectory HMM, in Proc. ICASSP 8, pp. 395 398 (8). [] Kawahara, H., Nisimura, R., Irino, T., Morise, M., Takahashi, T. and Banno, H.: Temporally variable multi-aspect auditory morphing enabling extrapolation without objective and perceptual breakdown, in Proc. ICASSP9, pp. 395 398 (9). [] Morise, M.:, a spectral envelope estimator for high-quality speech synthesis, Speech Communication (4). [] Kawahara, H., Morise, M., Banno, H., Nisimura, R. and Irino, T.: Excitation source analysis for high-quality speech manipulation systems based on an interferencefree representation of group delay with minimum phase response compensation, pp. xxx xxx (4). [3] Mathews, M. V., Miller, J. E. and David, E. E.: Pitch synchronous analysis of voiced sounds, J. Acoust. Soc. Am., Vol. 33, pp. 79 86 (96). [4] Unser, M.: Sampling 5 years after Shannon, Proc. of the IEEE, Vol. 88, pp. 569 587 (). c 4 Information Processing Society of Japan 6