Make singing voice tangible: TANDEM-STRAIGHT and temporally variable morphing as substrate. Hideki Kawahara (1) and Masanori Morise (2)



Vol.2010-MUS-86 No.6 2010/7/28

Hideki Kawahara (1) and Masanori Morise (2)
(1) Wakayama University, (2) Ritsumeikan University

This article introduces the algorithms and implementation details of the latest TANDEM-STRAIGHT and of temporally variable multi-aspect speech morphing. It first explains the motivations behind legacy STRAIGHT and its subsequent developments. STRAIGHT and TANDEM-STRAIGHT intentionally destroy the phase information in the input speech. When they are evaluated as waveform coding methods, this destruction yields an extremely poor SNR (3 dB). This article illustrates the merits that this destruction may provide in return. The authors present these views in the hope that readers will find useful hints for their own applications.

1. Introduction

This article reviews STRAIGHT 1), 2), auditory morphing 3), F0 extraction 4), 5), TANDEM-STRAIGHT 6), and the temporally variable morphing and research tools built on it 7), 8). It places them in the history of speech analysis and synthesis, from the sound spectrograph 10), pattern playback 11), the Voder, and the channel vocoder 12), 13) to LPC 15), 16). It also mentions CAPTCHA 14).

© 2010 Information Processing Society of Japan

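Like other vocoders, STRAIGHT treats hearing as phase deaf 15), 16), 17), 18). This is why its waveform SNR is so low.

2. STRAIGHT and TANDEM-STRAIGHT

2.1 TANDEM

TANDEM 19) provides a temporally stable power spectrum of a periodic signal. Consider a signal made of two adjacent harmonic components:

$$x(t) = e^{jk\omega_0 t} + \alpha e^{j((k+1)\omega_0 t + \beta)}, \tag{1}$$

where α and β are constants and ω0 = 2πf0 = 2π/T0. Here f0 is the fundamental frequency and T0 is the fundamental period. Let W(ω) be the Fourier transform of the time window. For k = 0, the power spectrum at time t is

$$P(\omega, t) = |W(\omega)|^2 + \alpha^2 |W(\omega - \omega_0)|^2 + 2\alpha |W(\omega)||W(\omega - \omega_0)| \cos(\omega_0 t + \beta). \tag{2}$$

The last term oscillates with period T0. The TANDEM spectrum averages two power spectra taken T0/2 apart:

$$P_T(\omega, t) = \frac{1}{2}\left[P\left(\omega, t - \frac{T_0}{4}\right) + P\left(\omega, t + \frac{T_0}{4}\right)\right]. \tag{3}$$

Because ω0 T0/4 = π/2, the two oscillating terms are proportional to cos(ω0 t + β − π/2) and cos(ω0 t + β + π/2), and these cancel. Hence P_T(ω, t) = |W(ω)|² + α²|W(ω − ω0)|², which does not depend on t.

2.1.1

The temporal variation that remains in a spectrum is measured by η_dBT. This is the RMS deviation, in dB, of the log spectrum from its average over one period:

$$\eta_{\mathrm{dBT}} = \sqrt{\frac{1}{2\pi T_0}\int_0^{T_0}\int_{-\pi}^{\pi}\left(L(\omega, t) - \bar{L}(\omega)\right)^2 d\omega\, dt}, \tag{4}$$

$$\bar{L}(\omega) = \frac{1}{T_0}\int_0^{T_0} L(\omega, t)\, dt, \qquad L(\omega, t) = 10\log_{10} P_X(\omega, t), \tag{5}$$

where X denotes the spectrum being evaluated. Figure 1 compares η_dBT for ordinary windows and for TANDEM, as a function of the SNR and of the temporal spread σ_t of the window. The windows compared are Blackman, Hanning, and Kaiser (β = 9) 21).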
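The cancellation in Eq. (3) can be checked numerically. The following is a minimal sketch in plain Python, not the authors' implementation. It evaluates the short-time power spectrum of the signal in Eq. (1), with k = 0, at a frequency midway between the two components. It then sweeps the window position over one fundamental period and compares the plain spectrum with the TANDEM average.

Several choices here are assumptions made for illustration:
- the Blackman window and its length of 2.5T0;
- the values of T0, α and β;
- the helper `eta_db`, which is only a one-frequency slice of Eq. (4).

```python
import cmath
import math


def blackman(length):
    """Symmetric Blackman window with `length` samples."""
    n = length - 1
    return [0.42 - 0.5 * math.cos(2 * math.pi * m / n)
            + 0.08 * math.cos(4 * math.pi * m / n) for m in range(length)]


def signal(n, w0, alpha, beta, k=0):
    """Eq. (1) evaluated at a (possibly fractional) sample position n."""
    return (cmath.exp(1j * k * w0 * n)
            + alpha * cmath.exp(1j * ((k + 1) * w0 * n + beta)))


def power_spectrum(omega, t, w0, alpha, beta, window):
    """|short-time Fourier transform|^2 at omega (rad/sample), window centred at t."""
    centre = (len(window) - 1) / 2
    acc = 0j
    for m, wm in enumerate(window):
        acc += wm * signal(t - centre + m, w0, alpha, beta) * cmath.exp(-1j * omega * m)
    return abs(acc) ** 2


def tandem_spectrum(omega, t, t0, w0, alpha, beta, window):
    """Eq. (3): mean of the power spectra at t - T0/4 and t + T0/4."""
    return 0.5 * (power_spectrum(omega, t - t0 / 4, w0, alpha, beta, window)
                  + power_spectrum(omega, t + t0 / 4, w0, alpha, beta, window))


def eta_db(values):
    """RMS deviation (dB) of 10*log10(values) from its mean; one-frequency slice of Eq. (4)."""
    levels = [10 * math.log10(v) for v in values]
    mean = sum(levels) / len(levels)
    return math.sqrt(sum((lv - mean) ** 2 for lv in levels) / len(levels))


T0 = 40.0                                   # fundamental period in samples (illustrative)
W0 = 2 * math.pi / T0                       # omega_0 = 2*pi/T0 in rad/sample
WINDOW = blackman(int(2.5 * T0))            # window length 2.5*T0 is an assumption
ALPHA, BETA = 0.7, 0.3                      # arbitrary amplitude and phase of 2nd component
OMEGA = W0 / 2                              # probe midway between the two components
TIMES = [0.5 * i for i in range(int(2 * T0))]   # one full period in 0.5-sample steps

plain = [power_spectrum(OMEGA, t, W0, ALPHA, BETA, WINDOW) for t in TIMES]
tandem = [tandem_spectrum(OMEGA, t, T0, W0, ALPHA, BETA, WINDOW) for t in TIMES]

print(f"plain power spectrum: {eta_db(plain):.3f} dB RMS variation")
print(f"TANDEM spectrum:      {eta_db(tandem):.2e} dB RMS variation")
```

The plain spectrum swings by several decibels within each period. The TANDEM spectrum stays constant to rounding error and equals |W(ω)|² + α²|W(ω − ω0)|², as the derivation predicts.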

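Figure 1: Temporal variation of logarithmic power spectra under different SNRs. (Left) Original time windows. (Right) TANDEM windows. The SNR is 3 dB.

Nuttall's windows with very good sidelobe behavior 22) and a Blackman window are also considered. TANDEM can be regarded as Welch's averaged-periodogram method 23) with N = 2 segments.

The TANDEM spectrum still contains a periodic structure along the frequency axis, with spacing f0 (1/f0 = T0). Smoothing it with a rectangular window of width ω0 gives P_S(ω, t):

$$P_S(\omega, t) = \frac{1}{\omega_0}\int_{-\omega_0/2}^{\omega_0/2} P_T(\omega - \lambda, t)\, d\lambda. \tag{6}$$

This smoothing distorts the values at the harmonic frequencies, much as the antialiasing filters used in A/D and D/A conversion do. Following the idea of consistent sampling 24), the distortion is compensated with coefficients q_k, which yields P_ST(ω, t):

$$P_{ST}(\omega, t) = \sum_{k=-\infty}^{\infty} q_k P_S(\omega - k\omega_0, t). \tag{7}$$

The coefficients q_k are obtained by inverting R(z). The coefficients r_k of R(z) are determined by h(ω), which is derived from the Blackman window, and by W(ω):

$$Q(z) = \frac{1}{R(z)} = \sum_{k=-\infty}^{\infty} q_k z^{-k}, \qquad R(z) = \sum_{k=-\infty}^{\infty} r_k z^{-k}, \qquad r_k = \int_{-\infty}^{\infty} h(\omega - k\omega_0)\, W(-\omega)\, d\omega. \tag{8}$$

In practice, only q_0 and q_{±1} are used.

Footnote: Because log(1 + x) ≈ x for |x| ≪ 1, the smoothing can also be applied to the logarithm of P_T(ω, t).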
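A discrete version of Eq. (6), followed by Eq. (7) truncated to q_0 and q_{±1}, might look like the sketch below. The sketch makes several assumptions:
- the frequency grid is uniform and circular;
- ω0 spans exactly 2·half bins;
- the integral is approximated with the trapezoidal rule.

The toy spectrum and the coefficients q0 = 1.2 and q1 = -0.1 are illustrative values and are not taken from the paper. They are chosen so that q0 + 2q1 = 1, which lets a flat spectrum pass through unchanged.

```python
import math


def rect_smooth(values, half):
    """Eq. (6) on a uniform circular frequency grid.

    The integral over [-w0/2, w0/2] uses the trapezoidal rule, with w0 equal to
    exactly 2*half grid steps; indices wrap around (the grid is 2*pi-periodic).
    """
    n = len(values)
    out = []
    for i in range(n):
        acc = 0.5 * (values[(i - half) % n] + values[(i + half) % n])
        acc += sum(values[(i + j) % n] for j in range(-half + 1, half))
        out.append(acc / (2 * half))
    return out


def compensate(smoothed, half, q0, q1):
    """Eq. (7) truncated to k = -1, 0, 1 with q_{-1} = q_1; neighbours lie w0 = 2*half bins away."""
    n = len(smoothed)
    step = 2 * half
    return [q0 * smoothed[i] + q1 * (smoothed[(i - step) % n] + smoothed[(i + step) % n])
            for i in range(n)]


N_BINS = 1024          # grid points covering 2*pi (illustrative)
HALF = 8               # w0 spans 16 bins, so 64 harmonics fit on the grid
Q0, Q1 = 1.2, -0.1     # illustrative only; chosen so that Q0 + 2*Q1 = 1

# Toy TANDEM-like spectrum: a slowly varying envelope times a ripple with period w0.
envelope = [2.0 + math.cos(2 * math.pi * i / N_BINS) for i in range(N_BINS)]
ripple = [1.0 + 0.9 * math.cos(2 * math.pi * i / (2 * HALF)) for i in range(N_BINS)]
p_t = [e * r for e, r in zip(envelope, ripple)]

p_s = rect_smooth(p_t, HALF)          # harmonic ripple removed, envelope kept
p_st = compensate(p_s, HALF, Q0, Q1)  # Eq. (7) with three coefficients


def max_rel_dev(spectrum):
    """Largest relative deviation of a spectrum from the toy envelope."""
    return max(abs(p - e) / e for p, e in zip(spectrum, envelope))


print(f"before smoothing: {max_rel_dev(p_t):.3f}")
print(f"after Eq. (6):    {max_rel_dev(p_s):.4f}")
print(f"after Eq. (7):    {max_rel_dev(p_st):.4f}")
```

Smoothing over exactly one harmonic spacing removes the ripple almost completely, because a full period of a sinusoid integrates to zero. The three-tap compensation then slightly re-sharpens what the smoothing blurred.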

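Eqs. (6) and (7) can be replaced by operations on the logarithm of P_T(ω, t):

$$L_S(\omega, t) = \frac{1}{\omega_0}\int_{-\omega_0/2}^{\omega_0/2} \log P_T(\omega - \lambda, t)\, d\lambda, \tag{9}$$

$$P_{ST}(\omega, t) = \exp\left(q_0 L_S(\omega, t) + q_1\left(L_S(\omega - \omega_0, t) + L_S(\omega + \omega_0, t)\right)\right), \tag{10}$$

where q_{-1} = q_1. Eqs. (9) and (10) are equivalent to liftering the cepstrum. P_ST(ω, t) serves as the spectral representation in STRAIGHT.

3. STRAIGHT-based morphing

STRAIGHT-based morphing has been used in research on voice perception 26) and on expressive singing 27), 30). It is also the basis of the singing design interface v.morish 31).

4. Temporally variable multi-aspect morphing

4.1 Morphing the time axis

Let A and B denote the two examples to be morphed, with time axes x_A and x_B. The function T_AB(x_A) maps A's time axis onto B's, and r_AB is the morphing rate. The time axis of the morphed signal m, expressed on A's time axis, is

$$T_{Am}(x_A) = \int_0^{x_A} \exp\left(r_{AB}\log\frac{dT_{AB}(\lambda)}{d\lambda}\right) d\lambda = \int_0^{x_A} \left(\frac{dT_{AB}(\lambda)}{d\lambda}\right)^{r_{AB}} d\lambda. \tag{11}$$

The mapping T_Bm(x_B) from B is defined in the same way, using T_BA and r_BA = 1 − r_AB.

Footnotes: The coefficients q_k can also be derived with a Taylor expansion. A Flash demonstration of STRAIGHT morphing is available 29). A demonstration of v.morish is available 32).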
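The time-axis mapping of Eq. (11) and the per-parameter interpolation of Eq. (15) can be sketched as follows. This sketch is not the authors' implementation, and it makes several assumptions:
- T_AB is available as samples of a monotonically increasing alignment on A's time axis;
- its derivative is approximated by finite differences;
- the integral is accumulated as a running sum;
- the mappings of Eqs. (13) and (14) are simply passed in by the caller.

The helper names, the uniform 2× stretch, and the log-F0 contours are hypothetical.

```python
import math


def morph_time_axis(x_grid, t_ab, rate):
    """Eq. (11) on a sampled grid: T_Am(x) = integral from 0 to x of (dT_AB/dlambda)^rate.

    x_grid: increasing sample positions on A's time axis, starting at 0.
    t_ab:   T_AB sampled at x_grid (a monotonically increasing alignment A -> B).
    rate:   morphing rate r_AB (0 keeps A's timing, 1 follows B's timing).
    The derivative is a finite difference per segment; the integral is a running sum.
    """
    out = [0.0]
    for k in range(1, len(x_grid)):
        dx = x_grid[k] - x_grid[k - 1]
        slope = (t_ab[k] - t_ab[k - 1]) / dx            # piecewise-constant dT_AB/dlambda
        out.append(out[-1] + math.exp(rate * math.log(slope)) * dx)
    return out


def morph_trajectory(times, rate_of, theta_a, theta_b, t_sa, t_sb):
    """Eq. (15) evaluated at each morphed time t_s in `times`.

    rate_of, theta_a, theta_b, t_sa and t_sb are callables; t_sa and t_sb stand in
    for the mappings of Eqs. (13) and (14) and must be supplied by the caller.
    Rates outside [0, 1] extrapolate beyond A or B.
    """
    return [(1 - rate_of(t)) * theta_a(t_sa(t)) + rate_of(t) * theta_b(t_sb(t))
            for t in times]


# Hypothetical alignment: B is uniformly twice as slow as A.
grid = [float(i) for i in range(11)]
stretch = [2.0 * x for x in grid]
halfway = morph_time_axis(grid, stretch, 0.5)   # each unit of A lasts sqrt(2) units

# Hypothetical log-F0 contours (constant 200 Hz and 300 Hz), identity time mappings,
# and a morphing rate ramping from 0 to 1.5, so the last point extrapolates past B.
log_f0 = morph_trajectory(
    times=[0.0, 0.5, 1.0],
    rate_of=lambda t: 1.5 * t,
    theta_a=lambda t: math.log(200.0),
    theta_b=lambda t: math.log(300.0),
    t_sa=lambda t: t,
    t_sb=lambda t: t,
)
print([round(math.exp(v), 1) for v in log_f0])
```

Interpolating log dT_AB/dλ, rather than dT_AB/dλ itself, makes the halfway tempo the geometric mean of the two tempos. With this sketch, a rate of 0.5 on a 2× stretch gives a factor of √2.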

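4.2 Temporally variable morphing rate

When the morphing rate changes over time, it is defined on the morphed time axis t_s. The mappings T_sA(t_s) and T_sB(t_s) take t_s to the time axes of A and B:

$$t_s = \int_0^{t_s} d\lambda, \tag{12}$$

$$T_{sA}(t_s) = \int_0^{t_s} \left(\left.\frac{dT_{AB}(x)}{dx}\right|_{x = T_{sA}(\lambda)}\right)^{-r^{(t)}_{AB}(\lambda)} d\lambda, \tag{13}$$

$$T_{sB}(t_s) = \int_0^{t_s} \left(\left.\frac{dT_{BA}(x)}{dx}\right|_{x = T_{sB}(\lambda)}\right)^{r^{(t)}_{AB}(\lambda) - 1} d\lambda, \tag{14}$$

where r^{(t)}_AB is the morphing rate for the time axis. Using these mappings, each parameter is morphed as

$$\Theta_m(t_s) = (1 - r_{AB}(t_s))\,\Theta_A(T_{sA}(t_s)) + r_{AB}(t_s)\,\Theta_B(T_{sB}(t_s)), \tag{15}$$

where Θ(t) is a parameter at time t and r(t) is its morphing rate at time t.

4.3 Morphing rates on a separate time axis

The morphing rates can also be specified on another time axis t_r, as r^{(t)}_AB(t_r). In that case, the mappings T_rA(t_r), T_rB(t_r), and T_rs(t_r) take t_r to A, B, and t_s. See 7) for details.

5. GUI

GUI-based tools for TANDEM-STRAIGHT have been implemented in Matlab 8), 33).

6. Conclusion

This article presented TANDEM-STRAIGHT and temporally variable morphing as a substrate (substratum) for making singing voice tangible. The Matlab implementation is described in 34). STRAIGHT and TANDEM-STRAIGHT score a poor SNR as waveform coders. However, this article argued that discarding phase information provides merits in return.

Acknowledgments

This research was supported in part by CrestMuse and a Grant-in-Aid for Scientific Research (A)1917.

References

1) Kawahara, H., Masuda-Katsuse, I. and de Cheveigné, A.: Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction, Speech Communication, Vol.27, No.3-4, pp.187-207 (1999).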

2) STRAIGHT, Vol.63, No.8, pp.442-449 (2007).
3) Kawahara, H. and Matsui, H.: Auditory morphing based on an elastic perceptual distance metric in an interference-free time-frequency representation, ICASSP 2003, Vol.I, pp.256-259 (2003).
4) Kawahara, H., Katayose, H., de Cheveigné, A. and Patterson, R.D.: Fixed point analysis of frequency to instantaneous frequency mapping for accurate estimation of F0 and periodicity, EUROSPEECH '99, Vol.6, pp.2781-2784 (1999).
5) Kawahara, H., de Cheveigné, A., Banno, H., Takahashi, T. and Irino, T.: Nearly defect-free F0 trajectory extraction for expressive speech modifications based on STRAIGHT, Interspeech 2005, pp.537-540 (2005).
6) Kawahara, H., Morise, M., Takahashi, T., Nisimura, R., Irino, T. and Banno, H.: A temporally stable power spectral representation for periodic signals and applications to interference-free spectrum, F0 and aperiodicity estimation, ICASSP 2008, pp.3933-3936 (2008).
7) Kawahara, H., Nisimura, R., Irino, T., Morise, M., Takahashi, T. and Banno, H.: Temporally variable multi-aspect auditory morphing enabling extrapolation without objective and perceptual breakdown, ICASSP 2009, pp.3905-3908 (2009).
8) Kawahara, H., Takahashi, T., Morise, M. and Banno, H.: Development of exploratory research tools based on TANDEM-STRAIGHT, APSIPA ASC 2009, pp.111-120 (2009).
9) Vol.H-87-1 (1987).
10) Koenig, W., Dunn, H.K. and Lacy, L.Y.: The sound spectrograph, J. Acoust. Soc. Am., Vol.18, No.1, pp.19-49 (1946).
11) Liberman, A.M., Delattre, P.C. and Cooper, F.S.: The rôle of selected stimulus-variables in the perception of the unvoiced stop consonants, American Journal of Psychology, Vol.65, pp.497-516 (1952).
12) Dudley, H.: Remaking speech, J. Acoust. Soc. Am., Vol.11, No.2, pp.169-177 (1939).
13) Vol.61, No.5, pp. 63 68 (2005).
14) CAPTCHA, No.3-4-3, p.11 (1).
15) Vol.53-A, No.1, pp.35 4 (1970).
16) Atal, B.S. and Hanauer, S.L.: Speech analysis and synthesis by linear prediction of the speech wave, J. Acoust. Soc. Am., Vol.50, No.2B, pp.637-655 (1971).
17) Plomp, R. and Steeneken, H.J.M.: Effect of phase on the timbre of complex tones, J. Acoust. Soc. Am., Vol.46, No.2B, pp.409-421 (1969).
18) Patterson, R.D.: The sound of a sinusoid: Spectral models, J. Acoust. Soc. Am., Vol.96, No.3, pp.1409-1418 (1994).
19) Vol.J90-D, No.1, pp.365 367 (2007).
20) (1). 1.7.17.
21) Harris, F.J.: On the use of windows for harmonic analysis with the discrete Fourier transform, Proceedings of the IEEE, Vol.66, No.1, pp.51-83 (1978).
22) Nuttall, A.H.: Some windows with very good sidelobe behavior, IEEE Trans. Acoustics, Speech, and Signal Processing, Vol.29, No.1, pp.84-91 (1981).
23) Welch, P.: The use of fast Fourier transform for the estimation of power spectra: A method based on time averaging over short, modified periodograms, IEEE Trans. Audio and Electroacoustics, Vol.15, No.2, pp.70-73 (1967).
24) Unser, M.: Sampling - 50 years after Shannon, Proceedings of the IEEE, Vol.88, No.4, pp.569-587 (2000).
25) H-1-44, Vol.4, No.3, pp.31 36 (1).
26) Schweinberger, S.R., Casper, C., Hauthal, N., Kaufmann, J.M., Kawahara, H., Kloth, N., Robertson, D.M., Simpson, A.P. and Zaeske, R.: Auditory adaptation in voice perception, Current Biology, Vol.18, No.9, pp.684-688 (2008).
27) Yonezawa, T., Suzuki, N., Abe, S., Mase, K. and Kogure, K.: Perceptual continuity and naturalness of expressive strength in singing voices based on speech morphing, EURASIP Journal on Audio, Speech, and Music Processing, No.3 (2007).
28) 2005.4.15-2005.8.15.
29) http://www.wakayama-u.ac.jp/%7ekawahara/miraikandemo/straightmorph.swf
30) Vol.48, No.1, pp.3637-3648 (2007).
31) Morise, M., Onishi, M., Kawahara, H. and Katayose, H.: v.morish'09: A morphing-based singing design interface for vocal melodies, Lecture Notes in Computer Science, LNCS 5709, pp.185-190 (2009).
32) http://www.nicovideo.jp/watch/sm47471
33) http://www.wakayama-u.ac.jp/%7ekawahara/straightadv/index_j.html
34) No.MUS86-6 (2010).