

IEICE Technical Report EA2017-4 (2017-07)
The Institute of Electronics, Information and Communication Engineers (IEICE)

[Invited] Revisiting VOCODER: Why I intentionally discard the original phase of the original speech?

Hideki KAWAHARA
Wakayama University, 930 Sakae-dani, Wakayama, Wakayama, 640-8510 Japan
E-mail: kawahara@sys.wakayama-u.ac.jp

Abstract  VOCODER is a framework invented for narrow-band communication about 80 years ago. It has provided a productive basis for speech research and applications. It will also play further productive roles in the age of deep learning, a rapidly expanding framework for research and deployment. I would like to introduce a perspective on the new roles of VOCODER, based on a review of research tools that I have developed and am currently developing.

Key words  speech, phase, spectrum, instantaneous frequency, group delay, sampling, deep learning

This article is a technical report without peer review, and its polished and/or extended version may be published elsewhere. Copyright 2017 by IEICE

1. 1939 VOCODER [1] VOCODER 80 [2] VOCODER [3] [5] [6] VOCODER VOCODER

2. VOCODER VOCODER

2.1 VOCODER [7], [8] [9] [10] pattern playback [11], [12] [13], [14]

2.2 [15] VOCODER VOCODER LPC (Linear Predictive Coding) [16], PARCOR (PARtial autocorrelation) [14], [17], CSM (Composite Sinusoidal Modeling) [18],

LSP (Line Spectrum Pair, LSF: Line Spectrum Frequencies) [19] [20] LSP CODE [21], [22] VOCODER [23]

2.3 STRAIGHT VOCODER STRAIGHT [24] STRAIGHT [25] [26], [27] STRAIGHT VOCODER VOCODER STRAIGHT [28], [29] STRAIGHT TANDEM-STRAIGHT [30] [31] [33] [34], [35] STRAIGHT( ) [5], [36] STRAIGHT [37] [38] Google scholar STRAIGHT 2017 6 2,000 20 STRAIGHT WORLD [39] STRAIGHT [4] Mel cepstrum [40]

2.4 WaveNet [2] WaveNet WaveNet μ-law [41] 256 VOCODER VOCODER [27]

Fig. 1 Demonstration movie for phase perception.

[6], [42], [43] VOCODER Google UK [44], [45] [46], [47]

3. SparkNG SparkNG [48], [49] 30 [50] SparkNG GUI

3.1 [51] [52] [52] MATLAB 1 Schroeder [53] 1) cos 2) sin 3) sin cos 4) Schroeder 5) 0 2π 1 1), 2), 3) [52] 50 Hz 400 Hz 20 dB [54]
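The five phase settings listed in Section 3.1 (cosine, sine, alternating sine and cosine, Schroeder [53], and random in 0 to 2π) can be illustrated with a minimal sketch. It is not the MATLAB code behind the demonstration movie. It assumes equal-amplitude harmonics, a hypothetical harmonic count, the flat-spectrum form of Schroeder's phase, φ_k = -πk(k-1)/N, and that the alternating condition puts odd harmonics in sine phase and even harmonics in cosine phase. All function names are illustrative.

```python
import math
import random


def harmonic_phases(kind, n_harm, rng=None):
    """Initial phases (radians, cosine-referenced) for harmonics 1..n_harm."""
    ks = range(1, n_harm + 1)
    if kind == "cos":
        return [0.0 for _ in ks]
    if kind == "sin":
        # cos(x - pi/2) == sin(x)
        return [-math.pi / 2 for _ in ks]
    if kind == "alt":
        # Assumption: odd harmonics in sine phase, even harmonics in cosine phase.
        return [-math.pi / 2 if k % 2 else 0.0 for k in ks]
    if kind == "schroeder":
        # Flat-spectrum Schroeder phase (sign convention is an assumption).
        return [-math.pi * k * (k - 1) / n_harm for k in ks]
    if kind == "random":
        rng = rng or random.Random(0)
        return [rng.uniform(0.0, 2.0 * math.pi) for _ in ks]
    raise ValueError("unknown phase condition: " + kind)


def one_period(phases, n_samples=2048):
    """One fundamental period of a sum of equal-amplitude harmonics."""
    return [
        sum(
            math.cos(2.0 * math.pi * k * i / n_samples + phases[k - 1])
            for k in range(1, len(phases) + 1)
        )
        for i in range(n_samples)
    ]


def crest_factor(x):
    """Peak magnitude divided by RMS value."""
    rms = math.sqrt(sum(v * v for v in x) / len(x))
    return max(abs(v) for v in x) / rms


if __name__ == "__main__":
    for kind in ("cos", "sin", "alt", "schroeder", "random"):
        wave = one_period(harmonic_phases(kind, 32))
        print(kind, round(crest_factor(wave), 2))
```

All five signals share the same power spectrum; only the waveform shape changes. In this sketch the cosine-phase signal is the most pulse-like and the Schroeder-phase signal has a much lower crest factor.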

Fig. 2 Time-frequency representation using non-linear frequency resolution. The upper image shows the ERB_N number-based representation. The lower image shows the 1/3 octave-based representation.

Fig. 3 Realtime visualization of the vocal tract shape.

3.2 ERB_N number [55] [56] 2 ERB_N number 1/3 /aiueo/ ERB_N number 1/3 FFT (Fast Fourier Transform) Bark [57]

3.3 PARCOR SparkNG 3 PARCOR 3 MacBook Pro (Retina, 13-inch, 2.9 GHz Intel Core i5) MATLAB (R2017a) 20 fps MATLAB 3

3.4 SparkNG GUI

3.4.1 4 GUI GUI 4 GUI
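Section 3.3 draws the vocal tract shape from PARCOR coefficients. One common way to obtain such a shape, sketched here under a lossless concatenated-tube assumption, maps each reflection coefficient to the area ratio of adjacent tube sections. The function name, the sign convention k_i = (A_{i+1} - A_i) / (A_{i+1} + A_i), and the unit reference area are assumptions for illustration only. The source does not state which convention or normalization SparkNG uses, and the opposite sign convention is also common.

```python
def area_function(parcor, first_area=1.0):
    """Relative tube-section areas from reflection (PARCOR) coefficients.

    Assumed convention: k_i = (A_{i+1} - A_i) / (A_{i+1} + A_i),
    which gives A_{i+1} = A_i * (1 + k_i) / (1 - k_i).
    """
    areas = [first_area]
    for k in parcor:
        if not -1.0 < k < 1.0:
            raise ValueError("reflection coefficients must lie in (-1, 1)")
        areas.append(areas[-1] * (1.0 + k) / (1.0 - k))
    return areas


if __name__ == "__main__":
    # Hypothetical coefficients, not measured from speech.
    print(area_function([0.5, -0.2, 0.0]))
```

Only the area ratios are determined by the coefficients, so the absolute scale (here `first_area`) has to be fixed separately.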

Fig. 4 Filter manipulation GUI of the speech production simulator.

44,100 Hz LSP 3 3

3.4.2 Fant L-F model [58] L-F model L-F model t_p t_p t_a t_c 4 5 GUI L-F model 3 L-F model (t_p, t_e, t_a, t_c) 5 t_a +6 dB/oct modal, breathy, vocal fry [59]

Fig. 5 Glottal source manipulation GUI of the speech production simulator.

3.4.3 L-F model L-F model [60] Fujisaki-Ljungqvist model [61]

[62] cos [46] cos 80 dB [47] [44] L-F model VOCODER [44] [63] SparkNG

4. [42], [43] VOCODER VOCODER [3], [5] WaveNet VOCODER

16K12464 (B)15H02726 VOCODER ATR STRAIGHT

References

[1] H. Dudley, Remaking Speech, The Journal of the Acoustical Society of America, vol.11, no.2, pp.169-177, 1939.
[2] A. van den Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. Senior, and K. Kavukcuoglu, WaveNet: A generative model for raw audio, arXiv preprint arXiv:1609.03499, pp.1-15, 2016.
[3] Y.C. Eldar and T. Michaeli, Beyond bandlimited sampling, IEEE Signal Processing Magazine, vol.26, no.3, pp.48-68, May 2009.
[4] vol.73 no.9 p. 2017 [ ]
[5] 1 5 3 3 1, ( 15-May-2017) http://www.ieicehbkb.org/portal/
[6] M. Blaauw and J. Bonada, A neural parametric singing synthesizer, arXiv preprint arXiv:1704.03809, pp.1-9, Apr. 2017. http://arxiv.org/abs/1704.03809
[7] T. Chiba and M. Kajiyama, The Vowel, Its Nature and Structure, Tokyo-Kaiseikan, 1941.
[8] vol.5 no.2 pp.15-30 2001
[9] G. Fant, Acoustic theory of speech production: with calculations based on X-ray studies of Russian articulations, vol.2, Walter de Gruyter, 1971. [originally, 1960, Mouton].
[10] [ ] Sona-Graph vol.11 no.1 pp.57-64 1955
[11] F.S. Cooper, A.M. Liberman, and J.M. Borst, The interconversion of audible and visible patterns as a basis for research in the perception of speech, Proc. N. A. S., vol.37, pp.318-325, 1951.
[12] pp.1 Q 28,429 430 2005
[13] C.G. Bell, H. Fujisaki, J.M. Heinz, K.N. Stevens, and A.S. House, Reduction of Speech Spectra by Analysis-by-Synthesis Techniques, The Journal of the Acoustical Society of America, vol.33, no.12, pp.1725-1736, 1961.
[14] [ ] vol.19 no.7 pp.644-656 1978
[15] vol.53a no.1 pp.35-42 1970
[16] B.S. Atal and S.L. Hanauer, Speech analysis and synthesis by linear prediction of the speech wave, The Journal of the Acoustical Society of America, vol.50, no.2B, pp.637-655, 1971.
[17] pp.2 2 6 1969
[18] vol.J64-A no.2 pp.105-112 1981
[19] (LSP) A vol.64 no.8 pp.599-606 1981
[20] vol.J83-A no.11 pp.1244-1255 2000
[21] M. Schroeder and B.S. Atal, Code-excited linear prediction (CELP): High-quality speech at very low bit rates, Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP '85, vol.10, IEEE, pp.937-940, 1985.
[22] ITU-T, G.729: Coding of speech at 8 kbit/s using conjugate-structure algebraic-code-excited linear prediction (CS-ACELP), 2012. [started 1996, in force 2012].
[23] A.S. Spanias, Speech coding: A tutorial review, Proceedings of the IEEE, vol.82, no.10, pp.1541-1582, 1994.
[24] H. Kawahara, I. Masuda-Katsuse, and A. de Cheveigné, Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction, Speech Communication, vol.27, no.3-4, pp.187-207, 1999.
[25] A.S. Bregman, et al., Auditory scene analysis, vol.10, Cambridge, MA: MIT Press, 1990.
[26] vocoder: Straight (< > ), vol.54 no.7 pp.521-526 1998
[27] Vocoder : straight, vol.63 no.8 pp.442-449 2007
[28] C. Liu and D. Kewley-Port, Vowel formant discrimination for high-fidelity speech, The Journal of the Acoustical Society of America, vol.116, no.2, pp.1224-1233, 2004.
[29] P.F. Assmann and W.F. Katz, Synthesis fidelity and time-varying spectral change in vowels, The Journal of the Acoustical Society of America, vol.117, no.2, pp.886-895, 2005.
[30] H. Kawahara, M. Morise, T. Takahashi, R. Nisimura, T. Irino, and H. Banno, TANDEM-STRAIGHT: A temporally stable power spectral representation for periodic signals and applications to interference-free spectrum, F0 and aperiodicity estimation, ICASSP 2008, pp.3933-3936, Las Vegas, 2008.
[31] H. Kawahara and H. Matsui, Auditory morphing based on an elastic perceptual distance metric in an interference-free time-frequency representation, ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, vol.1, pp.256-259, 2003.
[32] H. Kawahara, R. Nisimura, T. Irino, M. Morise, T. Takahashi, and H. Banno, Temporally variable multi-aspect auditory morphing enabling extrapolation without objective and perceptual breakdown, ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, pp.3905-3908, 2009.
[33] H. Kawahara, M. Morise, H. Banno, and V.G. Skuk, Temporally variable multi-aspect N-way morphing based on interference-free speech representations, APSIPA ASC 2013, p.0s28.02, 2013.
[34] L. Bruckert, P. Bestelmeyer, M. Latinus, J. Rouger, I. Charest, G.A. Rousselet, H. Kawahara, and P. Belin, Vocal Attractiveness Increases by Averaging, Current Biology, vol.20, no.2, pp.116-120, 2010.
[35] S.R. Schweinberger, C. Casper, N. Hauthal, J.M. Kaufmann, H. Kawahara, N. Kloth, D.M.C. Robertson, A.P. Simpson, and R. Zäske, Auditory Adaptation in Voice Perception, Current Biology, vol.18, pp.684-688, 2008.
[36] M. Unser, Sampling-50 years after Shannon, Proceedings of the IEEE, vol.88, no.4, pp.569-587, Apr. 2000.
[37] H. Zen, K. Tokuda, and A.W. Black, Statistical parametric speech synthesis, Speech Communication, vol.51, no.11, pp.1039-1064, Nov. 2009.
[38] H. Zen, T. Toda, M. Nakamura, and K. Tokuda, Details of the Nitech HMM-based speech synthesis system for the Blizzard Challenge 2005, IEICE Transactions on Information and Systems, vol.90, no.1, pp.325-333, 2007.
[39] M. Morise, F. Yokomori, and K. Ozawa, WORLD: A vocoder-based high-quality speech synthesis system for real-time applications, IEICE Transactions on Information and Systems, vol.99, no.7, pp.1877-1884, 2016.
[40] S. Imai, Cepstral analysis synthesis on the mel frequency scale, ICASSP '83, IEEE International Conference on Acoustics, Speech, and Signal Processing, vol.8, pp.93-96, Institute of Electrical and Electronics Engineers, Boston, Apr. 1983.
[41] ITU-T, G.711: Pulse code modulation (PCM) of voice frequencies, 1988.
[42] Y. Saito, S. Takamichi, and H. Saruwatari, Training algorithm to deceive anti-spoofing verification for DNN-based speech synthesis, Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference on, IEEE, pp.4900-4904, 2017.
[43] T. Kaneko, H. Kameoka, N. Hojo, Y. Ijima, K. Hiramatsu, and K. Kashino, Generative adversarial network-based postfilter for statistical parametric speech synthesis, Proc. 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2017), pp.4910-4914, 2017.
[44] H. Kawahara, Y. Agiomyrgiannakis, and H. Zen, Using instantaneous frequency and aperiodicity detection to estimate F0 for high-quality speech synthesis, arXiv preprint arXiv:1605.07809, pp.1-10, 2016. http://arxiv.org/abs/1605.07809
[45] H. Kawahara, Y. Agiomyrgiannakis, and H. Zen, YANG VOCODER: Yet-ANother-Generalized VOCODER. (2017-06-13). https://github.com/google/yang_vocoder
[46] H. Kawahara, K. Sakakibara, H. Banno, M. Morise, T. Toda, and T. Irino, A new cosine series antialiasing function and its application to aliasing-free glottal source models for speech and singing synthesis, Proc. Interspeech 2017, p., 2017. (Accepted: Extended draft: arXiv preprint arXiv:1702.06724).
[47] H. Kawahara, K. Sakakibara, H. Banno, M. Morise, and T. Toda, A modulation property of time-frequency derivatives of filtered phase and its application to aperiodicity and fo estimation, Proc. Interspeech 2017, p., 2017. (Accepted: Extended draft: arXiv preprint arXiv:1706.02964).
[48] vol.18 no.3 pp.43-52 2014
[49] H. Kawahara, MATLAB realtime speech tools and voice production tools, (20-Feb.-2017). http://www.wakayama-u.ac.jp/%7ekawahara/sparkng/
[50] ,, H-87-21 1987 ( NTT (1989) )
[51] R. Plomp and H. Steeneken, Effect of phase on the timbre of complex tones, The Journal of the Acoustical Society of America, vol.46, no.2B, pp.409-421, 1969.
[52] R.D. Patterson, A pulse ribbon model of monaural phase perception, The Journal of the Acoustical Society of America, vol.82, no.5, pp.1560-1586, 1987.
[53] M. Schroeder, Synthesis of low-peak-factor signals and binary sequences with low autocorrelation (corresp.), IEEE Transactions on Information Theory, vol.16, no.1, pp.85-89, Jan. 1970.
[54] J. Skoglund and W.B. Kleijn, On time-frequency masking in voiced speech, Speech and Audio Processing, IEEE Transactions on, vol.8, no.4, pp.361-369, Jul. 2000.
[55] B.C.J. Moore, An introduction to the psychology of hearing: sixth edition, Emerald, 2012.
[56] D.D. Greenwood, A cochlear frequency-position function for several species 29 years later, The Journal of the Acoustical Society of America, vol.87, no.6, pp.2592-2605, 1990.
[57] E. Zwicker and E. Terhardt, Analytical expressions for critical-band rate and critical bandwidth as a function of frequency, The Journal of the Acoustical Society of America, vol.68, no.5, pp.1523-1525, 1980.
[58] G. Fant, J. Liljencrants, and Q.-g. Lin, A four-parameter model of glottal flow, STL-QPSR, vol.4, no.1985, pp.1-13, 1985.
[59] D.G. Childers and C. Ahn, Modeling the glottal volume velocity waveform for three voice types, The Journal of the Acoustical Society of America, vol.97, no.1, pp.505-519, 1995.
[60] H. Kawahara, K.-I. Sakakibara, H. Banno, M. Morise, T. Toda, and T. Irino, Aliasing-free implementation of discrete-time glottal source models and their applications to speech synthesis and F0 extractor evaluation, 2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), pp.520-529, IEEE, Hong Kong, Dec. 2015.
[61] H. Fujisaki and M. Ljungqvist, Proposal and evaluation of models for the glottal source waveform, ICASSP 1986, pp.1605-1608, Tokyo, 1986.
[62] P.H. Milenkovic, Voice source model for continuous control of pitch period, The Journal of the Acoustical Society of America, vol.93, no.2, pp.1087-1096, 1993.
[63] I.R. Titze, Nonlinear source-filter coupling in phonation: Theory, The Journal of the Acoustical Society of America, vol.123, no.5, pp.2733-2749, May 2008.