
Vol.1-MUS-95 No.3 1/6/

Cross synthesis VOCODER which preserves linguistic information and characteristic timbre of musical instruments and animal voices

Nishi Taiki 1,a)  Nisimura Ryuichi 1,b)  Irino Toshio 1,c)  Kawahara Hideki 1,d)

Abstract: We propose a new design method for the cross synthesis VOCODER, which synthesizes sounds by mixing the features of two input sounds, such as speech and a musical instrument or an animal voice. The cross synthesis VOCODER originated from a narrow-band transmission technology and is now widely used as an effector in musical performance and production. However, current cross synthesis effects tend to degrade the original character of the musical instrument, and the linguistic information in the processed sound is not always intelligible. The proposed method alleviates these difficulties using two techniques. One is the removal of the global spectral shape from the speech spectral envelope, and the other is band-pass filtering in the modulation frequency domain. Subjective test results indicated the relevance of the proposed techniques and provide design guidelines for new, flexible cross synthesis VOCODERs.

Keywords: Cross Synthesis VOCODER, musical instruments, animal voice, linguistic information, modulation frequency domain

1.

VOCODER [1]. Cross synthesis: Channel VOCODER, Phase VOCODER [2], LPC [3], [4]; [5].

1 Wakayama University, Sakaedani 93, Wakayama 64-851, Japan
a) s1538@center.wakayama-u.ac.jp
b) nisimura@sys.wakayama-u.ac.jp
c) irino@sys.wakayama-u.ac.jp
d) kawahara@sys.wakayama-u.ac.jp

c 1 Information Processing Society of Japan

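The VOCODER uses STRAIGHT [6] and TANDEM-STRAIGHT [7].

2. TANDEM-STRAIGHT

TANDEM-STRAIGHT [7] is based on the Fourier transform.

2.1 TANDEM

With the fundamental period $T_0$ and the power spectrum $P(\omega, t)$, the TANDEM spectrum $P_T(\omega, t)$ [8] is

$$P_T(\omega, t) = P\left(\omega, t - \frac{T_0}{4}\right) + P\left(\omega, t + \frac{T_0}{4}\right) \tag{1}$$

2.2 STRAIGHT

The STRAIGHT spectrum is obtained from the TANDEM spectrum by consistent sampling, implemented as a cepstrum lifter [9]:

$$P_{TST}(\omega) = \exp\left(F^{-1}\left[g_1(\tau)\, g_2(\tau)\, C_T(\tau)\right]\right) \tag{2}$$

where

$$g_1(\tau) = q_0 + 2 q_1 \cos\left(\frac{2\pi\tau}{T_0}\right) \tag{3}$$

$$g_2(\tau) = \frac{\sin(\pi f_0 \tau)}{\pi f_0 \tau} = F[h_0(\omega)] \tag{4}$$

$$h_0(\omega) = \begin{cases} \dfrac{1}{\omega_0} & |\omega| \le \dfrac{\omega_0}{2} \\ 0 & \text{otherwise} \end{cases} \tag{5}$$

$$C_T(\tau) = \int \ln\left(P_T(\omega, t)\right) e^{j\omega\tau}\, d\omega \tag{6}$$

Here $g_1(\tau)$ and $g_2(\tau)$ are lifters, $\tau$ is the quefrency, $F$ denotes the Fourier transform, $\omega_0 = 2\pi f_0$, and $q_0$ and $q_1$ are the coefficients of $g_1(\tau)$ [10].

3.

3.1

The temporal average $P_t(\omega)$ of the power spectrum $P(\omega, t)$ over the interval $(0, T_s)$ is

$$P_t(\omega) = \frac{1}{T_s} \int_0^{T_s} P(\omega, t)\, dt \tag{7}$$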
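The temporal stability that Eq. (1) aims for can be checked numerically. The following Python sketch is an illustration and not the authors' implementation: the sampling rate, the fundamental frequency, the test signal, the window length, and the names `power`, `tandem_power`, and `ripple` are all assumptions made for this example. It measures how much the short-time power at a frequency between two harmonics fluctuates as the analysis window moves over one fundamental period, with and without the TANDEM summation.

```python
import cmath
import math

FS = 8000          # sampling rate in Hz (assumed for this example)
F0 = 100.0         # fundamental frequency in Hz (assumed)
T0 = int(FS / F0)  # fundamental period in samples
WIN_LEN = 200      # Hanning window length, 2.5 periods (assumed)

# Periodic-form Hanning window.
WINDOW = [0.5 - 0.5 * math.cos(2 * math.pi * m / WIN_LEN) for m in range(WIN_LEN)]


def signal(n):
    """Periodic test signal: ten equal-amplitude harmonics of F0."""
    return sum(math.cos(2 * math.pi * k * F0 * n / FS) for k in range(1, 11))


def power(freq_hz, center):
    """Short-time power P(omega, t) at one frequency, window centered at `center`."""
    start = center - WIN_LEN // 2
    acc = 0j
    for m, w in enumerate(WINDOW):
        acc += w * signal(start + m) * cmath.exp(-2j * math.pi * freq_hz * m / FS)
    return abs(acc) ** 2


def tandem_power(freq_hz, center):
    """Eq. (1): sum of two power spectra taken half a period apart."""
    quarter = T0 // 4
    return power(freq_hz, center - quarter) + power(freq_hz, center + quarter)


def ripple(values):
    """Peak-to-peak variation relative to the mean."""
    return (max(values) - min(values)) / (sum(values) / len(values))


centers = range(1000, 1000 + T0)  # window positions covering one period
freq = 250.0                      # halfway between the 2nd and 3rd harmonics
single_ripple = ripple([power(freq, c) for c in centers])
tandem_ripple = ripple([tandem_power(freq, c) for c in centers])
print(f"single spectrum ripple: {single_ripple:.3f}")
print(f"TANDEM spectrum ripple: {tandem_ripple:.3f}")
```

The interference between adjacent harmonics changes sign when the window moves by half a period, so the two terms of Eq. (1) cancel it. The ripple that remains comes from non-adjacent harmonics leaking through the window side lobes.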

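With a function $g(x)$, the generalized temporal average $P_{t,g}(\omega)$ is

$$P_{t,g}(\omega) = g^{-1}\left(\frac{1}{T_s} \int_0^{T_s} g\left(P(\omega, t)\right) dt\right) \tag{8}$$

Setting $g(x)$ to the logarithm, following the Weber-Fechner law, turns $P_t(\omega)$ into $P_{t,\ln}(\omega)$:

$$P_{t,\ln}(\omega) = \exp\left(\frac{1}{T_s} \int_0^{T_s} \ln\left(P(\omega, t)\right) dt\right) \tag{9}$$

3.2

Smoothing is performed on the $\mathrm{ERB}_N$ number axis [11]. Let $S(\omega)$ be a spectrum, $\lambda(\omega)$ the $\mathrm{ERB}_N$ number at frequency $\omega$, and $\omega(\lambda)$ the frequency at $\mathrm{ERB}_N$ number $\lambda$. The spectrum $S_a(\omega)$ smoothed over a width of $a$ in $\mathrm{ERB}_N$ number is

$$S_a(\omega) = \frac{1}{C} \int_{\omega(\lambda(\omega) - a/2)}^{\omega(\lambda(\omega) + a/2)} S(q)\, dq \tag{10}$$

$$C = \omega\left(\lambda(\omega) + a/2\right) - \omega\left(\lambda(\omega) - a/2\right)$$

Applying this smoothing with $S(\omega) = P_{t,\ln}(\omega)$ gives $P_{a,\ln}(\omega)$.

Fig. 1 Spectrogram of /konnichiwa/ (horizontal axis: time (s))

Fig. 2 Normalized spectrogram of /konnichiwa/ (horizontal axis: time (s))

3.3

The STRAIGHT spectrum $P_{ST}(\omega, t)$ is normalized by $P_{a,\ln}(\omega)$ to obtain $D_{a,\ln}(\omega, t)$ from $D(\omega, t)$:

$$D_{a,\ln}(\omega, t) = \frac{P_{ST}(\omega, t)}{P_{a,\ln}(\omega)} \tag{11}$$

TANDEM-STRAIGHT

4.

[12], [13] VOCODER

4.1

[14] FIR 1 -.9 Hanning Blackman -dB 1 16Hz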
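Eqs. (9) to (11) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation. The $\mathrm{ERB}_N$ number formula used here is the commonly cited one of Glasberg and Moore, which the text does not spell out. The function names, the bin spacing `BIN_HZ`, the smoothing width `WIDTH_ERB`, and the synthetic spectrogram are all hypothetical. The integral of Eq. (10) is approximated by a mean over the discrete frequency bins inside the smoothing band.

```python
import math


def erb_number(f_hz):
    """ERB_N number at f_hz (Glasberg and Moore formula, assumed here)."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)


def erb_to_hz(lam):
    """Inverse mapping omega(lambda): ERB_N number to frequency in Hz."""
    return (10.0 ** (lam / 21.4) - 1.0) * 1000.0 / 4.37


def log_time_average(spectrogram):
    """Eq. (9): geometric mean over frames; spectrogram[t][k] must be > 0."""
    n_frames = len(spectrogram)
    n_bins = len(spectrogram[0])
    return [
        math.exp(sum(math.log(frame[k]) for frame in spectrogram) / n_frames)
        for k in range(n_bins)
    ]


def erb_smooth(spectrum, bin_hz, width_erb):
    """Eq. (10), discretized: mean over the bins within +/- width_erb / 2."""
    n_bins = len(spectrum)
    smoothed = []
    for k in range(n_bins):
        lam = erb_number(k * bin_hz)
        lo_hz = erb_to_hz(lam - width_erb / 2)
        hi_hz = erb_to_hz(lam + width_erb / 2)
        k_lo = max(0, math.ceil(lo_hz / bin_hz))
        k_hi = min(n_bins - 1, math.floor(hi_hz / bin_hz))
        band = spectrum[k_lo:k_hi + 1]  # always contains bin k itself
        smoothed.append(sum(band) / len(band))
    return smoothed


def normalize(spectrogram, global_shape):
    """Eq. (11): divide every frame by the smoothed long-term spectrum."""
    return [[p / s for p, s in zip(frame, global_shape)] for frame in spectrogram]


BIN_HZ = 62.5     # frequency spacing of the bins (assumed)
WIDTH_ERB = 2.0   # smoothing width a in ERB_N number (assumed)

# Synthetic spectrogram: a strong spectral tilt plus a small time-varying detail.
spectrogram = [
    [100.0 / (k + 1) + 1.0 + 0.5 * math.cos(0.1 * k + 0.3 * t) for k in range(64)]
    for t in range(20)
]
p_t_ln = log_time_average(spectrogram)             # P_{t,ln}(omega)
p_a_ln = erb_smooth(p_t_ln, BIN_HZ, WIDTH_ERB)     # P_{a,ln}(omega)
d_a_ln = normalize(spectrogram, p_a_ln)            # D_{a,ln}(omega, t)
print(f"tilt before: {spectrogram[0][0] / spectrogram[0][63]:.1f}")
print(f"tilt after:  {d_a_ln[0][0] / d_a_ln[0][63]:.1f}")
```

Because the band edges are computed on the $\mathrm{ERB}_N$ number axis, the smoothing band is narrow at low frequencies and wide at high frequencies, which is the point of Eq. (10). The division in `normalize` removes the global spectral shape while the frame-to-frame variation is kept.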

Fig. 3 Modulation transfer function of the high-pass filter (vertical axis: gain (dB))

Fig. 4 Modulation transfer function of the low-pass filter (vertical axis: gain (dB))

Fig. 5 Spectrogram of /konnichiwa/ after filtering the modulation spectrum (axes: time (s), frequency (Hz))

Fig. 6 Spectrogram of /konnichiwa/ after filtering the modulation spectrum (axes: time (s), frequency (Hz))

4Hz 1 Hz 4 8Hz (Fig. 3, Fig. 4) 3dB 1-1dB -11dB [14] (Fig. 3, Fig. 5, Fig. 6)

5.

FFT, cepstrum [15], 1 ms, Hanning, 5 ms, OLA (overlap and add) [16]

6.

VOCODER

6.1

ohayogozaimasu, RWC
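The band-pass filtering in the modulation frequency domain described in Section 4 can be illustrated with a small Python sketch. Everything specific in it is an assumption made for the example: the frame rate of 100 Hz, the cut-off frequencies of 2 Hz and 20 Hz, the 201-tap length, the Hanning windowed-sinc design, and the function names. The text only states that FIR filters are used. The sketch filters the temporal trajectory of one frequency channel with a high-pass and a low-pass FIR filter in cascade.

```python
import math


def lowpass_fir(cutoff_hz, frame_rate_hz, n_taps):
    """Hanning windowed-sinc low-pass FIR, normalized to unit gain at 0 Hz."""
    assert n_taps % 2 == 1, "odd length keeps the filter symmetric about a tap"
    mid = n_taps // 2
    fc = cutoff_hz / frame_rate_hz  # cut-off in cycles per frame
    taps = []
    for n in range(n_taps):
        x = n - mid
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        hann = 0.5 - 0.5 * math.cos(2 * math.pi * (n + 1) / (n_taps + 1))
        taps.append(ideal * hann)
    total = sum(taps)
    return [t / total for t in taps]


def highpass_fir(cutoff_hz, frame_rate_hz, n_taps):
    """Spectral inversion of the low-pass filter: zero gain at 0 Hz."""
    taps = [-t for t in lowpass_fir(cutoff_hz, frame_rate_hz, n_taps)]
    taps[n_taps // 2] += 1.0
    return taps


def filter_same(x, taps):
    """Zero-delay FIR filtering with zero padding; output length equals input."""
    mid = len(taps) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, h in enumerate(taps):
            idx = i + mid - j
            if 0 <= idx < len(x):
                acc += h * x[idx]
        out.append(acc)
    return out


FRAME_RATE = 100.0  # frames per second (assumed)
N_TAPS = 201        # filter length (assumed)

# Trajectory of one channel: constant level, 4 Hz modulation, 40 Hz modulation.
n_frames = 600
speech_rate = [math.sin(2 * math.pi * 4.0 * i / FRAME_RATE) for i in range(n_frames)]
trajectory = [
    3.0 + speech_rate[i] + math.sin(2 * math.pi * 40.0 * i / FRAME_RATE)
    for i in range(n_frames)
]

hp = highpass_fir(2.0, FRAME_RATE, N_TAPS)   # removes the slow components
lp = lowpass_fir(20.0, FRAME_RATE, N_TAPS)   # removes the fast components
band = filter_same(filter_same(trajectory, hp), lp)

# Compare with the 4 Hz component away from the edges of the signal.
max_error = max(abs(band[i] - speech_rate[i]) for i in range(250, 350))
print(f"maximum deviation from the 4 Hz component: {max_error:.4f}")
```

In a VOCODER this filtering would be applied to every frequency channel of the time-frequency representation. The symmetric taps and the centered indexing in `filter_same` keep the filtered trajectory aligned in time with the input.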

Table 1 (row) high-pass cut-off filter, (column) low-pass cut-off filter (Hz)
(Hz) 1 a d g 4 b e h 8 c f i

[17] C3 E3 G3, No.1 PF, No.13 EG, C3.. 15. 17. 44.1kHz

6.2

TANDEM-STRAIGHT, 9 conditions a to i (Table 1). High-pass cut-off: 1Hz Hz Hz (Fig. 3). Low-pass cut-off: Hz 4Hz 8Hz (Fig. 4).

6.3

1 4 6 3 9

6.4

(YAMAHA AVITECS), MacBook Pro, D/A converter (EDIROL UA-11), SENNHEISER HD-58, 16bit, -6dB, A, HATS (B&K418), 6 7dB

6.5

Fig. 7 Results on the preservation of the characteristic timbre of musical instruments and animal voices

Fig. 8 Results on the preservation of linguistic information with musical instruments and animal voices

Fig. 7, Fig. 8, Fig. 9, Fig. 10: a to i. Fig. 11: a to i.

7.

VOCODER

Fig. 9 Results on the preservation of the characteristic timbre of instruments and animal voices

Fig. 10 Results on the preservation of linguistic information

Fig. 11 Preservation of linguistic information and characteristic timbre of musical instruments and animal voices

[1] H. Dudley: Remaking speech, J. Acoust. Soc. Am., vol.11, no.2, pp.169-177, 1939.
[2] J. L. Flanagan: Phase vocoder, The Bell System Technical Journal, pp.1493-1509, 1966.
[3] A 53(1), pp.35-4, 197.
[4] B. S. Atal, S. L. Hanauer: Speech Analysis and Synthesis by Linear Prediction of the Speech Wave, J. Acoust. Soc. Am., vol.50, no.2B, pp.637-655, 1971.
[5] C. Roads: The Computer Music Tutorial, The MIT Press.
[6] H. Kawahara, I. Masuda, and A. de Cheveigné: Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction, Speech Communication, vol.27, no.3-4, pp.187-207, 1999.
[7] H. Kawahara, M. Morise, T. Takahashi, R. Nisimura, T. Irino, H. Banno: Tandem-STRAIGHT: A temporally stable power spectral representation for periodic signals and applications to interference-free spectrum, F0, and aperiodicity estimation, Proc. ICASSP 2008, Las Vegas, pp.3933-3936, 2008.
[8] D, Vol.J 9-D, No.1, pp.365-367, 7.
[9] H. Kawahara and M. Morise: Technical foundations of TANDEM-STRAIGHT, a speech analysis, modification and synthesis framework, Sadhana, vol.36, part 5, pp.713-727, October 2011.
[10] F. A, Vol.J94-A, No.8, pp.557-567, 11.
[11] B.C.J. Moore, 1994.
[12] R. Drullman, J. M. Festen, and R. Plomp: Effect of temporal envelope smearing on speech reception, J. Acoust. Soc. Am., vol.95, no.2, pp.1053-1064, Feb. 1994.
[13] R. Drullman, J. M. Festen, and R. Plomp: Effect of reducing slow temporal modulations on speech reception, J. Acoust. Soc. Am., vol.95, no.5, pp.2670-2680, May 1994.
[14] (D-II), Vol.J84-D-, No.7, pp.161-169, 11.
[15] (1996).
[16] Vol.41, No.7, pp.561-566, 11.
[17] RWC, Vol.45, No.3, pp.78-738, March 4.