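Cross synthesis VOCODER which preserves linguistic information and characteristic timbre of musical instruments and animal voices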


Transcript:

Vol.1-MUS-95 No.3 1/6/

Cross synthesis VOCODER which preserves linguistic information and characteristic timbre of musical instruments and animal voices

Nishi Taiki 1,a) Nisimura Ryuichi 1,b) Irino Toshio 1,c) Kawahara Hideki 1,d)

Abstract: A new design method for the cross synthesis VOCODER, which synthesizes sounds by mixing features of two input sounds, such as speech and musical instruments or animal voices, is proposed. The cross synthesis VOCODER originated from a narrow-band transmission technology and is now widely used as an effector in musical performance and production. However, current cross synthesis effects tend to degrade the original character of musical instruments, and the linguistic information of the processed sound is not always intelligible. The proposed method alleviates these difficulties using two techniques. One is removal of the global spectral shape from the speech spectral envelope, and the other is band-pass filtering in the modulation frequency domain. Subjective test results indicate the relevance of the proposed techniques and provide design guidelines for new, flexible cross synthesis VOCODERs.

Keywords: Cross Synthesis VOCODER, musical instruments, animal voice, linguistic information, modulation frequency domain

1 Wakayama University, Sakaedani 93, Wakayama 64-851, Japan
a) s1538@center.wakayama-u.ac.jp
b) nisimura@sys.wakayama-u.ac.jp
c) irino@sys.wakayama-u.ac.jp
d) kawahara@sys.wakayama-u.ac.jp

c 1 Information Processing Society of Japan

1. VOCODER [1]; cross synthesis; Channel VOCODER, Phase VOCODER [2], LPC [3], [4], [5]

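2. VOCODER based on STRAIGHT [6] and TANDEM-STRAIGHT [7]

2.1 TANDEM

Let $T_0$ be the fundamental period and $P(\omega, t)$ the power spectrum. The TANDEM spectrum $P_T(\omega, t)$ [8] is

$$P_T(\omega, t) = P\left(\omega, t - \frac{T_0}{4}\right) + P\left(\omega, t + \frac{T_0}{4}\right) \quad (1)$$

2.2 STRAIGHT

The STRAIGHT spectrum is obtained from the TANDEM spectrum by consistent sampling, implemented as a cepstral lifter [9]:

$$P_{TST}(\omega) = \exp\left(\mathcal{F}^{-1}\left[g_1(\tau)\, g_2(\tau)\, C_T(\tau)\right]\right) \quad (2)$$

where

$$g_1(\tau) = q_0 + 2 q_1 \cos\left(\frac{2\pi\tau}{T_0}\right) \quad (3)$$

$$g_2(\tau) = \frac{\sin(\pi f_0 \tau)}{\pi f_0 \tau} = \mathcal{F}[h_0(\omega)] \quad (4)$$

$$h_0(\omega) = \begin{cases} \dfrac{1}{\omega_0} & |\omega| \le \dfrac{\omega_0}{2} \\ 0 & \text{otherwise} \end{cases} \quad (5)$$

$$C_T(\tau) = \int_{-\infty}^{\infty} \ln\left(P_T(\omega, t)\right) e^{j\omega\tau}\, d\omega \quad (6)$$

Here $g_1(\tau)$ and $g_2(\tau)$ are lifters, $\tau$ is quefrency, $\mathcal{F}$ denotes the Fourier transform, and $\omega_0 = 2\pi f_0$. The coefficients $q_0$ and $q_1$ follow [10].

3.

3.1 The time average $P_t(\omega)$ of the power spectrum $P(\omega, t)$ over the interval $(0, T_s)$ is

$$P_t(\omega) = \frac{1}{T_s} \int_0^{T_s} P(\omega, t)\, dt \quad (7)$$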
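As an illustration of Eq. (1), the following is a minimal sketch, not the authors' implementation. It assumes a Hann-windowed FFT power spectrum as $P(\omega, t)$; the function names `power_spectrum` and `tandem_spectrum` and all parameter values are hypothetical. It shows how adding two power spectra taken half a fundamental period apart cancels the periodic temporal variation caused by interference between adjacent harmonics.

```python
import numpy as np

def power_spectrum(x, fs, center_s, win_len, n_fft):
    """Power spectrum of a Hann-windowed frame centered at center_s seconds.
    (Assumed analysis front end; the paper's window is not recoverable here.)"""
    c = int(round(center_s * fs))
    start = c - win_len // 2
    frame = x[start:start + win_len] * np.hanning(win_len)
    return np.abs(np.fft.rfft(frame, n_fft)) ** 2

def tandem_spectrum(x, fs, center_s, f0, win_len, n_fft):
    """Eq. (1): sum of two power spectra located T0/4 before and after center_s."""
    t0 = 1.0 / f0
    return (power_spectrum(x, fs, center_s - t0 / 4, win_len, n_fft)
            + power_spectrum(x, fs, center_s + t0 / 4, win_len, n_fft))

if __name__ == "__main__":
    fs, f0 = 8000, 100.0
    t = np.arange(fs) / fs
    # Two adjacent harmonics of a 100 Hz fundamental.
    x = np.cos(2 * np.pi * 200 * t) + np.cos(2 * np.pi * 300 * t)
    centers = 0.1 + np.arange(80) / fs          # one fundamental period of frame positions
    k = 10                                      # bin at 250 Hz (25 Hz bin spacing)
    single = [power_spectrum(x, fs, c, 200, 320)[k] for c in centers]
    tandem = [tandem_spectrum(x, fs, c, f0, 200, 320)[k] for c in centers]
    print("relative variation, single:", np.std(single) / np.mean(single))
    print("relative variation, TANDEM:", np.std(tandem) / np.mean(tandem))
```

The window length and FFT size are chosen only so that the example bin falls between the two harmonics; they are not values from the paper.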

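With a nonlinear function $g(x)$, the generalized time average $P_{t,g}(\omega)$ is

$$P_{t,g}(\omega) = g^{-1}\left(\frac{1}{T_s} \int_0^{T_s} g\left(P(\omega, t)\right) dt\right) \quad (8)$$

$P_{t,g}(\omega)$ reduces to $P_t(\omega)$ when $g(x) = x$. Following the Weber-Fechner law, $g(x)$ is set to the logarithm:

$$P_{t,\ln}(\omega) = \exp\left(\frac{1}{T_s} \int_0^{T_s} \ln\left(P(\omega, t)\right) dt\right) \quad (9)$$

3.2 Smoothing uses the ERB_N number [11]. For a spectrum $S(\omega)$, let $\lambda(\omega)$ be the ERB_N number at frequency $\omega$ and $\omega(\lambda)$ the frequency at ERB_N number $\lambda$. The spectrum $S_a(\omega)$ smoothed over a width of $a$ on the ERB_N number axis is

$$S_a(\omega) = \frac{1}{C} \int_{\omega(\lambda(\omega) - a/2)}^{\omega(\lambda(\omega) + a/2)} S(q)\, dq \quad (10)$$

$$C = \omega\left(\lambda(\omega) + \frac{a}{2}\right) - \omega\left(\lambda(\omega) - \frac{a}{2}\right)$$

Applying this smoothing with $S(\omega) = P_{t,\ln}(\omega)$ gives $P_{a,\ln}(\omega)$.

Fig. 1 Spectrogram of /konnichiwa/ (horizontal axis: time (s)).

Fig. 2 Normalized spectrogram of /konnichiwa/ (horizontal axis: time (s)).

3.3 The STRAIGHT spectrum $P_{ST}(\omega, t)$ is normalized to obtain $D(\omega, t)$:

$$D_{a,\ln}(\omega, t) = \frac{P_{ST}(\omega, t)}{P_{a,\ln}(\omega)} \quad (11)$$

4. Modulation frequency [12], [13]; VOCODER

4.1 FIR filters [14]; Hanning and Blackman windows; TANDEM-STRAIGHT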
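The chain of Eqs. (9) to (11) can be sketched in a few lines. This is a rough illustration under stated assumptions, not the authors' code: the ERB_N number mapping uses the Glasberg and Moore formula, which is not given in this text; the integral in Eq. (10) is approximated by a mean over bins on a uniform frequency grid; and the function names and the width `a` are hypothetical.

```python
import numpy as np

def erb_number(f_hz):
    """ERB_N number for frequency in Hz (Glasberg and Moore formula, assumed here)."""
    return 21.4 * np.log10(4.37 * f_hz / 1000.0 + 1.0)

def erb_to_hz(lam):
    """Inverse of erb_number."""
    return (10.0 ** (lam / 21.4) - 1.0) * 1000.0 / 4.37

def log_time_average(P):
    """Eq. (9): geometric mean over time. P has shape (n_freq, n_frames)."""
    return np.exp(np.mean(np.log(P), axis=1))

def erb_smooth(S, freqs_hz, a):
    """Eq. (10): average S over a band `a` wide on the ERB_N number axis,
    approximated by the mean of the bins that fall inside the band."""
    lam = erb_number(freqs_hz)
    lo = erb_to_hz(lam - a / 2)
    hi = erb_to_hz(lam + a / 2)
    out = np.empty_like(S)
    for i in range(len(S)):
        mask = (freqs_hz >= lo[i]) & (freqs_hz <= hi[i])
        out[i] = S[mask].mean()
    return out

def remove_global_shape(P_st, P, freqs_hz, a):
    """Eq. (11): divide the spectrogram P_st by the smoothed long-term spectrum of P."""
    P_a_ln = erb_smooth(log_time_average(P), freqs_hz, a)
    return P_st / P_a_ln[:, None]

if __name__ == "__main__":
    freqs = np.linspace(0, 4000, 65)
    rng = np.random.default_rng(0)
    P = rng.uniform(0.5, 2.0, size=(65, 40))    # stand-in spectrogram
    D = remove_global_shape(P, P, freqs, a=2.0)
    print(D.shape)
```

Averaging in the log domain makes the long-term spectrum a geometric mean, so a few loud frames do not dominate the estimated global shape.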

Fig. 3 Modulation transfer function of the high-pass filter (vertical axis: gain (dB)).

Fig. 4 Modulation transfer function of the low-pass filter (vertical axis: gain (dB)).

The filter characteristics follow [14].

Fig. 5 Spectrogram of /konnichiwa/ after filtering the modulation spectrum (axes: time (s), frequency (Hz)).

Fig. 6 Spectrogram of /konnichiwa/ after filtering the modulation spectrum (axes: time (s), frequency (Hz)).

5. FFT; cepstrum [15]; Hanning window; OLA (overlap and add) [16]

6. VOCODER

6.1 Speech: /ohayogozaimasu/; RWC
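The modulation-domain filtering of Section 4 (high-pass and low-pass characteristics as in Figs. 3 and 4) can be illustrated as follows. This is a hedged sketch rather than the paper's filter design: it assumes windowed-sinc FIR filters with a Hann window, a band-pass formed as the difference of two low-pass filters, and filtering applied to each frequency channel's time trajectory of a log spectrogram. The frame rate, tap count, cut-offs, and function names are all hypothetical.

```python
import numpy as np

def lowpass_fir(cutoff_hz, frame_rate, n_taps):
    """Windowed-sinc low-pass FIR (Hann window) with unit gain at 0 Hz.
    n_taps should be odd so the filter is symmetric about its center tap."""
    n = np.arange(n_taps) - (n_taps - 1) / 2
    h = np.sinc(2 * cutoff_hz / frame_rate * n) * np.hanning(n_taps)
    return h / h.sum()

def bandpass_fir(low_hz, high_hz, frame_rate, n_taps):
    """Band-pass built as the difference of two low-pass filters (zero gain at 0 Hz)."""
    return (lowpass_fir(high_hz, frame_rate, n_taps)
            - lowpass_fir(low_hz, frame_rate, n_taps))

def filter_modulation(log_spec, h):
    """Filter the time trajectory of every frequency channel.
    log_spec has shape (n_freq, n_frames); n_frames must be at least len(h)."""
    return np.stack([np.convolve(row, h, mode="same") for row in log_spec])

if __name__ == "__main__":
    frame_rate = 200.0                        # frames per second (assumed)
    t = np.arange(2000) / frame_rate
    # One channel: constant offset + 8 Hz modulation + 50 Hz modulation.
    traj = 5.0 + np.sin(2 * np.pi * 8 * t) + np.sin(2 * np.pi * 50 * t)
    h = bandpass_fir(2.0, 20.0, frame_rate, 401)
    y = filter_modulation(traj[None, :], h)[0]
    err = np.max(np.abs(y[400:1600] - np.sin(2 * np.pi * 8 * t[400:1600])))
    print("max deviation from the 8 Hz component:", err)
```

Because the filter is symmetric and `mode="same"` keeps the output centered, the filtered trajectory stays time-aligned with the input, which matters when the result is recombined with the other sound's spectrum.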

Table 1 Filter conditions a to i. (row) high-pass cut-off filter, (column) low-pass cut-off filter (Hz)

  1 | a  d  g
  4 | b  e  h
  8 | c  f  i

Instrument sounds were taken from the RWC database [17]: notes C3, E3, G3; No.1 PF and No.13 EG. Sampling frequency: 44.1 kHz.

6.2 TANDEM-STRAIGHT was used to synthesize stimuli under the 9 conditions a to i (Table 1), combining the high-pass filters (Fig. 3) and the low-pass filters (Fig. 4).

6.3

6.4 Equipment: YAMAHA AVITECS; MacBook Pro; D/A converter (EDIROL UA-11); SENNHEISER HD-58; 16 bit; HATS (B&K418).

6.5

Fig. 7 Results for preservation of the characteristic timbre of musical instruments and animal voices (conditions a to i).

Fig. 8 Results for preservation of the linguistic information (conditions a to i).

7. VOCODER

Fig. 9 Results for preservation of the characteristic timbre of musical instruments and animal voices.

Fig. 10 Results for preservation of the linguistic information.

Fig. 11 Preservation of the linguistic information and the characteristic timbre of musical instruments and animal voices.

References

[1] H. Dudley: Remaking speech, J. Acoust. Soc. Am., vol.11, no.2, pp.169-177, 1939.
[2] J. L. Flanagan: Phase vocoder, The Bell System Technical Journal, pp.1493-1509, 1966.
[3] A 53(1), pp.35-4, 197.
[4] B. S. Atal, S. L. Hanauer: Speech Analysis and Synthesis by Linear Prediction of the Speech Wave, J. Acoust. Soc. Am., vol.50, no.2B, pp.637-655, 1971.
[5] C. Roads: The Computer Music Tutorial, The MIT Press.
[6] H. Kawahara, I. Masuda, and A. de Cheveigné: Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction, Speech Communication, vol.27, no.3-4, pp.187-207, 1999.
[7] H. Kawahara, M. Morise, T. Takahashi, R. Nisimura, T. Irino, H. Banno: Tandem-STRAIGHT: A temporally stable power spectral representation for periodic signals and applications to interference-free spectrum, F0, and aperiodicity estimation, Proc. ICASSP 2008, Las Vegas, pp.3933-3936, 2008.
[8] D, vol.J 9-D, No.1, pp.365-367, 7.
[9] H. Kawahara and M. Morise: Technical foundations of TANDEM-STRAIGHT, a speech analysis, modification and synthesis framework, Sadhana, vol.36, part 5, pp.713-727, October 2011.
[10] A, Vol.J94-A, No.8, pp.557-567, 2011.
[11] B.C.J. Moore, 1994.
[12] R. Drullman, J. M. Festen, and R. Plomp: Effect of temporal envelope smearing on speech reception, J. Acoust. Soc. Am., vol.95, no.2, pp.1053-1064, Feb. 1994.
[13] R. Drullman, J. M. Festen, and R. Plomp: Effect of reducing slow temporal modulations on speech reception, J. Acoust. Soc. Am., vol.95, no.5, pp.2670-2680, May 1994.
[14] (D-II), Vol.J84-D-, No.7, pp.161-169, 11.
[15] (1996).
[16] Vol.41, No.7, pp.561-566, 11.
[17] RWC, Vol.45, No.3, pp.728-738, March 2004.