A study of geometric dependency of cepstrum on vocal tract length

Σχετικά έγγραφα
Buried Markov Model Pairwise

Lecture 2: Dirac notation and a review of linear algebra Read Sakurai chapter 1, Baym chatper 3

Resurvey of Possible Seismic Fissures in the Old-Edo River in Tokyo

Reminders: linear functions

Areas and Lengths in Polar Coordinates

Areas and Lengths in Polar Coordinates

Appendix to On the stability of a compressible axisymmetric rotating flow in a pipe. By Z. Rusak & J. H. Lee

Approximation of distance between locations on earth given by latitude and longitude

Congruence Classes of Invertible Matrices of Order 3 over F 2

EE512: Error Control Coding

SCHOOL OF MATHEMATICAL SCIENCES G11LMA Linear Mathematics Examination Solutions

Homework 8 Model Solution Section

6.1. Dirac Equation. Hamiltonian. Dirac Eq.

Phys460.nb Solution for the t-dependent Schrodinger s equation How did we find the solution? (not required)

EM Baum-Welch. Step by Step the Baum-Welch Algorithm and its Application 2. HMM Baum-Welch. Baum-Welch. Baum-Welch Baum-Welch.

CSJ. Speaker clustering based on non-negative matrix factorization using i-vector-based speaker similarity

3.4 SUM AND DIFFERENCE FORMULAS. NOTE: cos(α+β) cos α + cos β cos(α-β) cos α -cos β

Numerical Analysis FMN011

Estimation, Evaluation and Guarantee of the Reverberant Speech Recognition Performance based on Room Acoustic Parameters

CRASH COURSE IN PRECALCULUS

Reaction of a Platinum Electrode for the Measurement of Redox Potential of Paddy Soil

Matrices and Determinants

Second Order Partial Differential Equations

Vol.4-DCC-8 No.8 Vol.4-MUS-5 No.8 4// 3 3 Hanning (T ) 3 Hanning 3T (y(t)w(t)) dt =.5 T y (t)dt. () STRAIGHT F 3 TANDEM-STRAIGHT[] 3 F F 3 [] F []. :

1, +,*+* + +-,, -*, * : Key words: global warming, snowfall, snowmelt, snow water equivalent. Ohmura,,**0,**

An Automatic Modulation Classifier using a Frequency Discriminator for Intelligent Software Defined Radio

Retrieval of Seismic Data Recorded on Open-reel-type Magnetic Tapes (MT) by Using Existing Devices

2 Composition. Invertible Mappings

Yoshifumi Moriyama 1,a) Ichiro Iimura 2,b) Tomotsugu Ohno 1,c) Shigeru Nakayama 3,d)

Problem Set 9 Solutions. θ + 1. θ 2 + cotθ ( ) sinθ e iφ is an eigenfunction of the ˆ L 2 operator. / θ 2. φ 2. sin 2 θ φ 2. ( ) = e iφ. = e iφ cosθ.

w o = R 1 p. (1) R = p =. = 1

Detection and Recognition of Traffic Signal Using Machine Learning

Analysis of prosodic features in native and non-native Japanese using generation process model of fundamental frequency contours

CHAPTER 25 SOLVING EQUATIONS BY ITERATIVE METHODS

Development of a Seismic Data Analysis System for a Short-term Training for Researchers from Developing Countries

derivation of the Laplacian from rectangular to spherical coordinates

IPSJ SIG Technical Report Vol.2014-CE-127 No /12/6 CS Activity 1,a) CS Computer Science Activity Activity Actvity Activity Dining Eight-He

Other Test Constructions: Likelihood Ratio & Bayes Tests

CorV CVAC. CorV TU317. 1

Conductivity Logging for Thermal Spring Well

Mean bond enthalpy Standard enthalpy of formation Bond N H N N N N H O O O

1,a) 1,b) 2 3 Sakriani Sakti 1 Graham Neubig 1 1. A Study on HMM-Based Speech Synthesis Using Rich Context Models

Second Order RLC Filters

ΙΠΛΩΜΑΤΙΚΗ ΕΡΓΑΣΙΑ. ΘΕΜΑ: «ιερεύνηση της σχέσης µεταξύ φωνηµικής επίγνωσης και ορθογραφικής δεξιότητας σε παιδιά προσχολικής ηλικίας»

CE 530 Molecular Simulation

Ψηφιακή Επεξεργασία Φωνής

Maxima SCORM. Algebraic Manipulations and Visualizing Graphs in SCORM contents by Maxima and Mashup Approach. Jia Yunpeng, 1 Takayuki Nagai, 2, 1

The Simply Typed Lambda Calculus

k A = [k, k]( )[a 1, a 2 ] = [ka 1,ka 2 ] 4For the division of two intervals of confidence in R +

DESIGN OF MACHINERY SOLUTION MANUAL h in h 4 0.

Ανάκτηση Εικόνας βάσει Υφής με χρήση Eye Tracker

Math 6 SL Probability Distributions Practice Test Mark Scheme

D Alembert s Solution to the Wave Equation

n 1 n 3 choice node (shelf) choice node (rough group) choice node (representative candidate)

ES440/ES911: CFD. Chapter 5. Solution of Linear Equation Systems

Chapter 6: Systems of Linear Differential. be continuous functions on the interval

3: A convolution-pooling layer in PS-CNN 1: Partially Shared Deep Neural Network 2.2 Partially Shared Convolutional Neural Network 2: A hidden layer o

Study on Re-adhesion control by monitoring excessive angular momentum in electric railway traction

VBA Microsoft Excel. J. Comput. Chem. Jpn., Vol. 5, No. 1, pp (2006)

[1] P Q. Fig. 3.1

Section 7.6 Double and Half Angle Formulas

Sampling Basics (1B) Young Won Lim 9/21/13

Partial Differential Equations in Biology The boundary element method. March 26, 2013

1. Ηλεκτρικό μαύρο κουτί: Αισθητήρας μετατόπισης με βάση τη χωρητικότητα

Study of In-vehicle Sound Field Creation by Simultaneous Equation Method

Strain gauge and rosettes

Section 8.3 Trigonometric Equations

C F E E E F FF E F B F F A EA C AEC

Simplex Crossover for Real-coded Genetic Algolithms

Written Examination. Antennas and Propagation (AA ) April 26, 2017.

GPU. CUDA GPU GeForce GTX 580 GPU 2.67GHz Intel Core 2 Duo CPU E7300 CUDA. Parallelizing the Number Partitioning Problem for GPUs

Mean-Variance Analysis

2 ~ 8 Hz Hz. Blondet 1 Trombetti 2-4 Symans 5. = - M p. M p. s 2 x p. s 2 x t x t. + C p. sx p. + K p. x p. C p. s 2. x tp x t.

Διπλωματική Εργασία του φοιτητή του Τμήματος Ηλεκτρολόγων Μηχανικών και Τεχνολογίας Υπολογιστών της Πολυτεχνικής Σχολής του Πανεπιστημίου Πατρών

Speech Recognition using Phase Information based on Long-Term Analysis

Homomorphism of Intuitionistic Fuzzy Groups

Evolution of Novel Studies on Thermofluid Dynamics with Combustion

6.3 Forecasting ARMA processes

E#ects of Drying on Bacterial Activity and Iron Formation in Acid Sulfate Soils

GPGPU. Grover. On Large Scale Simulation of Grover s Algorithm by Using GPGPU

Exercises 10. Find a fundamental matrix of the given system of equations. Also find the fundamental matrix Φ(t) satisfying Φ(0) = I. 1.

Journal of the Institute of Science and Engineering. Chuo University

Spherical Coordinates

Concrete Mathematics Exercises from 30 September 2016

þÿÿ ÁÌ» Â Ä Å ¹µÅ Å½Ä ÃÄ

DiracDelta. Notations. Primary definition. Specific values. General characteristics. Traditional name. Traditional notation

Inverse trigonometric functions & General Solution of Trigonometric Equations

Απόκριση σε Μοναδιαία Ωστική Δύναμη (Unit Impulse) Απόκριση σε Δυνάμεις Αυθαίρετα Μεταβαλλόμενες με το Χρόνο. Απόστολος Σ.

Finite Field Problems: Solutions

Practice Exam 2. Conceptual Questions. 1. State a Basic identity and then verify it. (a) Identity: Solution: One identity is csc(θ) = 1

ΠΑΝΕΠΙΣΤΗΜΙΟ ΠΑΤΡΩΝ ΣΧΟΛΗ ΑΝΘΡΩΠΙΣΤΙΚΩΝ ΚΑΙ ΚΟΙΝΩΝΙΚΩΝ ΕΠΙΣΤΗΜΩΝ ΤΜΗΜΑ ΦΙΛΟΛΟΓΙΑΣ

Διερεύνηση ακουστικών ιδιοτήτων Νεκρομαντείου Αχέροντα

Τομέας: Ανανεώσιμων Ενεργειακών Πόρων Εργαστήριο: Σχεδιομελέτης και κατεργασιών

Molecular evolutionary dynamics of respiratory syncytial virus group A in

the total number of electrons passing through the lamp.

Το κοινωνικό στίγμα της ψυχικής ασθένειας

Section 8.2 Graphs of Polar Equations

Nov Journal of Zhengzhou University Engineering Science Vol. 36 No FCM. A doi /j. issn

ECE 468: Digital Image Processing. Lecture 8

C.S. 430 Assignment 6, Sample Solutions

Transcript:

THE INSTITUTE OF ELECTRONICS, INFORMTION ND COMMUNICTION ENGINEERS TECHNICL REPORT OF IEICE 277 882 5 5 3 33 7 3 E-mail: {dsk saito,matsuura,asakawa,mine,hirose}gavotu-tokyoacjp n (VTLN) VTLN ĉ = c n 8 cm 2 cm,,,, study of geometric dependency of cepstrum on vocal tract length Daisuke SITO, Ryo MTSUUR, Satoshi SKW, Nobuaki MINEMTSU, and bstract Keikichi HIROSE Graduate School of Frontier Sciences, The University of Tokyo 5 5 Kashiwano-ha, Kashiwa-shi, Chiba 277 882, Japan Graduate School of Information Science and Technology, The University of Tokyo 7 3 Hongo, Bunkyo-ku, Tokyo 3 33, Japan E-mail: {dsk saito,matsuura,asakawa,mine,hirose}gavotu-tokyoacjp In this paper, we theoretically and experimentally prove that the direction of cepstrum vectors strongly depends on vocal tract length and that this dependency is represented as rotation in the n dimensional cepstrum space In speech recognition studies, vocal tract length normalization (VTLN) techniques are widely used to cancel age- and gender-differences In VTLN, a frequency warping is often carried out and it can be implemented as a linear transformation in a cepstrum space; ĉ = c However, the geometric properties of this transformation matrix have not been well discussed In this study, its properties are made clear using n dimensional geometry and it is shown that the matrix rotates any cepstrum vector similarly and apparently Experimental results using resynthesized speech demonstrate that cepstrum vectors extracted from a speaker of 8 [cm] in height and those from another speaker of 2 [cm] in height are reasonably orthogonal This result clarifies one of the reasons why children s speech is very difficult for conventional speech recognizers to deal with adequately Key words frequency warping, cepstrum, geometric property, rotation matrix, vocal tract length

(Speaker Independent: SI) [] (CMN) (VTLN) CMN [2]CMN VTLN [3] VTLN 2 2 ω, ˆω ( < = ω, ˆω < = π) z = e jω, ẑ = e j ˆω ẑ = z α αz () α α < α < α > α () 2 2 [4], [5] c, ĉ Fig Examples of frequency warping functions for different values of α ĉ = c (2) ĉ = (ĉ ĉ 2 ĉ 3 ĉ 4 ) t α 2 2α 2α 3 α+α 3 4α 2 +3α 4 = B C c = (c c 2 c 3 c 4 ) t Pitz (3) a ij α [6] jx j a ij = (j ) m m=max(,j i) (3) (m + i ) (m + i j) ( )(m+i j) α (2m+i j) (4) j = m 8 < jc m (j > = m) : (j < m) 3 3 2 (5) (3) 2 2 n 2 (2) c ĉ ĉ α 2 2α 2α 3 = α+α 3 4α 2 +3α 4 ĉ 2 (6) T T c c 2 (6) T =R + O (7) 2

Transformed by R Transformed by T Transformed by O 2 α = 2 T RO Fig 2 Effects of transformations of T, R, and O for α = 2 2α 2 2α( 2 R = α2 ) (8) 2α( 2 α2 ) 2α 2 α 2 α 3 O = (9) α 2α 2 +3α 4 R (+t) k + kt 2α 2 2α α 2 R 2α () α 2 2α 2 cos 2θ sin 2θ = (α = sin θ) () sin 2θ cos 2θ () R R 2θ O T α < O 3 2 2 T R T 2 α = 2 2 T RO O 2 T R O T O T 3 2 y = (T I)c = ĉ c (2) I 2 (T I) y 2 T 3 (2) T 3 (2) Fig 3 Vector filed given by Equation (2) for α = 2 3 2 n n 2 n (3) R R R t R = RR t = I (3) det R = + (4) (3) (4) (3) α 2 2α α 3α n = B 2α 4α C n a ij 8 (i = j) >< a ij = sgn(j i) jα ( i j = ) >: (otherwise) (5) (6) sgn(j i) j i > + j i < t n n n t n +α 2 α 3α 2 α +8α 2 α 8α 2 t n n = 3α 2 α +8α 2 α B 8α 2 α + 32α 2 C (7) (7) k R +kα 2 i j = α i j = 2 m R mα 2 n t n 3

Table coustic conditions 6 khz / 6 bit Hamming window 25 ms 5 ms MFCC ( 2) [ a ] [ i ] [ a ] [ i ] Fig 4 4 Rotation of two cepstrum vectors and their vector +4α 2 α 4α 2 α +α 2 α 9α 2 n t n = 4α 2 α +2α 2 α (8) B 9α 2 α + 34α 2 C (7)(8) α α 2 i j = α t n n n t n (3) n [7]n det n = a nn det n a n(n ) a (n )n det n 2 (9) (5) a nn = a n(n ), a (n )n α 2 α det n det n det n (4) (3) n 2 3 n 3 T α 4 t t + c t, c t+ c t, c t+ c = c t+ c t Fig 5 5 Spectrogram of the original speech (left) and its warped version (right) 4 4 5 /aiueo/ STRIGHT [8] MFCC MFCC a, b θ (2) θ = arccos a b a b (2) a b a, b () (3) 8 < ˆω = ω ( < m = ω < m π) +m : m(ω π) + π ( m π < +m = ω < = π) (2) (2) m 4

Vocal tract length ration [m] 4 35 3 25 2 5 5-8 -6-4 -2 2 4 6 8 Warping parameter [α] 6 α m Fig 6 Relation between warping parameter and vocal tract length ratio (2) 5 5 (2) m α α (2) 2 m α m 6 6 8 cm α = 4 9 cm α = 4 36 cm 4 2 7 /a/ /i//i/ /u//u/ /e//e/ /o/ (2) (a) (c) (d) (f) 7 MFCC MFCC MFCC 7 MFCC MFCC MFCC 5 7(b) 8 cm 2 cm MFCC α α 3 2 α α 8 cm 2 cm [9] [] [], [2] 6 [9] 5

5 5-5 - original height -5 5 5 2 25 3 35 5 5-5 (a):mfcc (male) - original height -5 5 5 2 25 3 35 (d):mfcc (female) 5 5-5 - original height -5 5 5 2 25 3 35 5 5-5 (b): MFCC (male) - original height -5 5 5 2 25 3 35 (e): MFCC (female) 5 5-5 - original height -5 5 5 2 25 3 35 5 5-5 (c): MFCC (male) - original height -5 5 5 2 25 3 35 (f): MFCC (female) 7 : (a) (c): 8 cm ;(d) (f): 63 cm Fig 7 Relation between the rotation angle and the estimated body height (a) to (c) are from a male speaker of 8 cm in height and (d) to (f) are from a female speaker of 63 cm in height [3] [] M Russel and S D rcy: Challenges for computer recognition of children s speech, CD-ROM of SLaTE27, 27 [2] B tal: Effectiveness of linear prediction characteristics of the speech wave for automatic speaker identification and verification, JcoustSocmerica, vol 55, pp 34 32, 974 [3] E Eide and H Gish: parametric approach to vocal tract length normalization, ICSSP96, vol, pp 346 348, 996 [4], :, D-II, vol J83-D-II, no, pp28 27, 2 [5] T Emori and K Shinoda: Rapid Vocal Tract Length Normalization usgin Maximum Likelihood Estimation, Eurospeech2, pp 649 652, 2 [6] M Pitz and H Ney: Vocal tract normalization equals linear transformation in cepstral space, IEEE Trans Speech and udio Processing, vol 3, pp 93 944, 25 [7] R Horn and CR Johnson: Matrix nalysis, Cambridge University Press, 985 [8] H Kawahara et al: Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F extraction: Possible role of a repetitive structure in sounds, Speech Communication,, vol 27, pp 87 27, 999 [9],, :,, pp 43 5, 24 [] N Minematsu: Mathematical evidence of the acoustic universal structure in speech, ICSSP25, pp889 892, 25 [] S sakawa et al : utomatic Recognition of Connected Vowels Only Using Speaker-invariant Representation of Speech Dynamics, INTERSPEECH27, pp89 893, 27 [2] :,, 3-Q-, 27 [3] :, Visual Computing / CD, 27 6