Estimation, Evaluation and Guarantee of the Reverberant Speech Recognition Performance based on Room Acoustic Parameters

Σχετικά έγγραφα
Buried Markov Model Pairwise

Fourier transform, STFT 5. Continuous wavelet transform, CWT STFT STFT STFT STFT [1] CWT CWT CWT STFT [2 5] CWT STFT STFT CWT CWT. Griffin [8] CWT CWT

CSJ. Speaker clustering based on non-negative matrix factorization using i-vector-based speaker similarity

MIDI [8] MIDI. [9] Hsu [1], [2] [10] Salamon [11] [5] Song [6] Sony, Minato, Tokyo , Japan a) b)

3: A convolution-pooling layer in PS-CNN 1: Partially Shared Deep Neural Network 2.2 Partially Shared Convolutional Neural Network 2: A hidden layer o

Study on Re-adhesion control by monitoring excessive angular momentum in electric railway traction

[5] F 16.1% MFCC NMF D-CASE 17 [5] NMF NMF 3. [5] 1 NMF Deep Neural Network(DNN) FUSION 3.1 NMF NMF [12] S W H 1 Fig. 1 Our aoustic event detect

Διερεύνηση ακουστικών ιδιοτήτων Νεκρομαντείου Αχέροντα

1 (forward modeling) 2 (data-driven modeling) e- Quest EnergyPlus DeST 1.1. {X t } ARMA. S.Sp. Pappas [4]

n 1 n 3 choice node (shelf) choice node (rough group) choice node (representative candidate)

Speech Recognition using Phase Information based on Long-Term Analysis

Nov Journal of Zhengzhou University Engineering Science Vol. 36 No FCM. A doi /j. issn

Πτυχιακή Εργασι α «Εκτι μήσή τής ποιο τήτας εικο νων με τήν χρή σή τεχνήτων νευρωνικων δικτυ ων»

Stabilization of stock price prediction by cross entropy optimization

[2] REVERB 8 [3], [4] [5] [20] [6], [7], [8], [9], [10] [11] REVERB 8 *1 [9] LDA *2 MLLT (SAT) [8] (basis fmllr) [12] (DNN) [10] DNN [11] [13] [14] Ka

SNR F0 [2], [3], [4] F0 F0 F0 F0 F0 TUSK F0 TUSK F0 6 TUSK 6 F0 2. F0 F0 [5] [6] [7] p[8] Cepstrum [9], [10] [11] [12] [13] F0 [14] F0 [15] DIO[16] [1

Ψηφιακή Επεξεργασία Φωνής

Statistics 104: Quantitative Methods for Economics Formula and Theorem Review


An Automatic Modulation Classifier using a Frequency Discriminator for Intelligent Software Defined Radio

CorV CVAC. CorV TU317. 1

Audio Engineering Society. Convention Paper. Presented at the 120th Convention 2006 May Paris, France

6.3 Forecasting ARMA processes

Feasible Regions Defined by Stability Constraints Based on the Argument Principle

Optimizing Microwave-assisted Extraction Process for Paprika Red Pigments Using Response Surface Methodology

Vol.4-DCC-8 No.8 Vol.4-MUS-5 No.8 4// 3 3 Hanning (T ) 3 Hanning 3T (y(t)w(t)) dt =.5 T y (t)dt. () STRAIGHT F 3 TANDEM-STRAIGHT[] 3 F F 3 [] F []. :

1,a) 1,b) 2 3 Sakriani Sakti 1 Graham Neubig 1 1. A Study on HMM-Based Speech Synthesis Using Rich Context Models

Ψηφιακή Επεξεργασία Φωνής

Schedulability Analysis Algorithm for Timing Constraint Workflow Models

ER-Tree (Extended R*-Tree)

2 ~ 8 Hz Hz. Blondet 1 Trombetti 2-4 Symans 5. = - M p. M p. s 2 x p. s 2 x t x t. + C p. sx p. + K p. x p. C p. s 2. x tp x t.

Proforma C. Flood-CBA#2 Training Seminars. Περίπτωση Μελέτης Ποταμός Έ βρος, Κοινότητα Λαβάρων

Study of In-vehicle Sound Field Creation by Simultaneous Equation Method

Chapter 1 Introduction to Observational Studies Part 2 Cross-Sectional Selection Bias Adjustment

Presentation Structure

ΓΕΩΠΟΝΙΚΟ ΠΑΝΕΠΙΣΤΗΜΙO ΑΘΗΝΩΝ ΤΜΗΜΑ ΑΞΙΟΠΟΙΗΣΗΣ ΦΥΣΙΚΩΝ ΠΟΡΩΝ & ΓΕΩΡΓΙΚΗΣ ΜΗΧΑΝΙΚΗΣ

Improvement of wave height forecast in deep and intermediate waters with the use of stochastic methods

HIV HIV HIV HIV AIDS 3 :.1 /-,**1 +332

Εξάλειψη αντήχησης από ηχητικά σήματα με υποκειμενικά / ψυχοακουστικά κριτήρια

GPGPU. Grover. On Large Scale Simulation of Grover s Algorithm by Using GPGPU

A Method of Trajectory Tracking Control for Nonminimum Phase Continuous Time Systems

Metal Oxide Varistors (MOV) Data Sheet

Development of the Nursing Program for Rehabilitation of Woman Diagnosed with Breast Cancer

SocialDict. A reading support tool with prediction capability and its extension to readability measurement

Summary of the model specified

[4] 1.2 [5] Bayesian Approach min-max min-max [6] UCB(Upper Confidence Bound ) UCT [7] [1] ( ) Amazons[8] Lines of Action(LOA)[4] Winands [4] 1

IPSJ SIG Technical Report Vol.2014-CE-127 No /12/6 CS Activity 1,a) CS Computer Science Activity Activity Actvity Activity Dining Eight-He

A study of geometric dependency of cepstrum on vocal tract length

VBA Microsoft Excel. J. Comput. Chem. Jpn., Vol. 5, No. 1, pp (2006)

Simplex Crossover for Real-coded Genetic Algolithms

Development of a basic motion analysis system using a sensor KINECT

Approximation of distance between locations on earth given by latitude and longitude

The Study of Evolutionary Change of Shogi

C F E E E F FF E F B F F A EA C AEC

RMT Tick RMT. Application of the RMT-test on Real Data: Hash Function and Tick Data of Stock Prices

Vol. 31,No JOURNAL OF CHINA UNIVERSITY OF SCIENCE AND TECHNOLOGY Feb

HW 3 Solutions 1. a) I use the auto.arima R function to search over models using AIC and decide on an ARMA(3,1)

Reaction of a Platinum Electrode for the Measurement of Redox Potential of Paddy Soil

,,, (, ) , ;,,, ; -

HDD. Point to Point. Fig. 1: Difference of time response by location of zero.

情報処理学会研究報告 IPSJ SIG Technical Report Vol.2014-MUS-104 No /8/26 1,a) Music Structure and Composition with Sound Directivity in 3D Space

Queensland University of Technology Transport Data Analysis and Modeling Methodologies

EM Baum-Welch. Step by Step the Baum-Welch Algorithm and its Application 2. HMM Baum-Welch. Baum-Welch. Baum-Welch Baum-Welch.

ΕΙΣΑΓΩΓΗ ΣΤΗ ΣΤΑΤΙΣΤΙΚΗ ΑΝΑΛΥΣΗ

Automatic extraction of bibliography with machine learning

Analysis of the Low-Frequency Ferrite Antenna. Kazuaki Abe (CASIO COMPUTER CO., LTD.), Jun-ichi Takada (Tokyo Institute of Technology)

ΕΥΡΕΣΗ ΤΟΥ ΔΙΑΝΥΣΜΑΤΟΣ ΘΕΣΗΣ ΚΙΝΟΥΜΕΝΟΥ ΡΟΜΠΟΤ ΜΕ ΜΟΝΟΦΘΑΛΜΟ ΣΥΣΤΗΜΑ ΟΡΑΣΗΣ

Other Test Constructions: Likelihood Ratio & Bayes Tests

ΠΟΛΥΤΕΧΝΕΙΟ ΚΡΗΤΗΣ ΣΧΟΛΗ ΜΗΧΑΝΙΚΩΝ ΠΕΡΙΒΑΛΛΟΝΤΟΣ


Orthogonalization Library with a Numerical Computation Policy Interface

( ) , ) , ; kg 1) 80 % kg. Vol. 28,No. 1 Jan.,2006 RESOURCES SCIENCE : (2006) ,2 ,,,, ; ;

Yoshifumi Moriyama 1,a) Ichiro Iimura 2,b) Tomotsugu Ohno 1,c) Shigeru Nakayama 3,d)

Quick algorithm f or computing core attribute

Supplementary figures

Echo path identification for stereophonic acoustic echo cancellation without pre-processing

Technical Research Report, Earthquake Research Institute, the University of Tokyo, No. +-, pp. 0 +3,,**1. No ,**1

Comparison of Evapotranspiration between Indigenous Vegetation and Invading Vegetation in a Bog

«ΑΝΑΠΣΤΞΖ ΓΠ ΚΑΗ ΥΩΡΗΚΖ ΑΝΑΛΤΖ ΜΔΣΔΩΡΟΛΟΓΗΚΩΝ ΓΔΓΟΜΔΝΩΝ ΣΟΝ ΔΛΛΑΓΗΚΟ ΥΩΡΟ»

Re-Pair n. Re-Pair. Re-Pair. Re-Pair. Re-Pair. (Re-Merge) Re-Merge. Sekine [4, 5, 8] (highly repetitive text) [2] Re-Pair. Blocked-Repair-VF [7]

Medium Data on Big Data

1 n-gram n-gram n-gram [11], [15] n-best [16] n-gram. n-gram. 1,a) Graham Neubig 1,b) Sakriani Sakti 1,c) 1,d) 1,e)

Quantum dot sensitized solar cells with efficiency over 12% based on tetraethyl orthosilicate additive in polysulfide electrolyte

ΜΕΤΑΠΤΥΧΙΑΚΟ ΠΡΟΓΡΑΜΜΑ ΣΠΟΥΔΩΝ

Web 論 文. Performance Evaluation and Renewal of Department s Official Web Site. Akira TAKAHASHI and Kenji KAMIMURA

Design Method of Ball Mill by Discrete Element Method

IF(Ingerchange Format) [7] IF C-STAR(Consortium for speech translation advanced research ) [8] IF 2 IF

Company. Patras, Greece

Applying Markov Decision Processes to Role-playing Game

Άσκηση 10, σελ Για τη μεταβλητή x (άτυπος όγκος) έχουμε: x censored_x 1 F 3 F 3 F 4 F 10 F 13 F 13 F 16 F 16 F 24 F 26 F 27 F 28 F

[2] T.S.G. Peiris and R.O. Thattil, An Alternative Model to Estimate Solar Radiation

5.4 The Poisson Distribution.

Ι ΑΚΤΟΡΙΚΗ ΙΑΤΡΙΒΗ. Χρήστος Αθ. Χριστοδούλου. Επιβλέπων: Καθηγητής Ιωάννης Αθ. Σταθόπουλος

Content. Introduction... 1

Detection and Recognition of Traffic Signal Using Machine Learning

ΙΠΛΩΜΑΤΙΚΗ ΕΡΓΑΣΙΑ. ΘΕΜΑ: «ιερεύνηση της σχέσης µεταξύ φωνηµικής επίγνωσης και ορθογραφικής δεξιότητας σε παιδιά προσχολικής ηλικίας»

Robust Feature Extraction Method Based on Run-Length Compensation for Degraded Character Recognition

ΜΑΘΗΜΑΤΙΚΗ ΠΡΟΣΟΜΟΙΩΣΗ ΤΗΣ ΔΥΝΑΜΙΚΗΣ ΤΟΥ ΕΔΑΦΙΚΟΥ ΝΕΡΟΥ ΣΤΗΝ ΠΕΡΙΠΤΩΣΗ ΑΡΔΕΥΣΗΣ ΜΕ ΥΠΟΓΕΙΟΥΣ ΣΤΑΛΑΚΤΗΦΟΡΟΥΣ ΣΩΛΗΝΕΣ ΣΕ ΔΙΑΣΤΡΩΜΕΝΑ ΕΔΑΦΗ

Διπλωματική Εργασία. του φοιτητή του Τμήματος Ηλεκτρολόγων Μηχανικών και Τεχνολογίας Υπολογιστών της Πολυτεχνικής Σχολής του Πανεπιστημίου Πατρών

Information and Communication Technologies in Education

Transcript:

Vol.21-SLP-83 No.9 21/1/29 1 Estimation, Evaluation and Guarantee of the Reverberant Speech Recognition Performance based on Room Acoustic Parameters Takanobu Nishiura 1 We study on estimation, evaluation and guarantee of the reverberant speech recognition performance based on room acoustic parameters in this paper. We first designed the suitable reverberation criteria with the relation between room acoustic parameters and speech recognition performance. We then estimated, evaluated and guaranteed the speech recognition performance based on our designed reverberation criteria. As a result of evaluation experiments, we could confirm that the recognition performance could be accurately and robustly estimated, evaluated and guaranteed with proposed criteria. 1 College of Information Science and Engineering, Ritsumeikan University 1. 1965 M. R. Schroeder 1) 2) 2) RSR-D n (Reverberant Speech Recognition criteria with D n ) 2. 2.1 (T 6 ) (T 6 ) 3) 1 1-6 db 1 c 21 Information Processing Society of Japan

3. RSR-D n 3.1 4) ISO3382 Annex A 3.1.1 Definition(D ) C (Clarity), D (Definition) Ts(Centre time) 3 C D D D D (1) D n = n h 2 (t)dt/ h 2 (t)dt. (1) h(t) n D 3.2 RSR-D n D D RSR-D n (Reverberant Speech Recognition criteria with D n ) 3.2.1 RSR-D n RSR-D n 1 Step.1 1 1 A Design of Reverberation Criteria Impulse Responses in Training Impulse Response in Testing Reverberation Time D Value Recognition Performance Recognition Performance Estimation Reverberation Time D Value New Reverberation Criterion R.Time[ xxx ms ] R.Rate = function( D Value ) R.Time[ yyy ms ] R.Rate = function( D Value ) R.Time[ zzz ms ] R.Rate = function( D Value ) Estimated Recogniton Performance Vol.21-SLP-83 No.9 21/1/29 1 Fig. 1 Overview of the proposed method. Step.2 D Step.1 (1) D n D n 4.1.1 D n Step.3 Step.1 Step.4 Step.2 Step.3 D 1 2 1 1 2 3.3 RSR-D n 1 2 c 21 Information Processing Society of Japan

Fig. 2 Table 1 1 Regression curve and parameters to estimate y = ax + b y = ax 2 + b Correlation Coefficients.965.96.955.95.945.94.935.93 a b.925 Linear Quadratic.92 1 2 3 4 5 6 7 8 9 Border Time [ms] 2 n The relation between correlation coefficient in each regression curve and border time n D RSR-D n D 4. D RSR-D n RSR-D n 4.1 2(A) 9 732 RIRs Room Impulse Responses 2 4.1.1 RSR-D n (1) n D 2 Table 2 Experimental conditions (A) Soundproof room (T 6 =1 ms 72 RIRs) Japanese style room (T 6 =4 ms 72 RIRs) Laboratory (T 6 =45 ms 72 RIRs) Environments Conference room (T 6 =6 ms 12 RIRs) in training Living room (T 6 =6 ms 72 RIRs) Corridor (T 6 =6 ms 12 RIRs) Bath room (T 6 =65 ms 28 RIRs) Elevator hall (T 6 =85 ms 12 RIRs) Standard stairs (T 6 =85 ms 56 RIRs) (B) Japanese style room (T 6 =4 ms 72 RIRs) Environments to Conference room (T 6 =6 ms 12 RIRs) calculate a suitable n Standard stairs (T 6 =85 ms 56 RIRs) (C) Japanese style room (T 6 =4 ms 72 RIRs) Environments Conference room (T 6 =6 ms 12 RIRs) to design RSR-D n Standard stairs (T 6 =85 ms 56 RIRs) (D) Laboratory (T 6 =45 ms 72 RIRs) Environments Bath room (T 6 =65 ms 28 RIRs) in testing Elevator hall (T 6 =85 ms 12 RIRs) Measured distance 1 5, mm ATR phoneme balance Speech 216 words 5) 7 female and 7 male speakers Decoder Julius 6) HMM IPA monophone model (Gender-dependent) Feature vectors 12 MFCC + 12 ΔMFCC + 1 ΔPower Frame length 25 ms (Hamming window) Frame interval 1 ms Vol.21-SLP-83 No.9 21/1/29 D n 2(B) 3 3.2.1 n 1 9 ms 1 ms D n D 3 n 2 1 2 n 2 ms 2(B) 3 RSR-D n n 2 ms 3 c 21 Information Processing Society of Japan

Vol.21-SLP-83 No.9 21/1/29 Table 3 3 Correlation coefficients RSR-D 2 L (Linear) RSR-D 2 Q (Quadratic) T 6 =4 ms.937.939 T 6 =6 ms.966.963 T 6 =85 ms.977.972 Average Estimation Error [%] 16 14 12 1 8 6 4 2 2.6 2.5 Conventional.9 3.47 Linear.92 3.45 Quadratic Average Estimation Error [%] 16 14 12 1 8 6 4 2 5.36 6.13 Conventional 1.98 2.9 2.1 2.62 Linear Quadratic Average Estimation Error [%] 16 14 12 1 8 6 4 2 7.34 15.32 Conventional 1.92 4.85 Linear 2.22 4.58 Quadratic 1.2 Sound Proof Room (1ms) Elevator Hall (85ms) 1.2 (a) (b) (c) 1.8 Japanese Style Room (4ms) Laboratory (45ms) Conference Room (6ms) Living Room (6ms) Corridor(6ms) Bath Room (65ms) Standard stairs (85ms) 1.8 5 Fig. 5 Average error D2 D2.6.4.2 4 5 6 7 8 9 1 D2.6.4.2 Sound Proof Room (1ms) Corridor(6ms) Japanese Style Room (4ms) Bath Room (65ms) Laboratory (45ms) Elevator Hall (85ms) Conference Room (6ms) Standard stairs (85ms) Living Room (6ms) 7 75 8 85 9 (a) (Overall) (b) (Closeup) 3 D 2 Fig. 3 The relation between D 2 and speech recognition performance RSR-D2L (Linear) RSR-D2Q (Quadratic) 1..8.6.4.2 4 5 6 7 8 9 1 D2 RSR-D2L (Linear) 1 RSR-D2Q (Quadratic).8.6.4.2 4 5 6 7 8 9 1 D2 RSR-D2L (Linear) 1 RSR-D2Q (Quadratic).8.6.4.2 4 5 6 7 8 9 1 (a) (T 6 =4 ms) (b) (T 6 =6 ms) (c) (T 6 =85 ms) 4 RSR-D 2 Fig. 4 The relation between RSR-D 2 and speech recognition performance n=2 ms D (D 2 ) RSR-D 2 4.2 RSR-D 2 2(A) 9 D 2 3(a) 3(b) 9 2(C) 3 4 Table 4 Standard deviation Conventional RSR-D 2 L RSR-D 2 Q Method (Linear) (Quadratic) Close Open Close Open Close Open T 6 =45 ms 3.1 3.26 1.1 3.62 1.13 3.6 T 6 =65 ms 6.92 7.18 2.46 3.49 2.59 3.14 T 6 =85 ms 8.8 17.64 2.41 5.35 2.81 5.23 4 3 3 D 2 1 RSR-D 2 L(Linear) 2 RSR-D 2 Q(Quadratic) (T 6 =6 ms) (T 6 =85 ms).96 (T 6 =4 ms).93 D 2 1 2 RSR-D 2 L RSR-D 2 Q 4.3 2(D) 3 4 c 21 Information Processing Society of Japan

2(D) 3 5 4 RSR-D n RSR-D n Q RSR-D n L D 2 2 RSR-D 2 Q 4.4 4.4.1 RSR-D 2 RSR-D 2 2(A) 9 D 2 3(b) 6 ms ( ) 4 45 ms 85 ms RSR-D 2 5. RSR-D n, RSR-D n. RSR-D n. 5 D 2 Table 5 D 2 of measured impulse response in each environment T 6 =45ms.48.63 T 6 =85ms.72.67 Vol.21-SLP-83 No.9 21/1/29 5.1 2(A) T 6 =45 ms T 6 =85 ms ( 5 ) 25 132 25 3 5 D 2 D 2 5.2 6 7 5.3 T 6 =45 ms T 6 =85 ms 5.1 D 2 D 2 5 c 21 Information Processing Society of Japan

Window Recognition performance [ % ] 1 9 8 7 132 TV 25 355 5 163 : Speaker : Microphone Door (T 6 =45 ms) Fig. 6 175 6 1 2 3 1 12 2 Distance between microphone and speaker [ ] 264 Door 7 EV EV EV : Speaker 6 Placement of microphone and speaker Distance between Mic-Speaker and Wall [ ] :25 :132 (T 6 =45 ms) Fig. 7 Recognition performance [ % ] 1 9 8 7 : Microphone 823 3 25 (T 6 =85 ms) 581 Distance between Mic-Speaker and Wall [ ] :25 :3 6 1 2 3 112 2 27 3 4 5 Distance between microphone and speaker [ ] (T 6 =85 ms) 7 Reverberant speech recognition performance RSR-D n, 6. 8) Vol.21-SLP-83 No.9 21/1/29 1) M. R. Schroeder, New Method of Measuring Reverberation Time, JASA, Vol. 37, pp. 49-412, 1965. 2) Rico Petrick, Xugang Lu, Masashi Unoki, Masato Akagi, and Ruediger Hoffmann, Robust Front End Processing for Speech Recognition in Reverberant Environments: Utilization of Speech Characteristics, Proc. Interspeech28, pp. 658-661, Brisbane, Australia, Sept. 28. 3),,, 23. 4) ISO3382:Acoustics-Measurement of the reverberation time of rooms with reference to other accoustical parameters. Internatinal Organization for Standardization, 1997. 5) K. Takeda, Y. Sagisaka, and S. Katagiri, Acoustic-Phonetic Labels in a Japanese Speech Database, Proc. European Conference on Speech Technology, vol. 2, pp. 13-16, Oct. 1987. 6) A. Lee, T. Kawahara, and K. Shikano, Julius an open source real-time large vocabulary recognition engine, In Proc. European Conf. on Speech Communication and Technology, pp. 1691-1694, 21. 7) T. Houtgast, H. J. M. Steeneken, and R. Plomp, Predicting speech intelligibility in room acoustics, Acustica, vol. 46, pp. 6-72, 198. 8) T. Yamada, M. Kumakura, N. Kitawaki, Performance estimation of speech recognition system under noise conditions using objective quality measures and artificial voice, IEEE Trans. on ASLP, Vol. 14, No. 6, pp. 26-213, Nov. 26. RSR-D n MTF(Modulation Transfer Function) 7) PESQ 6 c 21 Information Processing Society of Japan