[5] F 16.1% MFCC NMF D-CASE 17 [5] NMF NMF 3. [5] 1 NMF Deep Neural Network(DNN) FUSION 3.1 NMF NMF [12] S W H 1 Fig. 1 Our aoustic event detect

Σχετικά έγγραφα
CSJ. Speaker clustering based on non-negative matrix factorization using i-vector-based speaker similarity

3: A convolution-pooling layer in PS-CNN 1: Partially Shared Deep Neural Network 2.2 Partially Shared Convolutional Neural Network 2: A hidden layer o

[2] REVERB 8 [3], [4] [5] [20] [6], [7], [8], [9], [10] [11] REVERB 8 *1 [9] LDA *2 MLLT (SAT) [8] (basis fmllr) [12] (DNN) [10] DNN [11] [13] [14] Ka

Buried Markov Model Pairwise

Συνδυασμένη Οπτική-Ακουστική Ανάλυση Ομιλίας

Estimation, Evaluation and Guarantee of the Reverberant Speech Recognition Performance based on Room Acoustic Parameters

An Automatic Modulation Classifier using a Frequency Discriminator for Intelligent Software Defined Radio

Nov Journal of Zhengzhou University Engineering Science Vol. 36 No FCM. A doi /j. issn

Detection and Recognition of Traffic Signal Using Machine Learning

Bayesian Discriminant Feature Selection

Query by Phrase (QBP) (Music Information Retrieval, MIR) QBH QBP / [1, 2] [3, 4] Query-by-Humming (QBH) QBP MIDI [5, 6] [8 10] [7]

Anomaly Detection with Neighborhood Preservation Principle

ER-Tree (Extended R*-Tree)

Stabilization of stock price prediction by cross entropy optimization

40 3 Journal of South China University of Technology Vol. 40 No Natural Science Edition March

1,a) 1,b) 2 3 Sakriani Sakti 1 Graham Neubig 1 1. A Study on HMM-Based Speech Synthesis Using Rich Context Models

MIDI [8] MIDI. [9] Hsu [1], [2] [10] Salamon [11] [5] Song [6] Sony, Minato, Tokyo , Japan a) b)

1 (forward modeling) 2 (data-driven modeling) e- Quest EnergyPlus DeST 1.1. {X t } ARMA. S.Sp. Pappas [4]

Speech Recognition using Phase Information based on Long-Term Analysis

Ψηφιακή Επεξεργασία Φωνής

Vol.4-DCC-8 No.8 Vol.4-MUS-5 No.8 4// 3 3 Hanning (T ) 3 Hanning 3T (y(t)w(t)) dt =.5 T y (t)dt. () STRAIGHT F 3 TANDEM-STRAIGHT[] 3 F F 3 [] F []. :

Quick algorithm f or computing core attribute

Identifying Scenes with the Same Person in Video Content on the Basis of Scene Continuity and Face Similarity Measurement

SVM. Research on ERPs feature extraction and classification

GUI

Yoshifumi Moriyama 1,a) Ichiro Iimura 2,b) Tomotsugu Ohno 1,c) Shigeru Nakayama 3,d)

IPSJ SIG Technical Report Vol.2014-CE-127 No /12/6 CS Activity 1,a) CS Computer Science Activity Activity Actvity Activity Dining Eight-He

1181 (real-timespeechdriven) 1 1 ( ) D FAP FAP (voiceactivationdetectionvad) D FaceGen 3- D XfaceEd MPEG-4 1 FAP 66 FAP ( ) FAP 84

SNR F0 [2], [3], [4] F0 F0 F0 F0 F0 TUSK F0 TUSK F0 6 TUSK 6 F0 2. F0 F0 [5] [6] [7] p[8] Cepstrum [9], [10] [11] [12] [13] F0 [14] F0 [15] DIO[16] [1

Re-Pair n. Re-Pair. Re-Pair. Re-Pair. Re-Pair. (Re-Merge) Re-Merge. Sekine [4, 5, 8] (highly repetitive text) [2] Re-Pair. Blocked-Repair-VF [7]

Fourier transform, STFT 5. Continuous wavelet transform, CWT STFT STFT STFT STFT [1] CWT CWT CWT STFT [2 5] CWT STFT STFT CWT CWT. Griffin [8] CWT CWT

Voice Conversion based on Non-negative Matrix Factorization with Segment Features in Noisy Environments

Development of a Seismic Data Analysis System for a Short-term Training for Researchers from Developing Countries

Study on Re-adhesion control by monitoring excessive angular momentum in electric railway traction

Η Διαδραστική Τηλεδιάσκεψη στο Σύγχρονο Σχολείο: Πλαίσιο Διδακτικού Σχεδιασμού

Kernel Methods and their Application for Image Understanding

Automatic extraction of bibliography with machine learning

1 n-gram n-gram n-gram [11], [15] n-best [16] n-gram. n-gram. 1,a) Graham Neubig 1,b) Sakriani Sakti 1,c) 1,d) 1,e)

GPU. CUDA GPU GeForce GTX 580 GPU 2.67GHz Intel Core 2 Duo CPU E7300 CUDA. Parallelizing the Number Partitioning Problem for GPUs

Indexing Methods for Encrypted Vector Databases

{takasu, Conditional Random Field

Non-negative Matrix Factorization, NMF [5] NMF. [1 3] Bregman [4] Harmonic-Temporal Clustering, HTC [2,3] 1,2,b) NTT

Bundle Adjustment for 3-D Reconstruction: Implementation and Evaluation

EM Baum-Welch. Step by Step the Baum-Welch Algorithm and its Application 2. HMM Baum-Welch. Baum-Welch. Baum-Welch Baum-Welch.

ΠΑΡΑΔΟΤΕΟ 3.1 : Έκθεση καταγραφής χρήσεων γης

Numerical Analysis FMN011

Ανάκτηση Εικόνας βάσει Υφής με χρήση Eye Tracker

ΑΡΧΙΜΗ ΗΣ - ΕΝΙΣΧΥΣΗ ΕΡΕΥΝΗΤΙΚΩΝ ΟΜΑ ΩΝ ΣΤΑ ΤΕΙ. Υποέργο: «Ανάκτηση και προστασία πνευµατικών δικαιωµάτων σε δεδοµένα

ΨΗΦΙΑΚΗ ΕΠΕΞΕΡΓΑΣΙΑ ΕΙΚΟΝΑΣ

GPGPU. Grover. On Large Scale Simulation of Grover s Algorithm by Using GPGPU

High order interpolation function for surface contact problem

Πτυχιακή Εργασι α «Εκτι μήσή τής ποιο τήτας εικο νων με τήν χρή σή τεχνήτων νευρωνικων δικτυ ων»

A Bonus-Malus System as a Markov Set-Chain. Małgorzata Niemiec Warsaw School of Economics Institute of Econometrics

Retrieval of Seismic Data Recorded on Open-reel-type Magnetic Tapes (MT) by Using Existing Devices

ΑΠΟΔΟΤΙΚΗ ΑΠΟΤΙΜΗΣΗ ΕΡΩΤΗΣΕΩΝ OLAP Η ΜΕΤΑΠΤΥΧΙΑΚΗ ΕΡΓΑΣΙΑ ΕΞΕΙΔΙΚΕΥΣΗΣ. Υποβάλλεται στην

FX10 SIMD SIMD. [3] Dekker [4] IEEE754. a.lo. (SpMV Sparse matrix and vector product) IEEE754 IEEE754 [5] Double-Double Knuth FMA FMA FX10 FMA SIMD

Area Location and Recognition of Video Text Based on Depth Learning Method

(Υπογραϕή) (Υπογραϕή) (Υπογραϕή)

Chapter 1 Introduction to Observational Studies Part 2 Cross-Sectional Selection Bias Adjustment

, -.

CorV CVAC. CorV TU317. 1

Optimization, PSO) DE [1, 2, 3, 4] PSO [5, 6, 7, 8, 9, 10, 11] (P)

Supplementary Materials for Evolutionary Multiobjective Optimization Based Multimodal Optimization: Fitness Landscape Approximation and Peak Detection

Διπλωματική Εργασία της φοιτήτριας του Τμήματος Ηλεκτρολόγων Μηχανικών και Τεχνολογίας Υπολογιστών της Πολυτεχνικής Σχολής του Πανεπιστημίου Πατρών

Evolution of Novel Studies on Thermofluid Dynamics with Combustion


2016 IEEE/ACM International Conference on Mobile Software Engineering and Systems

Discriminative Language Modeling Based on Risk Minimization Training

ES440/ES911: CFD. Chapter 5. Solution of Linear Equation Systems

Schedulability Analysis Algorithm for Timing Constraint Workflow Models

Probabilistic Approach to Robust Optimization

ΠΠΜ 515: Προχωρηµένα Θέµατα Διεύθυνσης Κατασκευαστικών Έργων

[1] DNA ATM [2] c 2013 Information Processing Society of Japan. Gait motion descriptors. Osaka University 2. Drexel University a)

Statistics 104: Quantitative Methods for Economics Formula and Theorem Review

Approximation of distance between locations on earth given by latitude and longitude

Queensland University of Technology Transport Data Analysis and Modeling Methodologies

Vol. 31,No JOURNAL OF CHINA UNIVERSITY OF SCIENCE AND TECHNOLOGY Feb

Ψηφιακή Επεξεργασία Φωνής

Χαρακτηρισµός Νεοπλασµάτων στη Μαστογραφία από το Σχήµα της Παρυφής µε χρήση Νευρωνικών ικτύων

EPL 603 TOPICS IN SOFTWARE ENGINEERING. Lab 5: Component Adaptation Environment (COPE)

ΠΠΜ 515: Προχωρημένα Θέματα Διεύθυνσης Κατασκευαστικών

The Study of Evolutionary Change of Shogi

substructure similarity search using features in graph databases

Αξιολόγηση µεθόδων σύνθεσης εικόνων. Β. Τσαγκάρης και Β. Αναστασόπουλος

2 ~ 8 Hz Hz. Blondet 1 Trombetti 2-4 Symans 5. = - M p. M p. s 2 x p. s 2 x t x t. + C p. sx p. + K p. x p. C p. s 2. x tp x t.

BCI On Feature Extraction from Multi-Channel Brain Waves Used for Brain Computer Interface

ΕΥΡΕΣΗ ΤΟΥ ΔΙΑΝΥΣΜΑΤΟΣ ΘΕΣΗΣ ΚΙΝΟΥΜΕΝΟΥ ΡΟΜΠΟΤ ΜΕ ΜΟΝΟΦΘΑΛΜΟ ΣΥΣΤΗΜΑ ΟΡΑΣΗΣ

HIV HIV HIV HIV AIDS 3 :.1 /-,**1 +332

OLS. University of New South Wales, Australia

HOSVD. Higher Order Data Classification Method with Autocorrelation Matrix Correcting on HOSVD. Junichi MORIGAKI and Kaoru KATAYAMA

C F E E E F FF E F B F F A EA C AEC

Η Τηλεδιάσκεψη στην υπηρεσία της συνεργατικής οικοδόμησης της γνώσης και της διαθεματικής προσέγγισης. Από τη Θεωρία στην Πράξη:


ΒΙΟΓΡΑΦΙΚΟ ΣΗΜΕΙΩΜΑ ΛΕΩΝΙΔΑΣ Α. ΣΠΥΡΟΥ Διδακτορικό σε Υπολογιστική Εμβιομηχανική, Τμήμα Μηχανολόγων Μηχανικών, Πανεπιστήμιο Θεσσαλίας.

( ) (Harmonic-Temporal Clustering; HTC) [1], [2] ( ) ( ) [4] HTC. (Non-negative Matrix Factorization; NMF) [3] [5], [6] [7], [8]

A study of geometric dependency of cepstrum on vocal tract length

ΤΕΙ ΚΡΗΤΗΣ ΠΑΡΑΡΤΗΜΑ ΡΕΘΥΜΝΟΥ ΤΜΗΜΑ ΜΗΧΑΝΙΚΩΝ ΜΟΥΣΙΚΗΣ ΤΕΧΝΟΛΟΓΙΑΣ ΚΑΙ ΑΚΟΥΣΤΙΚΗΣ Αυτόματη αναγνώριση ηχοτοπίων και ηχητικών γεγονότων σε περιβάλλον π

Research on Economics and Management

A Method of Trajectory Tracking Control for Nonminimum Phase Continuous Time Systems

Legal use of personal data to fight telecom fraud

Transcript:

NMF 1 1,a) 1 AED NMF DNN IEEE D-CASE 2012 20% DNN NMF 1. Computational Auditory Scene Analysis: CASA [1] [2] [3] [4] [5] Non-negative Matrxi Factorization (NMF) NMF 2. CASA IEEE 1 Dept. Computer Science and Engineering, Toyohashi University of Technology, Toyohashi, Aichi 441-8580, Japan a) kyama@tut.jp D-CASE [6] D-CASE 2 OL (Office Live) OS (Office Synthetic) Vuegen [8] MFCC (Mel-Frequency Cepstral Coefficients) GMM (Gaussian Mixture Model) OL 43.4% OS 13.5% F Gemmeke [9] Exemplar-based NMF OL 31.4% OS 21.3% F 2 OL 1

[5] F 16.1% MFCC NMF D-CASE 17 [5] NMF NMF 3. [5] 1 NMF Deep Neural Network(DNN) FUSION 3.1 NMF 3.1.1 NMF [12] S W H 1 Fig. 1 Our aoustic event detection system S Ŝ = WH I(i =1, 2,,I) J(j =1, 2,,J) K(k =1, 2,,K) S Ŝ I J W I K H K J W n W W =[W 1,W 2,...,W N ] N W n H n W H S N n W nh n (n =1, 2,,N) N S n S n S ˆ W n H n n = S N m W (1) mh m [13] VQ ( ) 3.1.2 NMF 2 3 3.1.1 NMF W VQ VQ 2

4 Fig. 4 Score fusion 2 (1) Fig. 2 Shared bases (1) number of bases for each class is variable j j i C j N O i M i X i = exp(o i) M j exp(o j ) (3) 3 (2) Fig. 3 Shared bases (2) number of bases for each class is constant W (1) 2 (2) 3 2 Komatsu [14] 3.2 DNN j j O j N O j = W ij h i + C j (2) i=1 h i i W ij O i X i 4 MFCC FBANK 2 DNN 2 FUSION 3.3 (a) DNN NMF (1) DNN DNN 5 DNN (sil) X sil DNN O j DNN (3) (2) OL 3

5 Fig. 5 Acoustic event detection and threshold 9 DNN DNN 1 (b) NMF NMF NMF 4. [5] NMF 6 4 4 4 NMF 5. IEEE D-CASE [6] TASK2 OS 16 (alert, clearthroat, 6 Fig. 6 Distance between basis vectors within class and across classes cough, doorslam, drawer, keyboard, keys, knock, laughter, mouse, pageturn, pendrop, phone, printer, speech, switch) 20 15 / 44.1kHz/24bit OS 12 2 99, 981 15, 180 5.1 5.1.1 NMF NMF 20ms 10ms 1 0.97z 1 VQ 4 KL-divergence 64 4 8 (1) (2) 4 5 (1) 4 0.35, 0.40, 0.45, 0.50 8 0.40, 0.45, 0.50, 0.55 4

5.1.2 (SS) [15] SS 2.0 0.01 100 SS MFCC FBANK 20ms 10ms NMF MFCC 12 Δ, ΔΔ 1 39 33 FBANK 45 Δ, ΔΔ 135 5.1.3 DNN D-CASE SNR 10, 15, 20dB D-CASE OL DNN 3 7 MFCC 273 FBANK 945 5 512 256 128 64 32 17 16 + Rectified Linear Unit 5.1.4 D-CASE Challenge F 10ms F F [%]=2 P recison Recall P recision + Recall P recison[%] = C E, Recall[%] = C GT (4) (5) C E GT [5], [6], [8], [9], [10] F Recall 2 Recall R overlap F 12 12 1 Table 1 Results of revious works Baseline[6] DHV[10] GVV[9] VVK[8] F [%] 12.8 18.7 21.3 13.5 2 NMF DNN [5] Table 2 Results for only NMF and only DNN from [5] NMF DNN MFCC FBANK FUSION R overlap [%] 9.5 14.8 30.0 28.3 F [%] 14.4 27.3 33.0 36.5 5.2 1 2 2 NMF DNN [9] NMF NMF F 14.4% NMF 3 4 NMF DNN F 2 DNN MFCC 10.3% FBANK 4.4% FUSION 41.3% F F 8 0.45 11 laughter 17 alert, switch 5 4 NMF-DNN F 6. [5] NMF NMF 5

Table 3 3 -F - Sharing method -F-measure- - - Threshold cb size MFCC FBANK 0.35 4 36.7 35.9 0.40 4 36.2 35.4 0.45 4 35.7 33.8 0.50 4 36.3 36.4 0.40 8 29.6 35.3 0.45 8 34.2 37.4 0.50 8 32.0 36.0 0.55 8 37.6 36.5 - - Select cb size MFCC FBANK 4 4 30.7 31.2 4 8 30.7 31.2 5 4 29.6 34.0 5 8 28.1 24.8 4 Table 4 Comparison between conventional and proposed methods Method Feature R overlap[%] F [%] DNN[5] FUSION 28.3 36.5 NMF-DNN[5] Threhold = 0.45 cb size = 8 Threhold = 0.55 cb size = 8 MFCC 8.8 10.4 FBANK 23.1 20.6 FUSION 32.1 17.2 MFCC 31.4 34.2 FBANK 48.7 37.4 FUSION 39.9 41.3 MFCC 33.4 37.6 FBANK 36.3 36.5 FUSION 39.6 40.5 NMF DNN [5] DNN F 4.4% FBANK FUSION F 41.3% [9] 20% [5] FUSION 4.8% [1] A. Temko, et al., Comparison of sequence discriminant support vector machines for acoustic event classification, in Proc. IEEE ICASSP 2006, pp.721 724, 2006 [2] S. Deng, et al., Robust minimum statistic project coefficients feature for acoustic environment recognition, in Proc. IEEE ICASSP 2014, pp.8282 8266, 2014 [3] J. Salmon and J. P. Bello, Unsupervised feature learning for urban sound classification, in Proc. IEEE ICASSP 2015, pp.171 175, 2015. [4] R. Radhakrishnan, et al., Audio analysys for surveillance applications, in Proc. 2005 IEEE WASPAA, pp.158 161, 2005. [5],, 2016, 3-P-8, pp.169 172, 2016. [6] D. Stowell, D. Giannoulis, E. Benetos, M. Lagrange, and M. D. Plumbley, Detection and classification of acousitc scenes and events, in IEEE Trans. Multimedia, vol.17, pp.1733 1746, 2014. [7] N. Moreau, et al., Data collection for the CHIL CLEAR 2007 evaluation campaign, in Proc. LREC 08, pp.28 30, 2008. [8] L. Vuegen, et al., An MFCC - GMM approach apploach for event detection and classification, in IEEE AASP Challenge: Detection and Classification of Acoustic Scenes and Events, 2013. [Online]. Available: http://c4dm.eecs.qmul.ac.uk/ sceneseventschallenge/abstracts/ol/vvk.pdf [9] J. F. Gemmeke, et al., An exemplar-based NMF approach to audio event detection, in IEEE AASP Challenge: Detection and Classification of Acoustic Scenes and Events, 2013. [Online]. Available: http://c4dm.eecs.qmul.ac.uk/ sceneseventschallenge/abstracts/ol/gvv.pdf [10] A. Diment, et al., Sound event detection for office live and office synthetic AASP challenge, in IEEE AASP Challenge: Detection and Classification of Acoustic Scenes and Events, 2013. [Online]. Available: http://c4dm.eecs.qmul.ac.uk/ sceneseventschallenge/abstracts/ol/dhv.pdf [11] X. Zhou, et al., HMM-based acoustic event detection with AdaBoost feature selection, in Classification of Events, Activities and Relationships Evaluation and Workshop, 2007. [12] D. D. Lee, H. S. Seung, Algorithms for non-negative matrix factorization, in NIPS, pp.556 562, 2000. [13], NMF VQ, 2012, 1-P-18, pp.165 168, 2012. [14] T. Komatsu, et al., Acoustic event detection based on non-negative matrix factorization with mixtures of local dictionaries and activation aggregation, in Proc. ICASSP 2016, pp.2259 2263, 2016. [15] S. F. Boll, Suppression of acoustic noise in speech using spectral subtraction, in IEEE Trans. ASSP, Vol.27, No.2, pp.113 120, 1979. [16],,, Vol.68, No.2, pp.74 85, 2012. 6