3: A convolution-pooling layer in PS-CNN 1: Partially Shared Deep Neural Network 2.2 Partially Shared Convolutional Neural Network 2: A hidden layer o

Σχετικά έγγραφα
Buried Markov Model Pairwise

[5] F 16.1% MFCC NMF D-CASE 17 [5] NMF NMF 3. [5] 1 NMF Deep Neural Network(DNN) FUSION 3.1 NMF NMF [12] S W H 1 Fig. 1 Our aoustic event detect

An Automatic Modulation Classifier using a Frequency Discriminator for Intelligent Software Defined Radio

2.1 C (l, θ, ϕ) C i (i = 1, 2,, M) L i i V s τ i (l, θ, ϕ) = l L i(l, θ, ϕ) V s (1) t i x i (t) C DSBF (2) s c (t) = 1 M M x i (t + τ i ) (2) i=1 ( )

Fourier transform, STFT 5. Continuous wavelet transform, CWT STFT STFT STFT STFT [1] CWT CWT CWT STFT [2 5] CWT STFT STFT CWT CWT. Griffin [8] CWT CWT

MIDI [8] MIDI. [9] Hsu [1], [2] [10] Salamon [11] [5] Song [6] Sony, Minato, Tokyo , Japan a) b)


Development of a Seismic Data Analysis System for a Short-term Training for Researchers from Developing Countries

No. 7 Modular Machine Tool & Automatic Manufacturing Technique. Jul TH166 TG659 A

Research on Economics and Management

Στοιχεία εισηγητή Ημερομηνία: 10/10/2017

Re-Pair n. Re-Pair. Re-Pair. Re-Pair. Re-Pair. (Re-Merge) Re-Merge. Sekine [4, 5, 8] (highly repetitive text) [2] Re-Pair. Blocked-Repair-VF [7]

Scrub Nurse Robot: SNR. C++ SNR Uppaal TA SNR SNR. Vain SNR. Uppaal TA. TA state Uppaal TA location. Uppaal

Area Location and Recognition of Video Text Based on Depth Learning Method

Anomaly Detection with Neighborhood Preservation Principle

HOSVD. Higher Order Data Classification Method with Autocorrelation Matrix Correcting on HOSVD. Junichi MORIGAKI and Kaoru KATAYAMA

JOURNAL OF APPLIED SCIENCES Electronics and Information Engineering. Cyclic MUSIC DOA TN (2012)

1 (forward modeling) 2 (data-driven modeling) e- Quest EnergyPlus DeST 1.1. {X t } ARMA. S.Sp. Pappas [4]

Automatic extraction of bibliography with machine learning

Detection and Recognition of Traffic Signal Using Machine Learning

Vol.4-DCC-8 No.8 Vol.4-MUS-5 No.8 4// 3 3 Hanning (T ) 3 Hanning 3T (y(t)w(t)) dt =.5 T y (t)dt. () STRAIGHT F 3 TANDEM-STRAIGHT[] 3 F F 3 [] F []. :

Retrieval of Seismic Data Recorded on Open-reel-type Magnetic Tapes (MT) by Using Existing Devices

FX10 SIMD SIMD. [3] Dekker [4] IEEE754. a.lo. (SpMV Sparse matrix and vector product) IEEE754 IEEE754 [5] Double-Double Knuth FMA FMA FX10 FMA SIMD

2016 IEEE/ACM International Conference on Mobile Software Engineering and Systems

Zigbee. Zigbee. Zigbee Zigbee ZigBee. ZigBee. ZigBee

IPSJ SIG Technical Report Vol.2014-CE-127 No /12/6 CS Activity 1,a) CS Computer Science Activity Activity Actvity Activity Dining Eight-He

A Method for Creating Shortcut Links by Considering Popularity of Contents in Structured P2P Networks

Stabilization of stock price prediction by cross entropy optimization

[1] DNA ATM [2] c 2013 Information Processing Society of Japan. Gait motion descriptors. Osaka University 2. Drexel University a)

Maxima SCORM. Algebraic Manipulations and Visualizing Graphs in SCORM contents by Maxima and Mashup Approach. Jia Yunpeng, 1 Takayuki Nagai, 2, 1

ΔΙΠΛΩΜΑΤΙΚΕΣ ΕΡΓΑΣΙΕΣ

Technical Research Report, Earthquake Research Institute, the University of Tokyo, No. +-, pp. 0 +3,,**1. No ,**1

GPGPU. Grover. On Large Scale Simulation of Grover s Algorithm by Using GPGPU

CSJ. Speaker clustering based on non-negative matrix factorization using i-vector-based speaker similarity

Optimization, PSO) DE [1, 2, 3, 4] PSO [5, 6, 7, 8, 9, 10, 11] (P)

EM Baum-Welch. Step by Step the Baum-Welch Algorithm and its Application 2. HMM Baum-Welch. Baum-Welch. Baum-Welch Baum-Welch.

ΕΥΡΕΣΗ ΤΟΥ ΔΙΑΝΥΣΜΑΤΟΣ ΘΕΣΗΣ ΚΙΝΟΥΜΕΝΟΥ ΡΟΜΠΟΤ ΜΕ ΜΟΝΟΦΘΑΛΜΟ ΣΥΣΤΗΜΑ ΟΡΑΣΗΣ

Πτυχιακή Εργασι α «Εκτι μήσή τής ποιο τήτας εικο νων με τήν χρή σή τεχνήτων νευρωνικων δικτυ ων»

Estimation, Evaluation and Guarantee of the Reverberant Speech Recognition Performance based on Room Acoustic Parameters

Διπλωματική Εργασία της φοιτήτριας του Τμήματος Ηλεκτρολόγων Μηχανικών και Τεχνολογίας Υπολογιστών της Πολυτεχνικής Σχολής του Πανεπιστημίου Πατρών

Vol. 31,No JOURNAL OF CHINA UNIVERSITY OF SCIENCE AND TECHNOLOGY Feb

ΔΙΠΛΩΜΑΤΙΚΕΣ ΕΡΓΑΣΙΕΣ ΠΜΣ «ΠΛΗΡΟΦΟΡΙΚΗ & ΕΠΙΚΟΙΝΩΝΙΕΣ» OSWINDS RESEARCH GROUP

Nov Journal of Zhengzhou University Engineering Science Vol. 36 No FCM. A doi /j. issn

[2] REVERB 8 [3], [4] [5] [20] [6], [7], [8], [9], [10] [11] REVERB 8 *1 [9] LDA *2 MLLT (SAT) [8] (basis fmllr) [12] (DNN) [10] DNN [11] [13] [14] Ka

ΔΙΠΛΩΜΑΤΙΚΕΣ ΕΡΓΑΣΙΕΣ ΠΜΣ «ΠΛΗΡΟΦΟΡΙΚΗ & ΕΠΙΚΟΙΝΩΝΙΕς» OSWINDS RESEARCH GROUP

Resurvey of Possible Seismic Fissures in the Old-Edo River in Tokyo

Supplementary Materials for Evolutionary Multiobjective Optimization Based Multimodal Optimization: Fitness Landscape Approximation and Peak Detection

CorV CVAC. CorV TU317. 1

ΓΙΑΝΝΟΥΛΑ Σ. ΦΛΩΡΟΥ Ι ΑΚΤΟΡΑΣ ΤΟΥ ΤΜΗΜΑΤΟΣ ΕΦΑΡΜΟΣΜΕΝΗΣ ΠΛΗΡΟΦΟΡΙΚΗΣ ΤΟΥ ΠΑΝΕΠΙΣΤΗΜΙΟΥ ΜΑΚΕ ΟΝΙΑΣ ΒΙΟΓΡΑΦΙΚΟ ΣΗΜΕΙΩΜΑ

Non-negative Matrix Factorization, NMF [5] NMF. [1 3] Bregman [4] Harmonic-Temporal Clustering, HTC [2,3] 1,2,b) NTT

{takasu, Conditional Random Field

VSC STEADY2STATE MOD EL AND ITS NONL INEAR CONTROL OF VSC2HVDC SYSTEM VSC (1. , ; 2. , )

(Υπογραϕή) (Υπογραϕή) (Υπογραϕή)

3.8.1 J (7) (1883~1906) (1907~1931) A ~ (10) i J C-1 ~1973 C-2

Δημήτριος Θ. Τόμτσης, Ph.D. Αναλυτικό Βιογραφικό Σημείωμα

Creative TEchnology Provider

SocialDict. A reading support tool with prediction capability and its extension to readability measurement

n 1 n 3 choice node (shelf) choice node (rough group) choice node (representative candidate)

[4] 1.2 [5] Bayesian Approach min-max min-max [6] UCB(Upper Confidence Bound ) UCT [7] [1] ( ) Amazons[8] Lines of Action(LOA)[4] Winands [4] 1

Medium Data on Big Data

Big Data/Business Intelligence

Adaptive grouping difference variation wolf pack algorithm

Development of the Nursing Program for Rehabilitation of Woman Diagnosed with Breast Cancer

Study of In-vehicle Sound Field Creation by Simultaneous Equation Method

FEATURES APPLICATION PRODUCT T IDENTIFICATION PRODUCT T DIMENSION MAG.LAYERS

Traversing Assist System for Tracked Vehicles on Rough Terrain Based on Continuous Three-Dimensional Terrain-Scanning

ΒΙΟΓΡΑΦΙΚΟ ΣΗΜΕΙΩΜΑ Δρ. ΣΩΤΗΡΙΟΣ Α. ΔΑΛΙΑΝΗΣ

Quantitative chemical analyses of rocks with X-ray fluorescence analyzer: major and trace elements in ultrabasic rocks

Επεξεπγαζία Ήσος Φυνήρ 4 η Διάλεξη ΦΗΦΙΑΚΟ ΣΟΤΝΣΙΟ

Echo path identification for stereophonic acoustic echo cancellation without pre-processing

Reading Order Detection for Text Layout Excluded by Image

ΜΕΤΑΠΤΥΧΙΑΚΟ ΠΡΟΓΡΑΜΜΑ ΣΠΟΥΔΩΝ

Secure Cyberspace: New Defense Capabilities

ΕΘΝΙΚΟ ΜΕΤΣΟΒΙΟ ΠΟΛΥΤΕΧΝΕΙΟ ΣΧΟΛΗ ΗΛΕΚΤΡΟΛΟΓΩΝ ΜΗΧΑΝΙΚΩΝ ΚΑΙ ΜΗΧΑΝΙΚΩΝ ΥΠΟΛΟΓΙΣΤΩΝ

ΒΙΟΓΡΑΦΙΚΟ ΣΗΜΕΙΩΜΑ ΣΤΥΛΙΑΝΗΣ Κ. ΣΟΦΙΑΝΟΠΟΥΛΟΥ Αναπληρώτρια Καθηγήτρια. Τµήµα Τεχνολογίας & Συστηµάτων Παραγωγής.

Query by Phrase (QBP) (Music Information Retrieval, MIR) QBH QBP / [1, 2] [3, 4] Query-by-Humming (QBH) QBP MIDI [5, 6] [8 10] [7]

Identifying Scenes with the Same Person in Video Content on the Basis of Scene Continuity and Face Similarity Measurement

Ηρϊκλειτοσ ΙΙ. Πανεπιζηήμιο Θεζζαλίας. Τμήμα Μηχανικών Η/Υ και Δικτύων

Εφαρμογή Υπολογιστικών Τεχνικών στην Γεωργία

From Secure e-computing to Trusted u-computing. Dimitris Gritzalis

An Advanced Manipulation for Space Redundant Macro-Micro Manipulator System

GUI

Περιεχόµενα. ΕΠΛ 422: Συστήµατα Πολυµέσων. Μέθοδοι συµπίεσης ηχητικών. Βιβλιογραφία. Κωδικοποίηση µε βάση την αντίληψη.

A Convolutional Neural Network Approach for Objective Video Quality Assessment

A Method of Trajectory Tracking Control for Nonminimum Phase Continuous Time Systems

το προφίλ της γάστρας, η ίσαλος σχεδίασης, η καμπύλη εμβαδών εγκαρσίων τομών και η κατανομή του κέντρου βάρους των εγκαρσίων τομών κατά μήκος του

Toward a SPARQL Query Execution Mechanism using Dynamic Mapping Adaptation -A Preliminary Report- Takuya Adachi 1 Naoki Fukuta 2.

ΜΕΘΟΔΟΙ ΥΠΟΛΟΓΙΣΜΟΥ ΤΗΣ ΖΕΝΙΘΕΙΑΣ ΤΡΟΠΟΣΦΑΙΡΙΚΗΣ ΥΣΤΕΡΗΣΗΣ ΣΕ ΜΟΝΙΜΟΥΣ ΣΤΑΘΜΟΥΣ GNSS

1530 ( ) 2014,54(12),, E (, 1, X ) [4],,, α, T α, β,, T β, c, P(T β 1 T α,α, β,c) 1 1,,X X F, X E F X E X F X F E X E 1 [1-2] , 2 : X X 1 X 2 ;

* ** *** *** Jun S HIMADA*, Kyoko O HSUMI**, Kazuhiko O HBA*** and Atsushi M ARUYAMA***

Ανάπτυξη Δυνατοτήτων στην Εκπαίδευση μέσω της Πρωτοβουλίας GEO

Bayesian Discriminant Feature Selection

supporting phase aerial phase supporting phase z 2 z T z 1 p G quardic curve curve f 2, n 2 f 1, n 1 lift-off touch-down p Z

Ανάπτυξη Δυνατοτήτων στην Εκπαίδευση μέσω της Πρωτοβουλίας GEO

Δρ. Ηλίας Ξυδιάς Τηλ.: ,

GPU. CUDA GPU GeForce GTX 580 GPU 2.67GHz Intel Core 2 Duo CPU E7300 CUDA. Parallelizing the Number Partitioning Problem for GPUs

ER-Tree (Extended R*-Tree)

Ευφυές Σύστημα Ανάλυσης Εικόνων Μικροσκοπίου για την Ανίχνευση Παθολογικών Κυττάρων σε Εικόνες Τεστ ΠΑΠ

Μηχανουργική Τεχνολογία ΙΙ

Architecture for Visualization Using Teacher Information based on SOM

Transcript:

Sound Source Identification based on Deep Learning with Partially-Shared Architecture 1 2 1 1,3 Takayuki MORITO 1, Osamu SUGIYAMA 2, Ryosuke KOJIMA 1, Kazuhiro NAKADAI 1,3 1 2 ( ) 3 Tokyo Institute of Technology 1, Kyoto University 2, Honda Research Institute Japan Co., Ltd. 3 morito@cyb.mei.titech.ac.jp, sugiyama@kuhp.kyoto-u.ac.jp, kojima@cyb.mei.titech.ac.jp, nakadai@jp.honda-ri.com Abstract Partially Shared Deep Neural Network (PS-DNN) Partially Shared Convolutional Neural Network (PS-CNN) 1 Unmanned Aerial Vehicle (UAV) Signal-to-Noise (SN) SN MUltiple SIgnal Classification based on incremental Generalized Singular Value Decomposition (igsvd-music) [Ohata 14] Geometric Highorder Decorrelation-based Source Separation (GHDSS) [Nakajima 10] SN Convolutional Neural Network (CNN) [Lawrence 97] [Uemura 15] DNN [Hannun 14] DNN DNN CNN 2 Multi-Task Learning (MTL) [Caruana 97] Partially Shared Deep Neural Network (PS- DNN) CNN Partially Shared Convolutional Neural Network (PS-CNN) 12

3: A convolution-pooling layer in PS-CNN 1: Partially Shared Deep Neural Network 2.2 Partially Shared Convolutional Neural Network 2: A hidden layer of PS-DNN 2.1 Partially Shared Deep Neural Network PS-DNN 1 PS-DNN y i 1, y i 2, y i 3 2 i i+1 y i+1 1, y i+1 2, y i+1 3 (1) y i+1 1 W i 11 W i 12 0 y i 1 y i+1 2 = σ 0 W i 22 0 y i 2 y i+1 3 0 W i 32 W i 33 y i 3 + b i 1 b i 2 b i 3 (1) W i jk yi k yi+1 j b i j σ( ) PS-CNN CNN PS-DNN CNN 1 2 1 CNN 2 RGB 3 CNN 2 1 PS-CNN Fig. 3 PS-DNN PS-CNN CNN PS-CNN PS-DNN CNN CNN PS-DNN i K i,1 K i,2 K i,3 i [X (i,1) 1,, X (i,ki,3) 3 ] X (i,1) 1 = [x (i,1) 1,1,1,, x(i,1) 1,V,H ] i + 1 j V H 1 CNN 2 1 13

C (i+1,j) = [ 1,1,,,, V,H ] j 2 3 4 = σ ( K i,1 K i,2 + 1,v+s,h+t 2,v+s,h+t +b (i,j) ) (2) 4: source The layout of the microphones and the sound = σ ( K i,2 2,v+s,h+t +b (i,j) ) (3) = σ ( K i,2 K i,3 + 2,v+s,h+t 3,v+s,h+t +b (i,j) ) (4) m, n w k,s,t b (i,j) σ( ) 3 Python TensorFlow version 0.8.0 [Abadi 15] 4 DNN, PS- DNN, CNN, PS-CNN DNN, CNN PS-DNN PS-CNN 2 DNN, CNN PS-DNN, PS-CNN DCASE2016 [Mesaros 16] Acoustic scene classification 15 35100 5 1170 Wave 5 3.1-3.2 465 5: UAV (Parrot Bebop Drone) 3.1 8 5 SN SN SN 5 SN 0 db SNR = 20 log(s p /N p ) (5) S p, N p SN SN 14

1: Dimensions for the DNN Hidden layer Units 1 2000 2 1000 3 400 2: Dimensions for the PS-DNN Hidden Units layer Identify Shared Separate 3.2 1 1500 1500 1500 2 800 400 800 3 400 0 800 16 khz 512 sample (32 ms) 120 sample (7.5 ms) 63 Hz, 8 khz, 20 Honda Research Institute Japan Audition for Robots with Kyoro University (HARK) [Nakadai 10] 20 8 20 3200 PS-DNN, PS-CNN 400 3.3 1-4 3.2 3200 15 PS-DNN, PS-CNN 400 0 pre-training Adam Dropout drop rate 0.2 0 0.4 4: Dimensions for the PS-CNN Hidden Type Channels Size layer Identify Shared Separate 1 Conv 40 40 40 20 20 2 Pool 40 40 40 10 10 3 Conv 80 40 80 10 10 4 Pool 80 40 80 5 5 5 Full 400 0 800 1 1 5: Accuracy of Sound Source Identification DNN PS-DNN CNN PS-CNN 100% Avg. 55.88 56.27 55.79 56.75 S.E. 0.6756 0.5742 0.5348 0.5275 75% Avg. 54.07 54.57 54.36 55.09 S.E. 0.6599 0.6584 0.5055 0.3268 50% Avg. 51.63 51.91 51.95 52.71 S.E. 0.5786 0.6525 0.5332 0.5357 25% Avg. 47.84 48.04 48.35 48.74 S.E. 0.6477 0.6526 0.5566 0.5898 5 5 zero padding 2 2 100 10 epoch 3.4 5 6-9 5 (Avg.) (S.E.) 5 4 4 100%, 75%, 50%, 25% 5 DNN < PS-DNN CNN < PS-CNN t p p > 0.05 3: Dimensions for the CNN Hidden layer Type Channels Size 1 Conv 40 20 20 2 Pool 40 10 10 3 Conv 80 10 10 4 Pool 80 5 5 5 Full 400 1 1 6: Trained with 100% annotated data 15

7: Trained with 75% annotated data 9: Trained with 25% annotated data DNN, CNN JSPS 24220006, 16H02884, 16K00294 JST ImPACT 8: Trained with 50% annotated data SN DCASE2016 Acoustic scene classification 4 SN [Abadi 15] Abadi, M., et al.: TensorFlow: Large- Scale Machine Learning on Heterogeneous Systems, http://tensorflow.org/ (2015) [Caruana 97] Caruana, R., et al.: Multitask learning, Machine Learning, vol.28, no. 1, pp. 41-75 (1997) [Mesaros 16] Mesaros, A., et al.: TUT database for acoustic scene classification and sound event detection, 24th Acoustic Scene Classification Workshop 2016 European Signal Processing Conference (EU- SIPCO) (2016) [Hannun 14] Hannun, A., et al.: Deepspeech: Scaling up end-to-end speech recognition, arxiv preprint arxiv:1412.5567 (2014) [Lawrence 97] Lawrence, S., et al.: Face recognition: A convolutional neural-network approach, IEEE Transactions on Neural Networks, vol. 8, no. 1, pp. 98-113 (1997) 16

[Nakadai 10] Nakadai, K., et al.: Design and Implementation of Robot Audition System HARK, Advanced Robotics, vol. 24, pp. 739-761 (2010) [Nakajima 10] Nakajima, H., et al.: Correlation matrix estimation by an optimally controlled recursive average method and its application to blind source separation, Acoustical Science and Technology, vol. 31, no. 3, pp. 205212 (2010) [Ohata 14] Ohata, T., et al.: Inprovement in outdoor sound source detection using a quadrotor-embedded microphone array, IEEE/RSJ International Conference on Intelligent Robots and Systems (2014). [Uemura 15] Uemura, S., et al.: Outdoor acoustic event identification using sound source separation and deep learning with a quadrotor-embedded microphone array, The 6th International Conference on Advanced Mechatronics (2015) 17