
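Tomohiko Nakamura 1,a)   Hirokazu Kameoka 1,2,b)

1. Introduction
Separating a monaural music signal into its constituent sources is a central problem in computational auditory scene analysis [1–3], which builds on Bregman's account of how listeners organize sound [4]. Harmonic-Temporal Clustering (HTC) [2,3] models each source as a harmonic structure whose F0 contour and harmonic amplitudes vary over time. Non-negative Matrix Factorization (NMF) [5] instead approximates a magnitude spectrogram by the product of a small number of spectral templates and their time-varying activations.

1 The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033, Japan
2 NTT Communication Science Laboratories, 3-1 Morinosato Wakamiya, Atsugi, Kanagawa 243-0198, Japan
a) Tomohiko_Nakamura@ipc.i.u-tokyo.ac.jp
b) kameoka@hil.t.u-tokyo.ac.jp / kameoka.hirokazu@lab.ntt.co.jp

© 1959 Information Processing Society of Japan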

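Harmonic-Temporal Factor Decomposition (HTFD) [6–8] combines the two approaches by embedding the HTC source model in the NMF framework. Multipitch estimation, the task such models address, is evaluated in MIREX [9]. HTFD is also related to informed source separation guided by a user [10,11] or by a musical score [12,13]. Throughout this report, $\mathbb{R}$ and $\mathbb{C}$ denote the sets of real and complex numbers, and $j := \sqrt{-1}$.

2. Harmonic-Temporal Factor Decomposition
2.1 Signal model
Following [3], the $k$-th source is modeled as a sum of $N$ harmonics. The $n$-th harmonic ($n = 1, 2, \ldots, N$) has the instantaneous phase $n\theta_k(u) \in \mathbb{R}$ and the complex amplitude $a_{k,n}(u) \in \mathbb{C}$:
$$ f_k(u) = \sum_{n=1}^{N} a_{k,n}(u)\, e^{j(n\theta_k(u) + \phi_{k,n})}, \qquad (1) $$
where $u \in \mathbb{R}$ is time and $\phi_{k,n} \in \mathbb{R}$ is the initial phase. $f_k(u)$ is analyzed with the wavelet
$$ \psi_{\alpha,t}(u) = \frac{1}{\sqrt{2\pi}\,\alpha}\, \psi\!\left(\frac{u - t}{\alpha}\right), \qquad (2) $$
where $\alpha > 0$ is the scale, $t \in \mathbb{R}$ is the time shift, and $\psi(u)$ is the mother wavelet. The wavelet transform of $f_k(u)$ is
$$ W_k\!\left(\ln\frac{1}{\alpha}, t\right) = \int_{-\infty}^{\infty} \sum_{n=1}^{N} a_{k,n}(u)\, e^{j(n\theta_k(u) + \phi_{k,n})}\, \psi^{*}_{\alpha,t}(u)\, du. \qquad (3) $$
Because $\psi_{\alpha,t}(u)$ is localized around $t$, we approximate $\theta_k(u) \approx \theta_k(t) + \dot{\theta}_k(t)(u - t)$ and $a_{k,n}(u) \approx a_{k,n}(t)$ near $t$. Applying Parseval's theorem, writing $x := \ln(1/\alpha)$, and letting $\Omega_k(t) := \ln \dot{\theta}_k(t)$ denote the log-F0, we obtain
$$ W_k(x, t) \approx \sum_{n=1}^{N} a_{k,n}(t)\, \Psi^{*}\!\left(n e^{-x + \Omega_k(t)}\right) e^{j(n\theta_k(t) + \phi_{k,n})}, \qquad (4) $$
where $\dot{\theta}_k(u)$ is the instantaneous F0 and $\Psi$ is the Fourier transform of $\psi$. As in [3], we use the wavelet whose Fourier transform peaks at $\omega = 1$:
$$ \Psi(\omega) = \begin{cases} \exp\!\left(-\dfrac{(\ln \omega)^2}{4\sigma^2}\right) & (\omega > 0) \\ 0 & (\omega \le 0) \end{cases} \qquad (5) $$
Here $\sigma$ controls the spread of $\Psi(\omega)$ along $\ln \omega$. Substituting (5) into (4) gives
$$ W_k(x, t) \approx \sum_{n=1}^{N} a_{k,n}(t)\, e^{-\frac{(x - \Omega_k(t) - \ln n)^2}{4\sigma^2}}\, e^{j(n\theta_k(t) + \phi_{k,n})}. \qquad (6) $$
Neglecting the cross terms between harmonics, the power $|W_k(x, t)|^2$ is approximated as
$$ |W_k(x, t)|^2 \approx \sum_{n=1}^{N} |a_{k,n}(t)|^2\, e^{-\frac{(x - \Omega_k(t) - \ln n)^2}{2\sigma^2}}. \qquad (7) $$
As in HTC [3], time is discretized as $t_m$ ($m = 0, 1, \ldots, M-1$) and log-frequency as $x_l$ ($l = 0, 1, \ldots, L-1$). We write $Y_{l,m} := Y(x_l, t_m)$ for the observed spectrogram, $\Omega_{k,m} := \Omega_k(t_m)$, and $a_{k,n,m} := a_{k,n}(t_m)$.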
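The following minimal Python sketch makes (5)–(7) concrete. It evaluates the wavelet spectrum $\Psi(\omega)$ and the approximate harmonic power $|W_k(x,t)|^2$ at a single log-frequency. The function names `psi_hat` and `harmonic_power` are hypothetical. The list `powers` stands in for $|a_{k,n}(t)|^2$, and $x$ and $\Omega_k(t)$ are assumed to be given on the same natural-log axis.

```python
import math
from typing import Sequence


def psi_hat(omega: float, sigma: float) -> float:
    """Fourier transform of the wavelet in Eq. (5); it equals 1 at omega = 1."""
    if omega <= 0.0:
        return 0.0
    return math.exp(-math.log(omega) ** 2 / (4.0 * sigma ** 2))


def harmonic_power(x: float, log_f0: float, powers: Sequence[float], sigma: float) -> float:
    """Right-hand side of Eq. (7) at log-frequency x.

    powers[n - 1] plays the role of |a_{k,n}(t)|^2 for n = 1..N,
    and log_f0 plays the role of Omega_k(t).
    """
    total = 0.0
    for n, p in enumerate(powers, start=1):
        # Each harmonic is a Gaussian bump centred at Omega_k(t) + ln n.
        total += p * math.exp(-(x - log_f0 - math.log(n)) ** 2 / (2.0 * sigma ** 2))
    return total


if __name__ == "__main__":
    # Squaring Psi(n e^{-x + Omega}) reproduces the Gaussian factor of Eq. (7).
    x, omega_k, n, sigma = 0.3, 0.1, 3, 0.5
    lhs = psi_hat(n * math.exp(-x + omega_k), sigma) ** 2
    rhs = math.exp(-(x - omega_k - math.log(n)) ** 2 / (2.0 * sigma ** 2))
    print(abs(lhs - rhs) < 1e-12)
    # With a narrow sigma, the power at Omega + ln 2 is essentially |a_{k,2}|^2.
    print(harmonic_power(0.1 + math.log(2), 0.1, [1.0, 0.5, 0.25], 0.02))
```

The first check shows why the exponent in (7) has $2\sigma^2$ where (5) has $4\sigma^2$: (7) is the square of (6).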

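2.2 Source-filter model
To model the harmonic amplitudes in (7), we follow [14]. Let $f_{k,m}[i]$ denote the discrete-time version of (1) around $t_m$, with sample index $i$. We describe it by an autoregressive (AR) source-filter model of order $P$:
$$ \beta_{k,m}[0]\, f_{k,m}[i] = -\sum_{p=1}^{P} \beta_{k,m}[p]\, f_{k,m}[i-p] + \epsilon_{k,m}[i], \qquad (8) $$
where $\beta_{k,m}[p]$ ($p = 0, 1, \ldots, P$) are the AR coefficients and $\epsilon_{k,m}[i]$ is the excitation. As in 2.1, $f_{k,m}[i]$ has the F0 $e^{\Omega_{k,m}}$. We therefore assume that $\epsilon_{k,m}[i]$ is a harmonic signal with the same F0:
$$ \epsilon_{k,m}[i] = \sum_{n=1}^{N} v_{k,n,m}\, e^{j n e^{\Omega_{k,m}} i u_0}, \qquad (9) $$
where $u_0 > 0$ is the sampling period and $v_{k,n,m} \in \mathbb{C}$ is the complex amplitude of the $n$-th harmonic of the excitation. Taking the discrete-time Fourier transform (DTFT) of both sides of (8) and solving for $f_{k,m}[i]$ yields
$$ f_{k,m}[i] = \sum_{n=1}^{N} \frac{v_{k,n,m}}{B_{k,m}\!\left(e^{j n e^{\Omega_{k,m}} u_0}\right)}\, e^{j n e^{\Omega_{k,m}} i u_0}, \qquad (10) $$
$$ B_{k,m}(z) := \sum_{p=0}^{P} \beta_{k,m}[p]\, z^{-p}. \qquad (11) $$
Comparing (10) with (1) in 2.1 gives
$$ a_{k,n,m} = \frac{v_{k,n,m}}{B_{k,m}\!\left(e^{j n e^{\Omega_{k,m}} u_0}\right)}. \qquad (12) $$

2.3 Non-negative representation of the harmonic amplitudes
To relate (12) to NMF, we factor the magnitudes of $a_{k,n,m}$ and $v_{k,n,m}$ as
$$ |a_{k,n,m}| = w_{k,n,m} \sqrt{U_{k,m}}, \qquad |v_{k,n,m}| = \tilde{w}_{k,n,m} \sqrt{U_{k,m}}, \qquad (13) $$
where $w_{k,n,m} \ge 0$ and $\tilde{w}_{k,n,m} \ge 0$ are the relative amplitudes of the $n$-th harmonic of the source and of the excitation, respectively. $U_{k,m} \ge 0$ is the power of the $k$-th source at $t_m$, normalized so that $\sum_{k,m} U_{k,m} = 1$. The function $1/|B_{k,m}(e^{j\omega})|^2$ represents the spectral envelope of the source, so we assume that the AR coefficients, and hence $B_{k,m}(z)$, do not depend on $m$: $\beta_{k,m}[p] = \beta_k[p]$ and $B_{k,m}(z) = B_k(z)$. From (12) and (13),
$$ w_{k,n,m} = \frac{\tilde{w}_{k,n,m}}{\left|B_k\!\left(e^{j n e^{\Omega_{k,m}} u_0}\right)\right|}. \qquad (14) $$

2.4 Generative model of the observed spectrogram
From (7) and (13), the power spectrogram of the $k$-th source, $C_{k,l,m}$, is modeled as
$$ C_{k,l,m} = H_{k,l,m}\, U_{k,m}, \qquad (15) $$
$$ H_{k,l,m} = \sum_{n=1}^{N} w_{k,n,m}^2\, e^{-\frac{(x_l - \Omega_{k,m} - \ln n)^2}{2\sigma^2}}. \qquad (16) $$
This expresses the HTC source model in NMF form. The model of the observed spectrogram is the sum over sources:
$$ X_{l,m} = \sum_{k=1}^{K} C_{k,l,m}. \qquad (17) $$
We assume that $Y_{l,m}$ follows a Poisson distribution with mean $X_{l,m}$:
$$ \mathrm{Pois}(Y_{l,m}; X_{l,m}) = \frac{X_{l,m}^{Y_{l,m}}\, e^{-X_{l,m}}}{\Gamma(Y_{l,m} + 1)}. \qquad (18) $$
Maximizing (18) with respect to $X_{l,m}$ is equivalent to minimizing the I-divergence between $Y_{l,m}$ and $X_{l,m}$ [14]. We model the excitation amplitude $\tilde{w}_{k,n,m}$ with a Rayleigh distribution of scale $\nu$. This is the distribution of the magnitude of a zero-mean complex Gaussian variable whose real and imaginary parts each have variance $\nu^2$. Dividing a Rayleigh variable by a positive constant gives another Rayleigh variable, so (14) implies that $w_{k,n,m}$ follows
$$ \mathrm{Rayleigh}\!\left(w_{k,n,m}; \frac{\nu}{|B_k(e^{j n e^{\Omega_{k,m}} u_0})|}\right) = \frac{w_{k,n,m}}{\left(\nu / |B_k(e^{j n e^{\Omega_{k,m}} u_0})|\right)^2} \exp\!\left(-\frac{w_{k,n,m}^2}{2\left(\nu / |B_k(e^{j n e^{\Omega_{k,m}} u_0})|\right)^2}\right). \qquad (19) $$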
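Relation (12) between the excitation amplitude and the source amplitude can be checked numerically. The Python sketch below runs the AR recursion (8) on a single-harmonic excitation (9). It then compares the steady-state output with $v / B(e^{j\omega})$ from (11)–(12). The sketch assumes zero initial conditions and a stable AR filter. The names `ar_response`, `harmonic_amplitude` and `simulate_ar` are hypothetical helpers, and the AR coefficients are made-up illustration values.

```python
import cmath
import math
from typing import List, Sequence


def ar_response(beta: Sequence[float], omega: float) -> complex:
    """B(e^{j omega}) = sum_p beta[p] e^{-j omega p}, i.e. Eq. (11) on the unit circle."""
    return sum(b * cmath.exp(-1j * omega * p) for p, b in enumerate(beta))


def harmonic_amplitude(v: complex, beta: Sequence[float], n: int,
                       log_f0: float, u0: float) -> complex:
    """Eq. (12): a = v / B(e^{j n e^{Omega} u0})."""
    return v / ar_response(beta, n * math.exp(log_f0) * u0)


def simulate_ar(beta: Sequence[float], excitation: Sequence[complex]) -> List[complex]:
    """Eq. (8): beta[0] f[i] = -sum_{p>=1} beta[p] f[i-p] + eps[i], starting from rest."""
    f: List[complex] = []
    for i, eps in enumerate(excitation):
        acc = eps
        for p in range(1, len(beta)):
            if i - p >= 0:
                acc -= beta[p] * f[i - p]
        f.append(acc / beta[0])
    return f


if __name__ == "__main__":
    beta = [1.0, -0.5]                      # hypothetical stable AR(1) coefficients
    v, n, log_f0, u0 = 0.8 + 0.3j, 1, math.log(0.3), 1.0
    omega = n * math.exp(log_f0) * u0       # normalized angular frequency of the harmonic
    eps = [v * cmath.exp(1j * omega * i) for i in range(300)]
    f = simulate_ar(beta, eps)
    a = harmonic_amplitude(v, beta, n, log_f0, u0)
    # Once the transient has decayed, f[i] is approximately a * e^{j omega i}.
    print(abs(f[-1] - a * cmath.exp(1j * omega * 299)))
```

The filter's only pole is at 0.5, so the transient decays like $0.5^i$. After 300 samples, the output agrees with (12) to within rounding error.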

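2.5 Relation to existing models
The model $X_{l,m}$ of HTFD in (17) can be viewed as a variant of NMF [15], with $H_{k,l,m}$ acting as the basis spectrum and $U_{k,m}$ as its activation. In harmonic NMF [16,17], the bases are constrained to be harmonic but do not change over time. In (16), by contrast, the log-F0 $\Omega_{k,m}$ and the harmonic amplitudes $w_{k,n,m}$ may vary with $m$ (cf. [18]). The activation $U_{k,m}$ corresponds to the temporal power envelope modeled in HTC [2,3].

3. Prior distributions
3.1 Products of Experts
We construct the prior of each F0 contour as a Product of Experts (PoE) [19]. A PoE multiplies several simple distributions (experts) and renormalizes the product, so that a parameter value receives high prior probability only if every expert assigns it high probability.

3.2 Priors on $\Omega_{k,m}$ and $U_{k,m}$
Let $\Omega_k := [\Omega_{k,0}, \ldots, \Omega_{k,M-1}]^\top$ be the log-F0 contour of the $k$-th source. We use two experts, $q_l(\Omega_k)$ and $q_g(\Omega_k)$:
$$ q_g(\Omega_k) = \mathcal{N}(\Omega_k; \mu_k \mathbf{1}_M, \xi^2 I_M), \qquad (20) $$
$$ q_l(\Omega_k) = \mathcal{N}(\Omega_k; \mathbf{0}_M, \tau^2 D^{-1}), \qquad (21) $$
$$ D = \begin{pmatrix} 1 & -1 & 0 & \cdots & 0 & 0 \\ -1 & 2 & -1 & \cdots & 0 & 0 \\ 0 & -1 & 2 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 2 & -1 \\ 0 & 0 & 0 & \cdots & -1 & 1 \end{pmatrix}. \qquad (22) $$
Here $\mathcal{N}(\cdot\,; \mu, \Sigma)$ is the Gaussian distribution with mean $\mu$ and covariance $\Sigma$. $\mathbf{1}_M$ and $\mathbf{0}_M$ are the $M$-dimensional all-ones and all-zeros vectors, and $I_M$ is the $M \times M$ identity matrix. Because $\Omega_k^\top D \Omega_k = \sum_{m=0}^{M-2} (\Omega_{k,m+1} - \Omega_{k,m})^2$, $q_l$ favors contours without large jumps between adjacent frames, while $q_g$ keeps the contour close to $\mu_k$. The prior of $\Omega_k$ is the PoE of the two experts:
$$ p(\Omega_k) \propto q_l(\Omega_k)^{\alpha_l}\, q_g(\Omega_k)^{\alpha_g}, \qquad (23) $$
where $\alpha_l$ and $\alpha_g$ weight $q_l(\Omega_k)$ and $q_g(\Omega_k)$, respectively.

For $U_{k,m}$, we write $U_{k,m} = R_k A_{k,m}$. Here $R_k := \sum_m U_{k,m}$ (so $\sum_k R_k = 1$) is the overall weight of the $k$-th source, and $A_{k,m} := U_{k,m} / R_k$ (so $\sum_m A_{k,m} = 1$) is its normalized temporal activation. With $R := [R_1, \ldots, R_K]^\top$ and $A_k := [A_{k,0}, \ldots, A_{k,M-1}]^\top$, we place Dirichlet priors on both:
$$ R \sim \mathrm{Dir}(R; \gamma^{(R)}), \qquad A_k \sim \mathrm{Dir}(A_k; \gamma^{(A)}_k), \qquad (24) $$
where $\gamma^{(R)} := [\gamma^{(R)}_1, \ldots, \gamma^{(R)}_K]^\top$ and $\gamma^{(A)}_k := [\gamma^{(A)}_{k,0}, \ldots, \gamma^{(A)}_{k,M-1}]^\top$. Entries of $\gamma^{(R)}$ smaller than 1 make $R$ sparse, so that only a few sources are active. $\gamma^{(A)}_k$ likewise controls the sparsity of $A_k$.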
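The structure of $D$ in (22) and the PoE prior (23) can be illustrated with a short Python sketch. It builds $D$, checks that $\Omega^\top D \Omega$ equals the sum of squared frame-to-frame differences, and evaluates $\ln p(\Omega_k)$ from (20), (21) and (23) up to an additive constant. The function names and the example contours are hypothetical. One point is our assumption: $D$ is singular (each row sums to zero), so the sketch treats (21) through its quadratic form $\Omega_k^\top D \Omega_k / \tau^2$, i.e. as an improper Gaussian, rather than forming $\tau^2 D^{-1}$.

```python
from typing import List, Sequence


def make_D(M: int) -> List[List[float]]:
    """Tridiagonal matrix D of Eq. (22) for an M-frame contour (M >= 2)."""
    if M < 2:
        raise ValueError("M must be at least 2")
    D = [[0.0] * M for _ in range(M)]
    for m in range(M - 1):
        # Each adjacent pair (m, m+1) adds (Omega_{m+1} - Omega_m)^2 to Omega^T D Omega.
        D[m][m] += 1.0
        D[m + 1][m + 1] += 1.0
        D[m][m + 1] -= 1.0
        D[m + 1][m] -= 1.0
    return D


def quad_form(A: Sequence[Sequence[float]], v: Sequence[float]) -> float:
    """Return v^T A v."""
    n = len(v)
    return sum(v[i] * A[i][j] * v[j] for i in range(n) for j in range(n))


def log_f0_prior(omega: Sequence[float], mu: float, xi: float, tau: float,
                 alpha_l: float, alpha_g: float) -> float:
    """ln p(Omega_k) of Eq. (23), up to an additive constant.

    The smoothness expert (21) is handled through its quadratic form
    Omega^T D Omega / tau^2 because D is singular (improper prior, assumed).
    """
    log_q_l = -quad_form(make_D(len(omega)), omega) / (2.0 * tau ** 2)
    log_q_g = -sum((o - mu) ** 2 for o in omega) / (2.0 * xi ** 2)
    return alpha_l * log_q_l + alpha_g * log_q_g


if __name__ == "__main__":
    smooth = [0.50, 0.55, 0.60]   # hypothetical log-F0 contours
    jumpy = [0.50, 0.90, 0.20]
    print(quad_form(make_D(3), [0.0, 1.0, 3.0]))   # 1^2 + 2^2
    print(log_f0_prior(smooth, 0.55, 0.1, 0.2, 1.0, 1.0) >
          log_f0_prior(jumpy, 0.55, 0.1, 0.2, 1.0, 1.0))
```

A constant contour at $\mu_k$ attains the maximum value 0 of this unnormalized log-prior. Raising $\alpha_l$ relative to $\alpha_g$ shifts the preference toward smoothness over staying near $\mu_k$.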

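4. Parameter estimation
Given the observation $Y$, we estimate $\Theta := \{\Omega, R, A\}$ by maximizing the posterior $p(\Theta \mid Y) \propto p(Y \mid \Theta)\, p(\Theta)$, with $w$ marginalized out. That is, we maximize
$$ J(\Theta) := \ln p(Y \mid \Theta) + \ln p(\Theta), \qquad (25) $$
where, with $w := \{w_{k,n,m}\}_{k,n,m}$ and its domain $\mathcal{W}$,
$$ \ln p(Y \mid \Theta) = \ln \int_{\mathcal{W}} \prod_{l,m} \mathrm{Pois}(Y_{l,m}; X_{l,m}) \prod_{k,n,m} \mathrm{Rayleigh}\!\left(w_{k,n,m}; \frac{\nu}{|B_k(e^{j n e^{\Omega_{k,m}} u_0})|}\right) dw, \qquad (26) $$
$$ \ln p(\Theta) = \sum_k \ln p(\Omega_k) + \ln p(R) + \sum_k \ln p(A_k). \qquad (27) $$
Because (26) contains an integral over $w$, maximizing $J(\Theta)$ directly is difficult. We instead apply Jensen's inequality to (26) with an arbitrary density $q(w)$ satisfying $\int_{\mathcal{W}} q(w)\, dw = 1$ and $q(w) \ge 0$:
$$ \ln p(Y \mid \Theta) \ge \int_{\mathcal{W}} q(w) \left[ \sum_{l,m} \big( Y_{l,m} \ln X_{l,m} - X_{l,m} - \ln \Gamma(Y_{l,m} + 1) \big) + \sum_{k,n,m} \ln \mathrm{Rayleigh}\!\left(w_{k,n,m}; \frac{\nu}{|B_k(e^{j n e^{\Omega_{k,m}} u_0})|}\right) - \ln q(w) \right] dw. \qquad (28) $$
We write $\mathbb{E}_{q(w)}[w_{k,n,m}^2] := \int_{\mathcal{W}} q(w)\, w_{k,n,m}^2\, dw$. By (15)–(17), $X_{l,m}$ is a sum over $k$ and $n$, so the term $Y_{l,m} \ln X_{l,m}$ in (28) is still hard to handle. Applying Jensen's inequality once more gives
$$ Y_{l,m} \ln X_{l,m} \ge Y_{l,m} \sum_{k,n} \lambda_{k,n,l,m} \ln \frac{w_{k,n,m}^2\, e^{-\frac{(x_l - \Omega_{k,m} - \ln n)^2}{2\sigma^2}}\, U_{k,m}}{\lambda_{k,n,l,m}}, \qquad (29) $$
where $\lambda := \{\lambda_{k,n,l,m}\}_{k,n,l,m}$ satisfies $\lambda_{k,n,l,m} \ge 0$ and $\sum_{k,n} \lambda_{k,n,l,m} = 1$. Equality holds when
$$ \lambda_{k,n,l,m} = \frac{w_{k,n,m}^2\, e^{-\frac{(x_l - \Omega_{k,m} - \ln n)^2}{2\sigma^2}}\, U_{k,m}}{X_{l,m}}. \qquad (30) $$
Hence $J(\Theta)$ is bounded from below by $J^{+}(\lambda, q(w), \Theta)$, defined up to an additive constant as
$$ J^{+}(\lambda, q(w), \Theta) \stackrel{c}{=} \mathbb{E}_{q(w)}\!\left[ \sum_{l,m} \left( Y_{l,m} \sum_{k,n} \lambda_{k,n,l,m} \ln \frac{w_{k,n,m}^2\, e^{-\frac{(x_l - \Omega_{k,m} - \ln n)^2}{2\sigma^2}}\, U_{k,m}}{\lambda_{k,n,l,m}} - X_{l,m} \right) + \ln \frac{p(w \mid \beta, \Omega)}{q(w)} \right] + \ln p(\Theta), \qquad (31) $$
where $\stackrel{c}{=}$ denotes equality up to a constant and $p(w \mid \beta, \Omega)$ is the product of the Rayleigh densities (19).

For the second term of (31), we assume that $X(x, t_m)$ is negligible for $x < x_0$ and $x_{L-1} < x$. We then approximate the sum over $l$ by an integral:
$$ \sum_{l=0}^{L-1} X_{l,m} \approx \frac{1}{\Delta x} \int_{-\infty}^{\infty} X(x, t_m)\, dx = \frac{\sqrt{2\pi}\,\sigma}{\Delta x} \sum_{k} R_k A_{k,m} \sum_{n} w_{k,n,m}^2, \qquad (32) $$
where $\Delta x$ is the spacing of the uniform log-frequency grid. With (32), the second term of (31) no longer depends on $\Omega_{k,m}$. Denote the resulting approximation of $J^{+}(\lambda, q(w), \Theta)$ by $J^{++}(\lambda, q(w), \Theta)$. The update rules for $\Theta$ are obtained by setting the partial derivatives of $J^{++}(\lambda, q(w), \Theta)$ with respect to $\Theta$ to zero; see [7] for details.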
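The role of $\lambda$ in (29)–(30) and the sum-to-integral step in (32) can both be checked with a few lines of Python. In this sketch, `terms` stands for the summands $w_{k,n,m}^2 e^{-(x_l - \Omega_{k,m} - \ln n)^2/(2\sigma^2)} U_{k,m}$ at one $(l, m)$, flattened over $(k, n)$. The helper names and the numbers are illustrative assumptions. The 10-cent grid spacing is converted to the natural-log axis as $\ln 2 \cdot 10/1200$.

```python
import math
from typing import List, Sequence


def jensen_bound(Y: float, terms: Sequence[float], lam: Sequence[float]) -> float:
    """Right-hand side of Eq. (29) for one (l, m); entries with lam = 0 contribute 0."""
    return Y * sum(l * math.log(c / l) for c, l in zip(terms, lam) if l > 0.0)


def optimal_lambda(terms: Sequence[float]) -> List[float]:
    """Eq. (30): lam = terms / X_{l,m}, which turns Eq. (29) into an equality."""
    X = sum(terms)
    return [c / X for c in terms]


def gaussian_riemann_sum(center: float, sigma: float, dx: float, half_width: float) -> float:
    """dx * sum_l exp(-(x_l - center)^2 / (2 sigma^2)) on a uniform grid, cf. Eq. (32)."""
    count = int(2.0 * half_width / dx) + 1
    grid = (center - half_width + i * dx for i in range(count))
    return dx * sum(math.exp(-(x - center) ** 2 / (2.0 * sigma ** 2)) for x in grid)


if __name__ == "__main__":
    Y, terms = 4.0, [0.2, 1.5, 0.3]
    exact = Y * math.log(sum(terms))
    print(exact, jensen_bound(Y, terms, optimal_lambda(terms)))  # equal up to rounding
    print(jensen_bound(Y, terms, [1 / 3] * 3) <= exact)            # other lam give a lower value
    dx = math.log(2.0) * 10.0 / 1200.0                             # 10-cent spacing, natural-log axis
    print(gaussian_riemann_sum(0.0, 0.02, dx, 1.0), math.sqrt(2.0 * math.pi) * 0.02)
```

With $\sigma = 0.02$ and a 10-cent grid, each Gaussian spans several grid points. The Riemann sum therefore matches $\sqrt{2\pi}\,\sigma$ essentially exactly, which is what makes the approximation in (32) harmless.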

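5. Experiments
5.1 F0 contour estimation
We examined the F0 contours estimated by HTFD for an audio signal containing the notes D4, F4 and A4 from the RWC Music Database [20], sampled at 16 kHz. $w_{k,n,m}$ was set as in HTFD [6]. The wavelet spectrogram was computed with the method of [21], using a frame shift of 14.6 ms and covering 55 Hz to 7040 Hz at a 10-cent resolution. We set $\sigma = 0.02$ in (5), $N = 8$ and $K = 73$, and placed $\mu_k$ at semitone intervals from A1 (55 Hz) to A7. The remaining parameters were $\gamma^{(A)}_k = (1 - 3.96 \times 10^{-6})\mathbf{1}_M$, $\tau = 0.83$, $\nu = 1.25$, $\alpha_g = \alpha_l = 1$ and $\gamma^{(R)} = (1 - 2.4 \times 10^{-3})\mathbf{1}_K$. Fig. 2 compares the F0 contours estimated by HTFD with the result of NMF.

[Fig. 2: F0 contours for the signal containing D4, F4 and A4, estimated by (a) HTFD and (b) NMF; vertical axis: pitch.]
[Fig. 4: SNRs obtained with Harmonic NMF, HTFD [6] and HTFD+ [7].]

5.2 Effect of $\gamma^{(R)}$
We compared $\gamma^{(R)} = (1 - 2.4 \times 10^{-3})\mathbf{1}_K$ with $\gamma^{(R)} = (1 - 3.0 \times 10^{-3})\mathbf{1}_K$. Fig. 3(a) shows the result for D4.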
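The effect examined in 5.2 follows from the Dirichlet prior (24). $\ln \mathrm{Dir}(R; \gamma^{(R)})$ contains the term $\sum_k (\gamma^{(R)}_k - 1) \ln R_k$, so entries of $\gamma^{(R)}$ slightly below 1 assign higher density to weight vectors concentrated on a few sources. Lowering $\gamma^{(R)}$ strengthens this preference. The Python sketch below evaluates the Dirichlet log-density directly. Its weight vectors are made-up illustrations, not values from the experiment.

```python
import math
from typing import Sequence


def dirichlet_logpdf(x: Sequence[float], gamma: Sequence[float]) -> float:
    """ln Dir(x; gamma) for x on the probability simplex (all x_k > 0, sum x_k = 1)."""
    log_norm = math.lgamma(sum(gamma)) - sum(math.lgamma(g) for g in gamma)
    return log_norm + sum((g - 1.0) * math.log(xk) for g, xk in zip(gamma, x))


if __name__ == "__main__":
    K = 4
    uniform = [1.0 / K] * K
    sparse = [0.97, 0.01, 0.01, 0.01]   # hypothetical weights dominated by one source
    for eps in (2.4e-3, 3.0e-3):         # gamma^(R) = (1 - eps) 1_K, the two settings of 5.2
        gamma = [1.0 - eps] * K
        gap = dirichlet_logpdf(sparse, gamma) - dirichlet_logpdf(uniform, gamma)
        print(eps, gap)                  # positive: the sparse vector has higher prior density
```

With all $\gamma^{(R)}_k = 1$ the prior is flat on the simplex. With $\gamma^{(R)}_k > 1$ the preference reverses and favors evenly spread weights.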

5.3 Source separation
We evaluated the source separation performance of HTFD (cf. [6]) and compared it with NMF using harmonic bases (Harmonic NMF) [16,17]. Each source was obtained by applying the ratio $C_{k,l,m} / \sum_{k'} C_{k',l,m}$ to the observed spectrogram, with $w_{k,n,m}^2$ replaced by its expectation $\mathbb{E}_{q(w)}[w_{k,n,m}^2]$ under $q(w)$. The test signals were 30-s excerpts of RM-C001 to RM-C005 from the RWC Music Database [20], synthesized from their MIDI data with FluidSynth [22] at 16 kHz. The frame shift was 14.6 ms. We set $N = 20$, $\tau = 1.0$, $\gamma^{(A)}_k = (1 - 1.0 \times 10^{-4})\mathbf{1}_M$, $\gamma^{(R)} = 0.8 \cdot \mathbf{1}_K$, $P = 20$ and $\nu = 1$; the other parameters were the same as in 5.1. Harmonic NMF was run for 100 iterations, and HTFD and HTFD+ for 20 iterations each. HTFD+ denotes HTFD using F0 information obtained from the MIDI data. Separation performance was measured by the signal-to-noise ratio (SNR), shown in Fig. 4. The SNRs of Harmonic NMF and HTFD differed by only 0.02 dB, whereas HTFD+ outperformed Harmonic NMF and HTFD by 0.80 dB and 0.78 dB, respectively.

6. Conclusion
This report described HTFD, which unifies NMF and HTC, together with a PoE-based prior on its F0 contours.

Acknowledgments This work was supported by JSPS KAKENHI Grant Number 26730100.

References
[1] Hu, G. and Wang, D. L.: An auditory scene analysis approach to monaural speech segregation, Topics in Acoust. Echo and Noise Contr., pp. 485–515 (2006).
[2] Kameoka, H., Nishimoto, T. and Sagayama, S.: A Multipitch Analyzer Based on Harmonic Temporal Structured Clustering, IEEE Trans. Audio, Speech, and Language Process., Vol. 15, No. 3, pp. 982–994 (2007).
[3] Kameoka, H.: Statistical Approach to Multipitch Analysis, PhD Thesis, The University of Tokyo (2007).
[4] Bregman, A. S.: Auditory Scene Analysis: The Perceptual Organization of Sound, MIT Press (1994).
[5] Smaragdis, P. and Brown, J. C.: Non-negative matrix factorization for polyphonic music transcription, Proc. IEEE Workshop Applications Signal Process. Audio Acoust., pp. 177–180 (2003).
[6] No. 39 (2014).
[7] No. 26 (2015).
[8] Nakamura, T., Shikata, K., Takamune, N. and Kameoka, H.: Harmonic-Temporal Factor Decomposition Incorporating Music Prior Information for Informed Monaural Source Separation, Proc. Int. Symposium Music Info. Retrieval, pp. 623–628 (2014).
[9] [Online: 18 Apr. 2015], http://www.music-ir.org/mirex/wiki/mirex_home.
[10] Smaragdis, P. and Mysore, G. J.: Separation by humming: User-guided sound extraction from monophonic mixtures, Proc. IEEE Workshop Applications Signal Process. Audio Acoust., pp. 69–72 (2009).
[11] Ozerov, A., Févotte, C., Blouet, R. and Durrieu, J.-L.: Multichannel nonnegative tensor factorization with structured constraints for user-guided audio source separation, Proc. Int. Conf. Acoust. Speech Signal Process., pp. 257–260 (2011).
[12] Hennequin, R., David, B. and Badeau, R.: Score informed audio source separation using a parametric model of nonnegative spectrogram, Proc. Int. Conf. Acoust. Speech Signal Process., pp. 45–48 (2011).
[13] Şimşekli, U. and Cemgil, A. T.: Score guided musical source separation using generalized coupled tensor factorization, Proc. Eur. Signal Process. Conf., pp. 2639–2643 (2012).
[14] F0, Vol. SP2010-74, pp. 29–34 (2010).
[15] Nakano, M., Le Roux, J., Kameoka, H., Ono, N. and Sagayama, S.: Infinite-state spectrum model for music signal analysis, Proc. Int. Conf. Acoust. Speech Signal Process., pp. 1972–1975 (2011).
[16] Raczyński, S. A., Ono, N. and Sagayama, S.: Multipitch analysis with harmonic nonnegative matrix approximation, Proc. Int. Conf. Music Info. Retrieval, pp. 381–386 (2007).
[17] Vincent, E., Bertin, N. and Badeau, R.: Harmonic and inharmonic Nonnegative Matrix Factorization for Polyphonic Pitch transcription, Proc. Int. Conf. Acoust. Speech Signal Process., pp. 109–112 (2008).
[18] Yoshii, K. and Goto, M.: Infinite Latent Harmonic Allocation: A Nonparametric Bayesian Approach to Multipitch Analysis, Proc. Int. Soc. Music Info. Retrieval, pp. 309–314 (2010).
[19] Hinton, G. E.: Training products of experts by minimizing contrastive divergence, Neural Comput., Vol. 14, No. 8, pp. 1771–1800 (2002).
[20] Goto, M.: Development of the RWC Music Database, Proc. Int. Congress Acoust., pp. I-553–556 (2004).
[21] 2008-281898 (20 Nov. 2008).
[22] [Online: 21 Apr. 2015], http://www.fluidsynth.org/.