IPSJ SIG Technical Report Vol.2009-MUS-81 No.13 2009/7/30

Parameter Estimation of Mixture Model of Multiple Instruments and Application to Musical Instrument Identification

KATSUTOSHI ITOYAMA,†1 MASATAKA GOTO,†2 KAZUNORI KOMATANI,†1 TETSUYA OGATA†1 and HIROSHI G. OKUNO†1

This report presents parameter estimation of a mixture model of multiple instruments based on the integrated harmonic and inharmonic model, and its application to musical instrument identification. Parameters of the integrated model that fit an observed power spectrogram are estimated by a Bayesian inference method based on the calculus of variations. Since the parameter distributions of the integrated model depend on each instrument, the instrument name is identified by selecting the instrument that has the maximum relative instrument weight. Experimental results showed 81.6% accuracy for 10 instruments.

†1 Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University
†2 National Institute of Advanced Industrial Science and Technology (AIST)

1. Introduction

[The introduction, written in Japanese in the original, is not recoverable from this extraction; only its citation markers 1)–13) survive. It introduces the observed power spectrogram X(t, f), where t denotes time and f denotes frequency.]

© 2009 Information Processing Society of Japan
Table 1: Constraints for the parameters of the integrated tone model and their acoustical meanings.

Parameter  | Constraint            | Meaning
α(l)       | Σ_l α(l) = 1          | relative weight of the l-th tone model
β(H), β(I) | β(H) + β(I) = 1       | relative weights of the harmonic and inharmonic models
γ(m, n)    | Σ_{m,n} γ(m, n) = 1   | relative weight of each harmonic component
δ(m, n)    | Σ_{m,n} δ(m, n) = 1   | relative weight of each inharmonic component
τ          | —                     | onset time
ρ_H        | ρ_H > 0               | temporal spacing of the harmonic-model Gaussians (T_H(m) = τ + mρ_H)
ρ_I        | ρ_I > 0               | temporal spacing of the inharmonic-model Gaussians (T_I(m) = τ + mρ_I)
ω          | ω > 0                 | fundamental frequency (F0)
σ          | σ > 0                 | frequency spread of each harmonic component
κ, λ       | κ > 0, λ > 1.1        | frequency-warping constants of the inharmonic model

The integrated tone model J(t, f) consists of a harmonic model H(t, f) and an inharmonic model I(t, f):

J(t, f) = β(H) H(t, f) + β(I) I(t, f),
H(t, f) = Σ_{m=0}^{M_H−1} Σ_{n=1}^{N_H} γ(m, n) H(t, f; m, n),
I(t, f) = Σ_{m=0}^{M_I−1} Σ_{n=1}^{N_I} δ(m, n) I(t, f; m, n),
H(t, f; m, n) = N(t; T_H(m), ρ_H) N(f; F_H(n), σ),
I(t, f; m, n) = N(t; T_I(m), ρ_I) M(f; n),

where

N(x; μ, σ) = (1 / (√(2π) σ)) exp(−(x − μ)² / (2σ²)),
M(f; μ) = (1 / (√(2π) (f + κ) log λ)) exp(−(F_I(f) − μ)² / 2),
T_H(m) = τ + mρ_H,  T_I(m) = τ + mρ_I,
F_H(n) = nω,  F_I(f) = log_λ(f/κ + 1).

2.2 Latent Variables

As in the EM algorithm, latent variables are introduced:

z = { {z_β(H), z_β(I)}, {z_γ(m, n)} (m = 0, …, M_H−1; n = 1, …, N_H), {z_δ(m, n)} (m = 0, …, M_I−1; n = 1, …, N_I) },

each set using a 1-of-K coding in which exactly one element is 1 and the remaining K − 1 elements are 0. For each time-frequency point (t, f), the complete-data model factorizes as

p(t, f, z | θ) = [p(z_β(H)=1 | θ) p(t, f, {z_γ(m, n)} | z_β(H)=1, θ)]^{z_β(H)} [p(z_β(I)=1 | θ) p(t, f, {z_δ(m, n)} | z_β(I)=1, θ)]^{z_β(I)},

with

p(z_β(H)=1 | θ) = β(H),  p(z_β(I)=1 | θ) = β(I),
p(t, f, {z_γ(m, n)} | z_β(H)=1, θ) = Π_{m=0}^{M_H−1} Π_{n=1}^{N_H} [p(z_γ(m, n)=1 | z_β(H)=1, θ) p(t, f | z_β(H)=1, z_γ(m, n)=1, θ)]^{z_γ(m, n)},
p(t, f, {z_δ(m, n)} | z_β(I)=1, θ) = Π_{m=0}^{M_I−1} Π_{n=1}^{N_I} [p(z_δ(m, n)=1 | z_β(I)=1, θ) p(t, f | z_β(I)=1, z_δ(m, n)=1, θ)]^{z_δ(m, n)},
p(z_γ(m, n)=1 | z_β(H)=1, θ) = γ(m, n),
p(z_δ(m, n)=1 | z_β(I)=1, θ) = δ(m, n),
p(t, f | z_β(H)=1, z_γ(m, n)=1, θ) = H(t, f; m, n),
p(t, f | z_β(I)=1, z_δ(m, n)=1, θ) = I(t, f; m, n),

where θ denotes the set of all model parameters.
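As a concrete illustration, the harmonic part of the tone model can be evaluated numerically on a time-frequency grid. This is a minimal sketch with illustrative parameter values (M_H = 3, N_H = 4, toy weights), not the constants used in the paper's experiments.

```python
# Sketch: evaluating the harmonic model H(t, f) of the integrated tone model.
# All parameter values below are illustrative, not taken from the paper.
import numpy as np

def gaussian(x, mu, sigma):
    """N(x; mu, sigma) = exp(-(x - mu)^2 / (2 sigma^2)) / (sqrt(2 pi) sigma)."""
    return np.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2)) / (np.sqrt(2.0 * np.pi) * sigma)

def harmonic_model(t, f, gamma, tau, rho_H, omega, sigma):
    """H(t, f) = sum_{m,n} gamma[m, n-1] N(t; tau + m rho_H, rho_H) N(f; n omega, sigma)."""
    M_H, N_H = gamma.shape
    H = np.zeros(np.broadcast(t, f).shape)
    for m in range(M_H):
        for n in range(1, N_H + 1):
            H += gamma[m, n - 1] * gaussian(t, tau + m * rho_H, rho_H) * gaussian(f, n * omega, sigma)
    return H

# Toy setup: 3 temporal Gaussians, 4 harmonics, uniform weights summing to 1.
gamma = np.full((3, 4), 1.0 / 12.0)
t = np.linspace(0.0, 0.5, 50)[:, None]        # time axis (column)
f = np.linspace(0.0, 2000.0, 80)[None, :]     # frequency axis (row)
H = harmonic_model(t, f, gamma, tau=0.05, rho_H=0.05, omega=440.0, sigma=20.0)
print(H.shape)
```

The nested loops keep the correspondence to the double sum explicit; a vectorized outer-product formulation would be equivalent.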
Conjugate priors are placed on the parameters (the hyperparameters ν_τ and ν_ω encode prior knowledge of the onset time τ and the F0 ω):

p(β(H), β(I)) = D(β(H), β(I); φ_β(H), φ_β(I)),
p({γ(m, n)}) = D({γ(m, n)}; {φ_γ(m, n)}),
p({δ(m, n)}) = D({δ(m, n)}; {φ_δ(m, n)}),
p(τ) = N(τ; ν_τ, (ξ_H/ρ_H² + ξ_I/ρ_I²)^{−1}),
p(ω, σ) = N(ω; ν_ω, (ξ_ω/σ²)^{−1}) G(σ^{−2}; η, ζ),

where D(·) is the Dirichlet distribution, D(x_1, …, x_N; φ_1, …, φ_N) ∝ Π_n x_n^{φ_n − 1}, and G(·) is the Gamma distribution, G(x; η, ζ) ∝ x^{η−1} e^{−ζx}. The constants ρ_H, ρ_I, κ, and λ are fixed in advance.

2.3 Variational Bayesian Estimation

Given the observed power spectrogram X(t, f), the exact posterior

p(z, θ | X) = p(X, z, θ) / ∫ Σ_z p(X, z, θ) dθ

is intractable, so it is approximated by a factorized distribution q(z, θ):

q(z, θ) = q(z) q(θ),
q(z) = q(z_β(H), z_β(I)) q({z_γ(m, n)}) q({z_δ(m, n)}),
q(θ) = q(β(H), β(I)) q({γ(m, n)}) q({δ(m, n)}) q(τ) q(ω, σ).

q is obtained by maximizing the free energy

F[q] = ∫∫ X(t, f) Σ_z ∫ q(z, θ) log( p(t, f, z, θ) / q(z, θ) ) dθ dt df.

Maximizing F[q] with q(θ) fixed yields the updates for q(z):

q({z_γ(m, n)} | z_β(H)=1) = C_γ exp⟨log p(t, f, {z_γ(m, n)} | z_β(H)=1, θ)⟩_{X(t,f), q(θ)},
q({z_δ(m, n)} | z_β(I)=1) = C_δ exp⟨log p(t, f, {z_δ(m, n)} | z_β(I)=1, θ)⟩_{X(t,f), q(θ)},
q(z_β(H), z_β(I)) = C_β exp⟨log p(t, f, z_β(H), z_β(I) | θ)⟩_{X(t,f), q(θ)},

where C_β, C_γ, and C_δ are normalization constants ensuring Σ_z q(z) = 1, and ⟨f(x)⟩_x denotes the expectation of f(x) with respect to x. Maximizing F[q] with q(z) fixed yields the updates for q(θ):

q(β(H), β(I)) = C_β p(β(H), β(I)) exp⟨log p(t, f, z | θ)⟩_{X(t,f), q(z), q(θ∖{β(H),β(I)})} = D(β(H), β(I); φ̂_β(H), φ̂_β(I)),
q({γ(m, n)}) = C_γ p({γ(m, n)}) exp⟨log p(t, f, z | θ)⟩_{X(t,f), q(z), q(θ∖{γ(m,n)})} = D({γ(m, n)}; {φ̂_γ(m, n)}),
q({δ(m, n)}) = C_δ p({δ(m, n)}) exp⟨log p(t, f, z | θ)⟩_{X(t,f), q(z), q(θ∖{δ(m,n)})} = D({δ(m, n)}; {φ̂_δ(m, n)}),
q(τ) = C_τ p(τ) exp⟨log p(t, f, z | θ)⟩_{X(t,f), q(z), q(θ∖{τ})} = N(τ; ν̂_τ, (ξ̂_H/ρ_H² + ξ̂_I/ρ_I²)^{−1}),
q(ω, σ) = C_{ω,σ} p(ω, σ) exp⟨log p(t, f, z | θ)⟩_{X(t,f), q(z), q(θ∖{ω,σ})} = N(ω; ν̂_ω, (ξ̂_ω/σ²)^{−1}) G(σ^{−2}; η̂, ζ̂),

where C_β, C_γ, C_δ, C_τ, and C_{ω,σ} are normalization constants ensuring ∫ q(θ) dθ = 1.
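The Dirichlet factors of q(θ) all follow one pattern: the posterior hyperparameter is the prior hyperparameter plus the responsibility-weighted spectrogram mass assigned to that branch. A minimal sketch of this pattern for the (β(H), β(I)) factor, with toy data; the function name and all values are illustrative.

```python
# Sketch of one variational Dirichlet update: phi_hat_k = phi_k + sum_{t,f} X(t,f) q(branch k | t,f).
# Toy data; `update_dirichlet` is a hypothetical helper, not from the paper.
import numpy as np

def update_dirichlet(phi_prior, responsibilities, X):
    """responsibilities[k, t, f] = q(branch k | t, f); X[t, f] = observed power."""
    weighted_mass = np.tensordot(responsibilities, X, axes=([1, 2], [0, 1]))
    return phi_prior + weighted_mass

rng = np.random.default_rng(0)
X = rng.random((20, 30))                  # toy power spectrogram
r = rng.random((2, 20, 30))
r /= r.sum(axis=0, keepdims=True)         # responsibilities sum to 1 over branches
phi_hat = update_dirichlet(np.array([1.0, 1.0]), r, X)
# The total added mass equals the total spectrogram energy.
print(np.isclose(phi_hat.sum() - 2.0, X.sum()))  # -> True
```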
3. Mixture Model of Multiple Tone Models

To model a mixture of L musical instrument sounds, the observed spectrogram is modeled by a weighted sum of L integrated tone models:

J(t, f) = Σ_{l=1}^{L} α(l) J(l, t, f),

where α(l) is the relative weight of the l-th tone model. A Dirichlet prior is placed on {α(l)}:

p({α(l)}) = D({α(l)}; {φ_α(l)}).

The factorized posterior is extended accordingly:

q(z, θ) = q(z) q(θ),
q(z) = q({z_α(l)}) Π_{l=1}^{L} q(z_β(l, H), z_β(l, I)) q({z_γ(l, m, n)}) q({z_δ(l, m, n)}),
q(θ) = q({α(l)}) Π_{l=1}^{L} q(β(l, H), β(l, I)) q({γ(l, m, n)}) q({δ(l, m, n)}) q(τ(l)) q(ω(l), σ(l)),

and F[q] is maximized as before. Since q({z_α(l)}) approximates p({z_α(l)} | X), the instrument is identified by selecting the tone model with the maximum responsibility for X(t, f):

l̂ = argmax_l q(z_α(l) = 1).

4. Experiments

Table 2: Instrument sounds used in this experiment.

Instrument name (abbreviation) | Number of tones
Piano (PF)                     | 792
Acoustic Guitar (AG)           | 702
Electric Guitar (EG)           | 702
Violin (VN)                    | 576
Cello (VC)                     | 565
Trumpet, Cornet (TR)           | 302
Trombone (TB)                  | 285
Alto Sax (AS)                  | 297
Tenor Sax (TS)                 | 294
Clarinet (CL)                  | 360

Table 3: Constants of the integrated model.

M_H | 30
N_H | 100
M_I | 30
N_I | 100
ρ_H | 0.05 sec.
ρ_I | 0.05 sec.
κ   | 440.0 Hz
λ   | 1.134

4.1 Experimental Conditions and Results

[The experimental description, written in Japanese in the original, is not recoverable from this extraction. Isolated tones of 10 instruments from the RWC Music Database 14) were used (Table 2), with the model constants of Table 3. Identification accuracy is compared with the F0-based baseline method 15) in Table 4; the confusion matrix is shown in Table 5.]
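The identification rule above, choosing the tone model with the maximum relative weight, can be sketched directly from the Dirichlet factor q({α(l)}): its hyperparameters determine the expected weights, and the argmax gives the instrument label. The helper name and hyperparameter values below are illustrative, not from the paper.

```python
# Sketch of the identification rule: pick the template l with the largest
# expected relative weight under the Dirichlet posterior q({alpha(l)}).
# `identify_instrument` and the phi values are illustrative toys.
import numpy as np

def identify_instrument(phi_alpha_hat, names):
    """Given Dirichlet hyperparameters phi_hat_alpha(l), return the name of the
    tone model whose expected weight E[alpha(l)] = phi_hat(l) / sum phi_hat is largest."""
    weights = phi_alpha_hat / phi_alpha_hat.sum()
    return names[int(np.argmax(weights))]

names = ["PF", "AG", "EG", "VN", "VC", "TR", "TB", "AS", "TS", "CL"]
phi = np.array([3.0, 1.0, 9.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0])  # toy values
print(identify_instrument(phi, names))  # -> EG
```

Since the normalizing sum is common to all l, taking the argmax over the raw hyperparameters would give the same answer.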
Table 4: Accuracy for each instrument.

Inst. | Baseline 15) | Proposed
PF    | 74.2 %       | 86.7 %
AG    | 81.2 %       | 66.1 %
EG    | N/A          | 99.4 %
VN    | 69.7 %       | 96.0 %
VC    | 73.5 %       | 97.7 %
TR    | 73.5 %       | 51.7 %
TB    | 76.7 %       | 54.7 %
AS    | 41.5 %       | 61.6 %
TS    | 64.7 %       | 63.6 %
CL    | 90.7 %       | 94.7 %
Avg.  | 73.1 %       | 81.6 %

Table 5: Confusion matrix.

True  | Classified as
inst. |  PF |  AG |  EG |   VN |  VC |  TR |  TB |  AS |  TS |  CL | Total
PF    | 687 |  24 |   0 |   81 |   0 |   0 |   0 |   0 |   0 |   0 |   792
AG    |   2 | 464 |   4 |  141 |  11 |   0 |   0 |   0 |   0 |  80 |   702
EG    |   0 |   0 | 698 |    0 |   3 |   0 |   0 |   0 |   0 |   1 |   702
VN    |   0 |   0 |   2 |  553 |  10 |   0 |   0 |   0 |   0 |  11 |   576
VC    |   3 |   0 |   0 |   10 | 552 |   0 |   0 |   0 |   0 |   0 |   565
TR    |   1 |   0 |   2 |   79 |   7 | 156 |   0 |   0 |   0 |  57 |   302
TB    |   3 |   1 |   1 |   64 |  53 |   0 | 156 |   0 |   0 |   7 |   285
AS    |   0 |   0 |   0 |   95 |  16 |   0 |   0 | 183 |   0 |   3 |   297
TS    |   4 |   1 |   0 |   69 |  33 |   0 |   0 |   0 | 187 |   0 |   294
CL    |   1 |   0 |   0 |   15 |   3 |   0 |   0 |   0 |   0 | 341 |   360
Total | 701 | 490 | 707 | 1107 | 688 | 156 | 156 | 183 | 187 | 500 |  4875

Fig. 1: [figure and caption not recoverable from this extraction]
Fig. 2: Variance of relative weight parameters of inharmonic model along frequency axis.

5. Conclusion

[The conclusion and acknowledgments (CrestMuse Project, COE), written in Japanese, are not recoverable from this extraction.]

References

1) [Japanese reference; only the year survives in this extraction] (1994).
2) Goto, M.: A Real-time Music-scene-analysis System: Predominant-F0 Estimation for Detecting Melody and Bass Lines in Real-world Audio Signals, Speech Communication, Vol.43, No.4, pp.311–329 (2004).
3) Klapuri, A.: Multipitch Analysis of Polyphonic Music and Speech Signals Using an Auditory Model, IEEE Trans. Audio, Speech and Lang. Process., Vol.16, No.2, pp.255–266 (2008).
4) Goto, M.: An Audio-based Real-time Beat Tracking System for Music with or without Drum-sounds, J. New Music Res., Vol.30, No.2, pp.159–171 (2001).
5) Dixon, S.: Automatic Extraction of Tempo and Beat from Expressive Performances, J. New Music Res., Vol.30, No.1, pp.39–58 (2001).
6) Kitahara, T.: Computational Musical Instrument Recognition and Its Application to Content-based Music Information Retrieval, PhD Thesis, Kyoto University (2007).
7) Yoshii, K., Goto, M. and Okuno, H.G.: Drum Sound Recognition for Polyphonic Audio Signals by Adaptation and Matching of Spectrogram Templates with Harmonic Structure Suppression, IEEE Trans. Audio, Speech and Lang. Process., Vol.15, No.1, pp.333–345 (2007).
8) Kameoka, H., Nishimoto, T.
and Sagayama, S.: A Multipitch Analyzer Based on Harmonic Temporal Structured Clustering, IEEE Trans. Audio, Speech and Lang. Process., Vol.15, No.3, pp.982–994 (2007).
9) [Japanese reference; authors and title not recoverable from this extraction] IPSJ SIG Technical Report, Vol.2007, No.71, pp.155–160 (2007).
10) Woodruff, J., Pardo, B. and Dannenberg, R.: Remixing Stereo Music with Score-informed Source Separation, Proc. ISMIR, pp.314–319 (2006).
11) Viste, H. and Evangelista, G.: A Method for Separation of Overlapping Partials Based on Similarity of Temporal Envelopes in Multichannel Mixtures, IEEE Trans. Audio, Speech and Lang. Process., Vol.14, No.3, pp.1051–1061 (2006).
12) Casey, M. and Westner, A.: Separation of Mixed Audio Sources by Independent Subspace Analysis, Proc. ICMC, pp.154–161 (2000).
13) Itoyama, K., Goto, M., Komatani, K., Ogata, T. and Okuno, H.G.: Parameter Estimation for Harmonic and Inharmonic Models by Using Timbre Feature Distributions, IPSJ Journal,
Vol.50, No.7 (2009).
14) Goto, M., Hashiguchi, H., Nishimura, T. and Oka, R.: RWC Music Database: Music Genre Database and Musical Instrument Sound Database, Proc. ISMIR, pp.229–230 (2003).
15) [Japanese reference on F0-dependent musical instrument identification; authors and title not recoverable from this extraction] IPSJ Journal, Vol.44, No.10, pp.2448–2458 (2003).

Appendix: Update Equations

Define, for the harmonic and inharmonic branches,

s(l, H, m, n, t, f) = Ψ(φ̂_α(l)) + Ψ(φ̂_β(l, H)) + Ψ(φ̂_γ(l, m, n)) − Ψ(φ̂_β(l, H) + φ̂_β(l, I)) − Ψ(Σ_{m=0}^{M_H−1} Σ_{n=1}^{N_H} φ̂_γ(l, m, n)) − [ξ̂_τ(l)^{−1} + (ν̂_τ(l) − t + mρ_H)²] / (2ρ_H²) − [n² ξ̂_ω(l)^{−1} + (n ν̂_ω(l) − f)² η̂(l)/ζ̂(l)] / 2 − (log ζ̂(l) − Ψ(η̂(l))) / 2,

s(l, I, m, n, t, f) = Ψ(φ̂_α(l)) + Ψ(φ̂_β(l, I)) + Ψ(φ̂_δ(l, m, n)) − Ψ(φ̂_β(l, H) + φ̂_β(l, I)) − Ψ(Σ_{m=0}^{M_I−1} Σ_{n=1}^{N_I} φ̂_δ(l, m, n)) − [ξ̂_τ(l)^{−1} + (ν̂_τ(l) − t + mρ_I)²] / (2ρ_I²) − log(f + κ) − log log λ − (F_I(f) − n)² / 2,

where Ψ(x) = (d/dx) log Γ(x) is the digamma function and ξ̂_τ(l) = ξ̂_H(l)/ρ_H² + ξ̂_I(l)/ρ_I². The responsibilities are then

q(z_α(l)=1, z_β(l, H)=1, z_γ(l, m, n)=1 | t, f) = exp(s(l, H, m, n, t, f)) / Σ_l [Σ_{m,n} exp(s(l, H, m, n, t, f)) + Σ_{m,n} exp(s(l, I, m, n, t, f))],
q(z_α(l)=1, z_β(l, I)=1, z_δ(l, m, n)=1 | t, f) = exp(s(l, I, m, n, t, f)) / Σ_l [Σ_{m,n} exp(s(l, H, m, n, t, f)) + Σ_{m,n} exp(s(l, I, m, n, t, f))].

The observed spectrogram X(t, f) is decomposed as

X(l, H, m, n, t, f) = X(t, f) q(z_α(l)=1, z_β(l, H)=1, z_γ(l, m, n)=1 | t, f),

and the marginals are obtained by summing and integrating over the omitted indices, e.g. X(l, H, m) = Σ_{n=1}^{N_H} ∫∫ X(l, H, m, n, t, f) dt df. The hyperparameter updates are

φ̂_α(l) = φ_α(l) + X(l),
φ̂_β(l, H) = φ_β(l, H) + X(l, H),  φ̂_β(l, I) = φ_β(l, I) + X(l, I),
φ̂_γ(l, m, n) = φ_γ(l, m, n) + X(l, H, m, n),  φ̂_δ(l, m, n) = φ_δ(l, m, n) + X(l, I, m, n),
ν̂_τ(l) = [ (ξ_H(l)/ρ_H² + ξ_I(l)/ρ_I²) ν_τ + T_H(l)/ρ_H² + T_I(l)/ρ_I² ] / [ ξ_H(l)/ρ_H² + ξ_I(l)/ρ_I² + X(l, H)/ρ_H² + X(l, I)/ρ_I² ],
ξ̂_H(l) = ξ_H(l) + X(l, H),  ξ̂_I(l) = ξ_I(l) + X(l, I),
ν̂_ω(l) = (ξ_ω(l) ν_ω(l) + F_1(l)) / (ξ_ω(l) + F_0(l)),
ξ̂_ω(l) = ξ_ω(l) + F_0(l),
η̂(l) = η(l) + X(l, H)/2,
ζ̂(l) = ζ(l) + (F_2(l) − F_1(l)²/F_0(l))/2 + ξ_ω(l) F_0(l) (F_1(l)/F_0(l) − ν_ω(l))² / (2(ξ_ω(l) + F_0(l))),

with the weighted moments

T_H(l) = Σ_{m=0}^{M_H−1} ∫ X(l, H, m, t)(t − mρ_H) dt,  T_I(l) = Σ_{m=0}^{M_I−1} ∫ X(l, I, m, t)(t − mρ_I) dt,
F_0(l) = Σ_{n=1}^{N_H} n² X(l, H, n),
F_1(l) = Σ_{n=1}^{N_H} ∫ X(l, H, n, f) n f df,
F_2(l) = ∫ X(l, H, f) f² df.
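The updates for (ω, σ) follow the standard Normal–Gamma conjugate pattern: the posterior mean of ω is a precision-weighted blend of the prior mean and the data moment, and ζ absorbs the residual scatter plus a prior-versus-data disagreement term. A toy numeric sketch; the function name and every value are illustrative, not from the paper.

```python
# Sketch of a Normal-Gamma style hyperparameter update for (omega, sigma).
# F0, F1, F2 stand for the weighted moments; all numbers are toy values.

def update_omega_sigma(nu, xi, eta, zeta, F0, F1, F2, X_H):
    """Return updated (nu, xi, eta, zeta) following the conjugate update pattern."""
    nu_hat = (xi * nu + F1) / (xi + F0)          # blend prior mean with data moment
    xi_hat = xi + F0                             # accumulate precision mass
    eta_hat = eta + X_H / 2.0                    # shape grows with assigned mass
    zeta_hat = (zeta
                + (F2 - F1 ** 2 / F0) / 2.0      # residual scatter of the data
                + xi * F0 * (F1 / F0 - nu) ** 2 / (2.0 * (xi + F0)))  # prior/data disagreement
    return nu_hat, xi_hat, eta_hat, zeta_hat

nu_hat, xi_hat, eta_hat, zeta_hat = update_omega_sigma(
    nu=440.0, xi=1.0, eta=2.0, zeta=1.0, F0=10.0, F1=4500.0, F2=2.1e6, X_H=5.0)
print(round(nu_hat, 2), xi_hat)  # -> 449.09 11.0
```

With a weak prior (small ξ relative to F_0), ν̂ is dominated by the data moment F_1/F_0, as expected of a conjugate update.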