Emiru Tsunoo 1,a)  Akira Inoue 1,b)  Masayuki Nishiguchi 1,c)

1  Sony Corporation, Minato, Tokyo 108-0075, Japan
a) Emiru.Tsunoo@jp.sony.com
b) AkiraB.Inoue@jp.sony.com
c) Masayuki.Nishiguchi@jp.sony.com

1. Introduction

Query-by-humming systems retrieve a song from a sung or hummed melody fragment. Most early systems match the query against a database of symbolic (MIDI) melodies [1], [2], using representations such as the three-level U/D/S (up/down/same) pitch contour [3], [4], frame-level continuous dynamic programming [5], or mid-level melody descriptions of polyphonic audio [6].

To search audio collections directly, the reference melodies themselves must be estimated from polyphonic recordings. Goto's PreFEst estimates the predominant F0 of the melody and bass lines [7]; Li and Wang track the singing pitch with an HMM decoded by the Viterbi algorithm [8]; Tachibana et al. exploit the temporal variability of the melody source [9]; Hsu et al. estimate the pitch trend before detecting the singing pitch [10]; and Salamon and Gómez select melody contours by their pitch-contour characteristics [11]. This paper proposes a melody estimation method for query-by-humming that additionally exploits stereo panning information. Section 2 describes the melody estimation, Section 3 the matching, Section 4 the experiments, and Section 5 concludes.

(c) 2013 Information Processing Society of Japan
2. Melody Estimation

2.1 Overview

The proposed melody estimation consists of three steps. First, a weighted salience spectrogram S(x, y) is computed from the input signal (Sect. 2.2), exploiting stereo panning in addition to the spectral structure used in monaural methods such as [9]. Second, a pitch trend T(x) is estimated and the salience is restricted to a plausible pitch range (Sect. 2.3). Third, the melody line is tracked on the salience by dynamic programming (Sect. 2.4).

Fig. 1: Overview of the proposed melody estimation.

2.2 Weighted Salience Spectrogram

2.2.1 Spectral Weighting

Let Y(x, y) denote the magnitude spectrogram of the input, where x (1 ≤ x ≤ X) is the frame index and y (1 ≤ y ≤ Y) the frequency-bin index (Fig. 2-(a)). A weight v(x, y), taking values between 0 and 1 (Fig. 2-(b)), emphasizes bins that form clear spectral peaks. First, the log-magnitude spectrum is smoothed along frequency with a low-pass filter with coefficients {a_0, a_1, ..., a_{K-1}}:

$$l(x,y)=\sum_{k=0}^{K-1} a_k \log Y(x,y-k). \tag{1}$$

The weight is then

$$v(x,y)=\begin{cases}1 & (\mu(x)<U_Y(x,y)<l(x,y))\\[2pt] 0 & (U_Y(x,y)\le\mu(x))\\[2pt] \dfrac{l(x,y)-\mu(x)}{U_Y(x,y)-\mu(x)} & (\text{otherwise})\end{cases} \tag{2}$$

where μ(x) is the mean of log Y(x, y) over y in frame x, and U_Y(x, y) is the upper envelope of the log-magnitude spectrum, obtained by linearly interpolating between the spectral peaks p_−(y) and p_+(y) immediately below and above bin y:

$$P_-(x,y)=\log Y(x,p_-(y)),\qquad P_+(x,y)=\log Y(x,p_+(y)),$$
$$U_Y(x,y)=\frac{(p_+(y)-y)\,P_-(x,y)+(y-p_-(y))\,P_+(x,y)}{p_+(y)-p_-(y)}, \tag{3}$$

where a bin p counts as a peak if it attains the maximum within a window of ±H bins:

$$Y(x,p)=\max_{p-H\le j\le p+H} Y(x,j). \tag{4}$$

The envelope U_Y(x, y) and the weight v(x, y) are shown in Fig. 2-(c) and Fig. 2-(d), respectively.
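For one frame, the peak picking, envelope interpolation, and weighting of Eqs. (1)–(4) can be sketched as follows. This is a minimal NumPy sketch under assumptions: a simple moving average stands in for the unspecified low-pass filter a_k, and the function and variable names are my own.

```python
import numpy as np

def upper_envelope_and_weight(logY, H=16, K=9):
    """One frame: upper envelope U_Y by linear interpolation between
    spectral peaks (Eqs. (3)-(4)) and the weight v of Eq. (2).
    logY : log-magnitude spectrum of one frame (1-D array)."""
    n_bins = len(logY)
    # Eq. (4): a bin is a peak if it is the maximum in a +/-H window.
    peaks = [y for y in range(n_bins)
             if logY[y] == logY[max(0, y - H):y + H + 1].max()]
    # Eq. (3): linearly interpolate log-magnitudes between peaks.
    U = np.interp(np.arange(n_bins), peaks, logY[peaks])
    # l(x, y): smoothed log-spectrum (moving average stands in for the
    # low-pass filter a_k of Eq. (1)); mu(x): frame mean.
    l = np.convolve(logY, np.ones(K) / K, mode="same")
    mu = logY.mean()
    # Eq. (2): 1 inside (mu, l), 0 at or below mu, soft value otherwise.
    v = np.zeros(n_bins)
    v[(mu < U) & (U < l)] = 1.0
    hi = (U >= l) & (U > mu)
    v[hi] = (l[hi] - mu) / (U[hi] - mu)
    return U, np.clip(v, 0.0, 1.0)
```

The clip keeps the "otherwise" branch inside [0, 1] even where the smoothed spectrum dips below the frame mean.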
Fig. 2: (a) input magnitude spectrogram; (b) weighting scheme; (c) upper envelope U_Y(x, y); (d) weight v(x, y); (e) salience S(x, y).

2.2.2 Use of Stereo Panning

Whereas melody extractors such as [7], [11] operate on monaural signals, the proposed method additionally exploits stereo panning, following the azimuth discrimination approach of Barry et al. [12]. For a candidate pitch bin y, the weighted magnitudes of its first N harmonics ny (n = 1, ..., N) are accumulated after cancelling a source panned with inter-channel gain β_i from one of the two channels (c = 0, 1):

$$A_{i,0}(x,y)=\sum_{n=1}^{N} v(x,ny)\,\bigl|Y_L(x,ny)-\beta_i\,Y_R(x,ny)\bigr|^{\alpha} \tag{5}$$

$$A_{i,1}(x,y)=\sum_{n=1}^{N} v(x,ny)\,\bigl|Y_R(x,ny)-\beta_i\,Y_L(x,ny)\bigr|^{\alpha} \tag{6}$$

where Y_L(x, y) and Y_R(x, y) are the magnitude spectrograms of the left and right channels and α is a magnitude-compression exponent. A harmonic source panned at a particular azimuth is cancelled almost completely at the corresponding gain β_i in one channel, producing a deep notch in A_{i,c}(x, y), so the salience of pitch y is defined as the spread

$$S(x,y)=\max_{i,c} A_{i,c}(x,y)-\min_{i,c} A_{i,c}(x,y). \tag{7}$$

An example S(x, y) is shown in Fig. 2-(e), and Fig. 3-(a) shows S(x, y) for The Beatles' "Hey Jude". For monaural signals, where Y_R(x, y) is not available, the salience is instead computed from the weighted harmonic sum

$$S'(x,y)=\sum_{n=1}^{N} v(x,ny)\,Y_L(x,ny)^{\alpha}. \tag{8}$$

Fig. 3: S(x, y) and the estimated melody for The Beatles' "Hey Jude": (a) S(x, y); (b) pitch trend T(x); (c), (d) estimation results.
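The per-frame stereo salience of Eqs. (5)–(7) can be sketched as follows. This is a toy NumPy sketch, not the paper's implementation: the function name, the default harmonic count N, and the β grid passed by the caller are my own assumptions.

```python
import numpy as np

def azimuth_salience(YL, YR, v, y, betas, N=8, alpha=0.5):
    """Stereo salience S(x, y) for one frame (Eqs. (5)-(7)).
    YL, YR : magnitude spectra of the left/right channels (one frame)
    v      : weight of Eq. (2) for the same frame
    y      : candidate F0 bin; harmonics sit at n*y, n = 1..N
    betas  : candidate inter-channel gains beta_i
    """
    h = y * np.arange(1, N + 1)   # harmonic bins n*y
    h = h[h < len(YL)]            # keep harmonics inside the spectrum
    A = []
    for b in betas:
        # c = 0: cancel a source panned at beta from the left channel;
        # c = 1: symmetric cancellation from the right channel.
        A.append(np.sum(v[h] * np.abs(YL[h] - b * YR[h]) ** alpha))
        A.append(np.sum(v[h] * np.abs(YR[h] - b * YL[h]) ** alpha))
    # Eq. (7): a panned harmonic source leaves a deep notch at one
    # (i, c), so the max-min spread over (i, c) measures its salience.
    return max(A) - min(A)
```

With a source panned so that Y_L = 2 Y_R, the term |Y_L − β Y_R| vanishes near β = 2, which is exactly the notch the spread in Eq. (7) detects.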
2.3 Pitch Trend and Range Limiting

Next, a pitch trend T(x) is estimated from S(x, y) (Fig. 3-(b)), and salience bins far from the trend are suppressed. Candidate pitches F_1 and F_2 whose ratio satisfies 2^{-1/12} < F_2/F_1 < 2^{1/12}, i.e. which lie within one semitone of each other, are treated as the same candidate. Fig. 3-(a) and Fig. 3-(c) show the salience before and after this step.

2.4 Melody Tracking by Dynamic Programming

Finally the melody line is tracked with a Viterbi-like dynamic program, as in [8], [10]. Unlike the frame-wise tracking of [8], the search here operates on pitch segments. Let U_m = {(x_{m,1}, y_{m,1}), (x_{m,2}, y_{m,2}), ...} be the m-th candidate segment, whose frames are consecutive, x_{m,1} = x_{m,2} − 1 = x_{m,3} − 2 = ..., and write x_{m,first} = x_{m,1} and x_{m,last} for its first and last frames. A selection M of segments is admissible when its segments do not overlap in time, x_{m,last} < x_{n,first} (m < n, m, n ∈ M), and is scored by

$$D_M=\sum_{m\in M}\Bigl(\sum_{(x,y)\in U_m} S(x,y)-\gamma_1\sum_{(x,y)\in U_m}\bigl|\log y-\log T(x)\bigr|-\gamma_2\,\bigl|\log y_{m,first}-\log y_{m-1,last}\bigr|\Bigr) \tag{9}$$

whose first term rewards salience, whose second term penalizes deviation from the trend T(x) with weight γ_1, and whose third term penalizes the pitch jump from the previously selected segment with weight γ_2. The selection maximizing (9) is found with two tables: d_L(x, y), the best score of a path currently inside a segment at frame x and pitch y, and d_E(y), the best score of a path whose last selected segment ended at pitch y. They are initialized as

$$d_E(0)=0 \tag{10}$$

$$d_L(0,y)=S(0,y)-\gamma_1\bigl|\log y-\log T(0)\bigr| \tag{11}$$

and updated frame by frame:

if (x, y) ∈ U_m and x = x_{m,first} then
$$d_L(x,y)\leftarrow\max_j\Bigl\{d_E(j)+S(x,y)-\gamma_1\bigl|\log y-\log T(x)\bigr|-\gamma_2\bigl|\log j-\log y\bigr|\Bigr\} \tag{12}$$

if (x, y) ∈ U_m and x = x_{m,last} then
$$d_E(y)\leftarrow\max\Bigl\{d_E(y),\;d_L(x-1,y)+S(x,y)-\gamma_1\bigl|\log y-\log T(x)\bigr|\Bigr\} \tag{13}$$

if (x, y) ∈ U_m and x_{m,first} < x < x_{m,last} then
$$d_L(x_{m,n},y_{m,n})\leftarrow d_L(x_{m,n-1},y_{m,n-1})+S(x_{m,n},y_{m,n})-\gamma_1\bigl|\log y_{m,n}-\log T(x_{m,n})\bigr| \tag{14}$$

After the last frame, the best total score is

$$D_{M_{\max}}=\max_j d_E(j) \tag{15}$$

and the melody line is recovered by backtracking (Fig. 3-(c), 3-(d)).
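The segment selection of Eqs. (9)–(15) can be sketched in a much-simplified form. This sketch collapses the frame-level tables d_L and d_E into a single pass over whole segments, and assumes the candidate segments are already time-ordered and non-overlapping; the names and the segment representation are my own.

```python
import math

def segment_gain(frames, pitches, scores, trend, gamma1):
    # Inner sums of Eq. (9): accumulated salience minus the trend penalty.
    return sum(s - gamma1 * abs(math.log(p) - math.log(trend[x]))
               for x, p, s in zip(frames, pitches, scores))

def select_segments(segments, trend, gamma1=0.52, gamma2=5.2):
    """Pick the subset of segments maximizing D_M of Eq. (9).
    segments : list of (frames, pitches, scores), sorted by start frame,
               pairwise non-overlapping in time
    trend    : per-frame pitch trend T(x)."""
    best = {None: 0.0}   # last pitch of a chosen prefix -> best score (d_E)
    for frames, pitches, scores in segments:
        g = segment_gain(frames, pitches, scores, trend, gamma1)
        # Eq. (12): best way to append this segment to any finished prefix,
        # paying the jump penalty from the prefix's last pitch.
        start = max(
            v - (gamma2 * abs(math.log(pitches[0]) - math.log(p))
                 if p is not None else 0.0)
            for p, v in best.items())
        cand = start + g
        last = pitches[-1]
        if cand > best.get(last, -math.inf):
            best[last] = cand   # Eq. (13): close the segment at its last pitch
    return max(best.values())   # Eq. (15)
```

The entry `best[None] = 0.0` plays the role of d_E(0) = 0 in Eq. (10): every segment may also start a fresh selection, so low-scoring segments are simply skipped.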
3. Melody Matching

For matching a hummed query against the estimated melodies, the time-series retrieval method of Zhu and Shasha [2], which indexes envelope-transformed melodies under time warping, is used.

4. Experiments

The analysis uses a frame length of 2048 samples with a hop size of 512 samples at a sampling rate of 8000 Hz. The peak-picking half-width in (4) is H = 16, the weights in (9) are γ_1 = 0.52 and γ_2 = 5.2, the exponent α in (5) is 0.5, and the low-pass filter a_k has a cutoff of 0.15π.

4.1 Melody Estimation Accuracy

Estimation accuracy is measured with the Raw Pitch Accuracy of MIREX*1: the fraction of voiced frames whose estimated pitch lies within 1/2 semitone (50 cents) of the reference. The evaluation uses the MIREX ADC2004 data set, 20 clips in five categories (Pop, Daisy, Opera, Jazz, and MIDI), and compares the proposed method with MELODIA [11]. As Table 1 shows, the proposed method outperforms MELODIA on the Pop and Daisy categories, while MELODIA remains better on Opera, Jazz, and MIDI; with the looser 1-semitone tolerance (Table 2), the proposed method also overtakes MELODIA on Opera.

Table 1: Raw pitch accuracy on ADC2004 (1/2-semitone tolerance).

           Pop    Daisy  Opera  Jazz   MIDI
Proposed   0.726  0.942  0.549  0.741  0.556
MELODIA    0.692  0.907  0.647  0.862  0.661

Table 2: Raw pitch accuracy on ADC2004 (1-semitone tolerance).

           Pop    Daisy  Opera  Jazz   MIDI
Proposed   0.791  0.971  0.695  0.767  0.564
MELODIA    0.728  0.911  0.673  0.874  0.675

*1 Music Information Retrieval Evaluation eXchange (MIREX) [Online]. http://www.music-ir.org/mirex/wiki/Audio_Melody_Extraction
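The evaluation metric of Sect. 4.1 can be sketched as follows. This is a minimal sketch of raw pitch accuracy only; the full MIREX melody-extraction scoring also defines chroma and voicing measures that are not shown here.

```python
import math

def raw_pitch_accuracy(ref_hz, est_hz, tol_semitones=0.5):
    """Fraction of reference-voiced frames whose estimated pitch lies
    within tol_semitones of the reference (0.5 semitones = 50 cents).
    ref_hz, est_hz : per-frame pitch in Hz, with 0.0 marking unvoiced."""
    voiced = [(r, e) for r, e in zip(ref_hz, est_hz) if r > 0]
    if not voiced:
        return 0.0
    hit = sum(1 for r, e in voiced
              if e > 0 and abs(12 * math.log2(e / r)) <= tol_semitones)
    return hit / len(voiced)
```

Frames where the reference is unvoiced are excluded, and an unvoiced estimate on a voiced frame counts as a miss, so the measure matches the per-category numbers reported in Tables 1 and 2 in spirit.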
4.2 Retrieval Accuracy

Query-by-humming retrieval is evaluated against a database of 1000 MIDI songs, comparing four sources of reference melodies: the ground-truth MIDI melodies, melodies extracted with MELODIA, and melodies extracted with the proposed method in its monaural and stereo variants. Top 1, Top 5, and Top 10 denote the rate at which the correct song appears among the best 1, 5, and 10 retrieved candidates. Tables 3 and 4 report results for two query sets; the first set is retrieved perfectly when the ground-truth MIDI melodies serve as references. In both settings the stereo variant of the proposed method clearly outperforms both its monaural variant and MELODIA, and in Table 4 it approaches the accuracy obtained from the ground-truth MIDI reference melodies.

Table 3: Retrieval accuracy, first query set (1000-song MIDI database).

Reference Melody      Top 1     Top 5     Top 10
MIDI                  100 %     100 %     100 %
MELODIA               48.78 %   51.22 %   51.22 %
Proposed (Monaural)   64.63 %   75.61 %   78.05 %
Proposed (Stereo)     80.49 %   85.37 %   89.02 %

Table 4: Retrieval accuracy, second query set (14 songs, 1000-song MIDI database).

Reference Melody      Top 1     Top 5     Top 10
MIDI                  78.99 %   86.23 %   89.86 %
MELODIA               47.83 %   50.72 %   53.62 %
Proposed (Monaural)   65.85 %   73.17 %   76.83 %
Proposed (Stereo)     71.74 %   80.43 %   84.06 %

5. Conclusion

A melody estimation method that exploits stereo panning information was proposed for query-by-humming against polyphonic audio, together with a segment-based dynamic-programming melody tracker. Experiments showed that the stereo salience improves both raw pitch accuracy and retrieval accuracy over a monaural variant and over MELODIA.

References

[1] S. Pauws, "CubyHum: a fully operational query by humming system," in Proc. ISMIR, pp. 187-196, 2002.
[2] Y. Zhu and D. Shasha, "Warping indexes with envelope transforms for query by humming," in Proc. ACM SIGMOD Int. Conf., pp. 181-192, 2003.
[3] A. Ghias, J. Logan, D. Chamberlin, and B. Smith, "Query by humming: musical information retrieval in an audio database," in Proc. ACM Multimedia, pp. 231-236, 1995.
[4] L. Lu, H. You, and H.-J. Zhang, "A new approach to query by humming in music retrieval," in Proc. ICME, pp. 22-25, 2001.
[5] T. Nishimura, H. Hashiguchi, J. Takita, J. X. Zhang, M. Goto, and R. Oka, "Music signal spotting retrieval by a humming query using start frame feature dependent continuous dynamic programming," in Proc. ISMIR, pp. 211-218, 2001.
[6] J. Song, S. Y. Bae, and K. Yoon, "Mid-level music melody representation of polyphonic audio for query-by-humming system," in Proc. ISMIR, pp. 133-139, 2002.
[7] M. Goto, "A real-time music-scene-description system: predominant-F0 estimation for detecting melody and bass lines in real-world audio signals," Speech Communication, vol. 43, no. 4, pp. 311-329, 2004.
[8] Y. Li and D. L. Wang, "Separation of singing voice from music accompaniment for monaural recordings," IEEE Trans. Audio, Speech, and Language Processing, vol. 15, no. 4, pp. 1475-1487, 2007.
[9] H. Tachibana, T. Ono, N. Ono, and S. Sagayama, "Melody line estimation in homophonic music audio signals based on temporal-variability of melody source," in Proc. ICASSP, pp. 425-428, 2010.
[10] C.-L. Hsu, D. L. Wang, and J.-S. R. Jang, "A trend estimation algorithm for singing pitch detection in musical recordings," in Proc. ICASSP, pp. 393-396, 2011.
[11] J. Salamon and E. Gómez, "Melody extraction from polyphonic music signals using pitch contour characteristics," IEEE Trans. Audio, Speech, and Language Processing, vol. 20, no. 6, pp. 1759-1770, 2012.
[12] D. Barry, B. Lawlor, and E. Coyle, "Sound source separation: azimuth discrimination and resynthesis," in Proc. Int. Conf. Digital Audio Effects (DAFx), 2004.
[13] R. B. Dannenberg and N. Hu, "Understanding search performance in query-by-humming systems," in Proc. ISMIR, pp. 232-237, 2004.