1,a) 2 MOS Acoustic Signal Adjustment by Considering Musical Expressive Intention Using a Performance Intension Function Yuma Koizumi 1,a) Katunobu Itou 2 Abstract: We propose an estimation method for a performance intention function, which is a function of fluctuation of a musician s intent shaped as a smooth curve, from the acoustic signal. In addition, we propose a method to remediate acoustic signals based on a musician s intent by using intention information obtained by performance intention functions. In this paper, discussion of estimation of the function is focused on onset time. The function is modeled as a polynomial regression with respect to time. Onset times are detected by a new onset detection method that considers aural characteristics, and the function is estimated by ridge regression by using onset times and score information. Power spectrogram of observed musical signal is remediated by using the function with respect to each note and musical images, and intentions of the player are clarified. In addition, in terms of closeness of rhythm, MOS of remediated sounds by using performance intention function were higher than the original recorded sound, and significant differences in all instruments were observed. Keywords: Musical expressive intentions, Onset detection, Regression analysis, Sound remediation 1. 1 Graduate School of Computer and Information Sciences, Hosei University 2 Faculty of Computer and Information Sciences, Hosei University a) 12t0005@cis.k.hosei.ac.jp e.g., 1
*1 *2 [1 4] [5] [6] [7] MIDI 2. 2.1 n y n = (y[1], y[2],..., y[n]) T y n = t n + x n + e n (1) *1 *2 t n x n e n x n f(t) t n x n = f (t n ) (2) f(t) 2.2 i y[i + 1] y[i] α α[i] = (t[i + 1] + x[i + 1]) (t[i] + x[i]) y[i + 1] y[i] 3. (3) 3.1 y n t n 3.2 3.1 Onset Detection [8] [9] [10] 3.1.1 2
[k T/2, k + T/2] KLD d k ( ( d k = D k T 2 ),, D ( k + T 2 )) T (6) 1 Fig. 1 y [db] Procedure of generating sub-band mel-frequency normalized spectrum. (The y-axis of all figures are logarithmic scale [db].) τ Kullback-Leibler Divergence(KLD) KLD S(k, ω) 1-I k ω (STFT) 1ms STFT 10ms S(k, ω) J = 7 B j (ω) 1-IIKLD k j 1-III S j (k, ω) = S(k, ω)b j(k, ω) ω S(k, ω)b j(ω) (4) k D(k) KLD D(k) = J ( ) Sj (k τ, ω) S j (k τ, ω) log S j (k, ω) j=1 ω (5) STFT τ = 10(i.e., 10ms) D(k) Y k σ d T = 100(i.e., 100ms) δ(k) [10] KLD Y δ(k) = λ(σ d + Median(d k )) + Median(D) 2 (7) σ d d k λ Y n λ = 0.9 1 3.1.2 Y y n y n t n KLD y[i] = argmin i y[i] t[i] D (k i ) (8) k i y[i] y n t n t n t n t n = ξh n + ψ1 n (9) h n 4 1 ξ 4 ψ 0 1 1 n 1 n (9) ξ ψ BPM t n y n G(y n, t n ) = n (y[i] t[i]) 2 (10) i=1 (8) (10) 1: ξ ψ 3
ξ = L s f s h[n], ψ = 0 (11) f s L s 2: ξ ψ (9) t n (8) y n 3: (10) ξψ 2 ξ = yt h n ψ1 T n h n h T n h n (12) ψ = yt 1 n ξ1 T n h n n (13) y n t n 3.2 1 t[1]... t[1] M w 0 1 t[2]... t[2] M w 1 f(t n ) =......... = Bw 1 t[n]... t[n] M w M 1 (14) y n = t n + Bw + e n (15) z n = y n t n 15 z n = Bw + e n. (16) (14) w (16) w w w 1 L 2 [11] R γ (w) = (z n Bw) T (z n Bw) + γw T 1 w 1 (17) M = 10 γ = 0.1 w 2 2 Fig. 2 KLD z n A estimation result of a performance intention function. Top figure shows musical score, second figure shows spectrogram of performance sound (linear frequency), third figure shows KLD (blue line) and candidate set of onset times (red circle), and bottom figure shows deviances z n (blue circle) and an estimated performance intention function (red line). 4. 4.1 α (3) [12] Griffin [13] α[i] α[i] 4.2 3 4
1 Table 1 Recording conditions. 3 Roland, EDIROL R-09 Roland, UA-25EX 48kHz, 16bit 2 Table 2 Used phrases in this evaluations. 3 RMSE Fig. 3 RMSE of onset detection. A. Dvorak, Symphony No. 8 1. 1 244-250 1st Violin R.Wagner, Tannhauser Act.II Grand March 2. 40-44 1st Violin 3. 64-68 1st Violin A. Dvorak, Symphony No. 8 1. 1 1-6 2. 1 165-169 3. 4 26-33 1. LUNKHEAD ENTRANCE 5-12 2. MONKEY MAJIK 52-56 3. Thousand Dreams 2-9 30 2 3 IC 48kHz bit 16bit 1 2 4.2.1 RMSE KLDKLD PHA[8]COM[9] 3.1.2 3 RMSE KLD RMSE KLD RMSE [10] 50ms [10] 4 MAE Fig. 4 MAE of values of performance intention functions. Pitched-Percussive [10] 3.1.2 3 4.2.2 (MAE) MAE = 1 6 3 2 1 n s n s s=1 q=1 i=1 x p s[i] x a s,q[i] (18) s q n s s x p s[i] i x a s,q[i] q i 4 MAE MAE 15ms 5
Fig. 5 5 Result of the subjective evaluation. 70ms[14] 4.2.3 5 5 ORGPRO mean opinion score (MOS) 1 5 MOS 5 Dunnett [15] 5% 5. 15ms MOS [1] Hirokazu Kameoka, et.al., Complex NMF: A New Sparse Representation for Acoustic Signals, In Proc. ICASSP 2009, pp. 3437-3440, Apr. 2009. [2] Katsutoshi Itoyama, et.al., Integration and Adaptation of Harmonic and Inharmonic Models for Separating Polyphonic Musical Signals, in Proc. ICASSP 2007, pp. 57-60, April 2007. [3],, GMM NMF,, Vol. 52, pp. 3839-3852, 2011. [4],,,, Vol. 50, pp. 1054-1066, 2009. [5] Yasunori Ohishi, et.al., A Stochastic Model of Singing Voice F0 Contours for Characterizing Expressive Dynamic Components, In Proc. INTERSPEECH 2012, Sep. 2012. [6] Yuma Koizumi, et.al., Performance expression synthesis for bowed-string instruments using Expression Mark Functions Proceedings of Meetings on Acoustics (POMA). Vol. 15, pp. 035003, Nov. 2012. [7],, 1 66(5), pp. 203-212, 2010. [8] Bello, J.P., et.al., Phase-based note onset detection for music signals, In Proc. ICASSP 2003, vol.5, 6-10 Apr. 2003. [9] Bello, J.P., et.al., On the use of phase and energy for musical onset detection in the complex domain, Signal Processing Letters, IEEE, vol.11, no.6, pp. 553-556, June 2004 [10] Bello, J.P., et.al., A Tutorial on Onset Detection in Music Signals, Speech and Audio Processing, IEEE Transactions on, vol.13, no.5, pp.1035-1047, Sept. 2005 [11] Arthur E. Hoerl, et.al., Ridge Regression: Biased Estimation for Nonorthogonal Problems, Technometrics, Vol.12, No.1., pp.55-67, Feb., 1970 [12],,,, vol.39, no.6, pp.447-452, Oct. 2009. [13] D.W.Griffin, et.al., Signal estimation from modified short-time Fourier transform In Proc. ICASSP 1984, vol.32, no.2, pp.236-243, Apr. 1984. [14] Simon Dixon, Automatic extraction of tempo and beat from expressive performances, Journal of New Music Research, vol. 30, pp. 39-58, 2001. [15] C.W.Dunnett, New Tables for Multiple Comparisons with a Control, Biometrics, Vol.20, No.3, pp.482-491, 1964. 6