1,a) 1,2,b)

Continuous wavelet transform, CWT [...] 100 [...]

1.

Continuous wavelet transform, CWT [1] [...] [2-5] [...] Irino [6] [...] Fourier [...] short-time Fourier transform, STFT [...] 5 [...] STFT [7] [...] Griffin [8] [...] CWT [...] Griffin [...] STFT [8] [...] Irino [...] CWT [9] [...]

1 7-3-1, 113-0033
2 NTT, 3-1, 243-0198
a) nakamura@hil.t.u-tokyo.ac.jp
b) kameoka@hil.t.u-tokyo.ac.jp

© 1959 Information Processing Society of Japan
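[...] CWT [...] 1 [...] CWT [...] [10] [...]

2.

2.1 CWT

Let $l \in [0, L-1]$ and $t \in [0, T-1]$ be the scale and time indices. The CWT coefficients at scale $l$ are $s_l := [s_{l,0}, s_{l,1}, \ldots, s_{l,T-1}]^\top \in \mathbb{C}^T$, and they are stacked as $s := [s_0^\top, s_1^\top, \ldots, s_{L-1}^\top]^\top \in \mathbb{C}^{LT}$. For a signal $f = [f_0, f_1, \ldots, f_{T-1}]^\top \in \mathcal{F}$ ($\mathcal{F} := \{f \in \mathbb{C}^T ; \sum_t f_t = 0\}$), the CWT is written with a matrix $W \in \mathbb{C}^{LT \times T}$ as

$$ s = Wf, \qquad W := \begin{bmatrix} W_0 \\ W_1 \\ \vdots \\ W_{L-1} \end{bmatrix}, \tag{1} $$

$$ W_l := \begin{bmatrix} \psi_{l,0} & \psi_{l,1} & \cdots & \psi_{l,T-1} \\ \psi_{l,T-1} & \psi_{l,0} & \cdots & \psi_{l,T-2} \\ \vdots & \vdots & \ddots & \vdots \\ \psi_{l,1} & \psi_{l,2} & \cdots & \psi_{l,0} \end{bmatrix}. \tag{2} $$

Here $W_l \in \mathbb{C}^{T \times T}$ is the CWT at scale $l$, $\psi_{l,t} := \psi(t /a_l)/a_l$, $a_l$ is the scale parameter, and $\psi(\cdot) \in \mathbb{C}$ is the wavelet. The inverse CWT is given by the pseudo-inverse $W^+$ of $W$:

$$ f = W^+ s, \qquad W^+ := (W^H W)^{-1} W^H, \tag{3} $$

where $^H$ denotes the Hermitian transpose. $W^+$ satisfies

$$ W^+ s = \operatorname*{argmin}_{f \in \mathcal{F}} \| s - W f \|_2^2, \tag{4} $$

where $\|\cdot\|_2$ is the $\ell_2$ norm. By (1), the CWT maps a $T$-dimensional signal to an $LT$-dimensional vector [...] 1 [...]. Let $\mathcal{W} \subset \mathbb{C}^{LT}$ denote the image of $W$. A vector $s \in \mathbb{C}^{LT}$ belongs to $\mathcal{W}$ if and only if

$$ 0_{LT} = s - W W^+ s, \tag{5} $$

where $0_{LT}$ is the $LT$-dimensional zero vector. [...] (5) [...] $s$ [...] $\mathcal{W}$ [...] STFT [...] STFT [7].

2.2

[...] CWT [...] 1 [...] CWT [...]

2.3

In the discrete Fourier transform (DFT) domain, condition (5) reads

$$ 0 = \hat{s} - \hat{W} \hat{W}^+ \hat{s}, \qquad \hat{W} = \begin{bmatrix} \hat{W}_0 \\ \hat{W}_1 \\ \vdots \\ \hat{W}_{L-1} \end{bmatrix}, \tag{6} $$

$$ \hat{W}_l = F_T W_l F_T^{-1}, \qquad \hat{W}^+ = (\hat{W}^H \hat{W})^{-1} \hat{W}^H, \tag{7} $$

where $F_T \in \mathbb{C}^{T \times T}$ is the DFT matrix and $\hat{\ }$ denotes the DFT. Because $W_l$ is circulant, $F_T W_l F_T^{-1} = \hat{W}_l$ is diagonal. Writing its diagonal entries as $\hat\psi_{l,k}$, (6) becomes, for each $l$ and $k \in [0, T-1]$,

$$ 0 = \hat{s}_{l,k} - \frac{1}{C_k} \sum_{l'} \hat\psi_{l,k} \hat\psi_{l',k}^{*} \hat{s}_{l',k}, \tag{8} $$

where $C_k$ is the $k$-th diagonal entry of $\hat{W}^H \hat{W}$, that is, $\sum_{l'} |\hat\psi_{l',k}|^2$. [...] $C_k$ [...] ($l \neq l'$, $\hat\psi_{l,k} \hat\psi_{l',k}^{*} \neq 0$) [...] CWT [...] (5) [...] CWT [...] (5) [...] Morlet [3] [...] auditory wavelet transform [6] [...] CWT [...] STFT [...] STFT [7] [...] CWT [...]

3.

3.1

Given a magnitude CWT spectrogram $a \in [0, \infty)^{LT}$, [...] phase $\phi \in [-\pi, \pi)^{LT}$ [...] $I(\phi)$ [...] 0 [...] $\phi$ [...]

$$ I(\phi) := \| s(a, \phi) - W W^+ s(a, \phi) \|_2^2, \tag{9} $$

$$ s(a, \phi) := a \odot [e^{j\phi_0}, e^{j\phi_1}, \ldots, e^{j\phi_{LT-1}}]^\top, \tag{10} $$

where $\odot$ is the element-wise product. [...] $s(a, \phi)$ [...] CWT [...] $I(\phi)$ [...] $s(a, \phi)$ [...] $I(\phi)$ [...] 0 [...] $s(a, \phi)$ [...]

3.2

[...] $I(\phi)$ [...] $\phi$ [...] (9) [...] CWT [...] (14) (15) [...] $\phi$ [...] [11] [...]. An auxiliary function $I^+(\phi, \tilde{s})$ of $I(\phi)$ with auxiliary variable $\tilde{s}$ follows from (4) and the definition of $\mathcal{W}$:

$$ I(\phi) = \min_{f \in \mathcal{F}} \| s(a, \phi) - W f \|_2^2 \tag{11} $$

$$ = \min_{\tilde{s} \in \mathcal{W}} \| s(a, \phi) - \tilde{s} \|_2^2 \tag{12} $$

$$ \le \| s(a, \phi) - \tilde{s} \|_2^2 =: I^+(\phi, \tilde{s}). \tag{13} $$

Equality in (13) holds when $\tilde{s} = W W^+ s(a, \phi)$. Together with $\partial I^+(\phi, \tilde{s}) / \partial \phi = 0_{LT}$, this gives the update rules

$$ \tilde{s} \leftarrow W W^+ s(a, \phi), \tag{14} $$

$$ \phi \leftarrow \angle \tilde{s}, \tag{15} $$

where $\angle$ takes the element-wise argument in $[-\pi, \pi)^{LT}$.

3.3

[...] 2 [...] (14) (15) [...] $s(a, \phi)$ [...] CWT [...] $s(a, \phi)$ [...] $\tilde{s}$ [...] Irino [6] [...] $\mathcal{W}$ [...] STFT [...] STFT [7] [...] STFT [...] CWT [...] Lopes [9] [...]

4.

4.1 [...] 3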
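The frequency-domain condition (6)-(8) and the alternating updates (14) and (15) can be sketched together in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the filter bank `wavelet_bank`, its parameters (`sigma`, `f_lo`, `f_hi`), and all function names are hypothetical, the spectrogram is stored as an `(L, T)` array instead of a stacked vector, and bins where $C_k$ is numerically zero are simply set to zero in the pseudo-inverse.

```python
import numpy as np

def wavelet_bank(T, L, sigma=0.1, f_lo=0.02, f_hi=0.4):
    """Hypothetical bank psi_hat[l, k]: log-normal bumps on positive DFT bins."""
    freqs = np.fft.fftfreq(T)                 # bin frequencies in cycles/sample
    centres = np.geomspace(f_lo, f_hi, L)     # one centre frequency per scale l
    psi_hat = np.zeros((L, T))
    pos = freqs > 0                           # zero response at DC and negative bins
    ratio = freqs[pos][None, :] / centres[:, None]
    psi_hat[:, pos] = np.exp(-np.log(ratio) ** 2 / (4 * sigma ** 2))
    return psi_hat

def cwt(f, psi_hat):
    """s = W f, computed as a per-bin product because each W_l is diagonal in the DFT domain."""
    return np.fft.ifft(psi_hat * np.fft.fft(f)[None, :], axis=1)

def icwt(s, psi_hat, eps=1e-12):
    """f = W^+ s: f_hat[k] = sum_l conj(psi_hat[l, k]) s_hat[l, k] / C_k."""
    C = np.sum(np.abs(psi_hat) ** 2, axis=0)  # C_k = sum_l |psi_hat[l, k]|^2
    num = np.sum(np.conj(psi_hat) * np.fft.fft(s, axis=1), axis=0)
    f_hat = np.divide(num, C, out=np.zeros_like(num), where=C > eps)
    return np.fft.ifft(f_hat)

def project(s, psi_hat):
    """W W^+ s: orthogonal projection onto the set of consistent spectrograms."""
    return cwt(icwt(s, psi_hat), psi_hat)

def inconsistency(a, phi, psi_hat):
    """I(phi) of (9) for magnitudes a and phases phi."""
    s = a * np.exp(1j * phi)
    return float(np.sum(np.abs(s - project(s, psi_hat)) ** 2))

def estimate_phase(a, psi_hat, n_iter=50, seed=0):
    """Alternate updates (14) and (15), starting from a random phase."""
    rng = np.random.default_rng(seed)
    phi = rng.uniform(-np.pi, np.pi, size=a.shape)
    history = []
    for _ in range(n_iter):
        history.append(inconsistency(a, phi, psi_hat))
        s_tilde = project(a * np.exp(1j * phi), psi_hat)  # update (14)
        phi = np.angle(s_tilde)                            # update (15)
    return phi, history
```

Since `project` is an orthogonal projection, the recorded values of $I(\phi)$ should not increase from one iteration to the next, which is the property the auxiliary-function argument (11)-(13) guarantees. A signal estimate can then be obtained with `icwt(a * np.exp(1j * phi), psi_hat)`.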
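Fig. 3: [...] Fourier [...] [3] [...] (i) [...] (ii) [...] FFT [...] $\hat{W}_l \hat{f}$ [...] $k \in [B, B+D-1]$ [...] $T$ [...] $D$ [...] $[2\pi B/D, 2\pi B/D + 2\pi]$ [...] (ii) [...] $[0, 2\pi]$ [...] DFT [...]

Fig. 4: [...] (ii) [...] $[2\pi B/D, 2\pi n]$ [...] FFT [...] $D$ [...]

[...] CWT [...] 4 [...] [10] [...] CWT [...] STFT [...] CWT [...] [10] [...]

As noted in Section 2.3 [...] CWT [...] 3 [...] CWT [...] Morlet [3] [...], the coefficients $\{\hat\psi_{l,k}\}_k$ at scale $l$ are assumed to be negligible outside a band $k \in [B, B+D-1]$ ($0 \le B$, $0 < D \le T$), where $B$ and $D$ depend on the scale $l$. The CWT at scale $l$ can then be approximated from only $D$ of the $T$ DFT coefficients of $f$ [...] 2 [...]. After the Fourier transform of $f$ is computed with the fast Fourier transform (FFT), two steps are applied for each $l$: (i) the bins $k \in [B, B+D-1]$ of $\hat{W}_l \hat{f}$ are extracted and the remaining bins are treated as 0; (ii) the extracted band, which covers $[2\pi B/D, 2\pi B/D + 2\pi]$ on a $D$-point grid, is circularly shifted onto $[0, 2\pi]$ so that a $D$-point inverse DFT can be applied. With $n := \lfloor (B + D)/D \rfloor$, the shift amount is $B_0 := B - (n-1)D$, that is, $B_0 = B \bmod D$.

4.2

Steps (i) and (ii) are expressed by the matrix

$$ K := \underbrace{\begin{bmatrix} 0_{B_0 \times (D-B_0)} & I_{B_0} \\ I_{D-B_0} & 0_{(D-B_0) \times B_0} \end{bmatrix}}_{\text{(ii)}} \underbrace{\begin{bmatrix} 0_{D \times B} & I_D & 0_{D \times (T-D-B)} \end{bmatrix}}_{\text{(i)}}, \tag{16} $$

where $B_0 := B - (n-1)D$, $I_D$ is the $D \times D$ identity matrix, and $0_{D \times B}$ is the $D \times B$ zero matrix. The approximate CWT at scale $l$, $\check{s}_l \in \mathbb{C}^D$, is obtained with $K$ as

$$ \check{s}_l = F_D^{-1} K \hat{W}_l F_T f. \tag{17} $$

[...] CWT [...] 3 [...] CWT [...]

4.3

For a signal of length $T$, the FFT costs $O(T \log_2 T)$, so the CWT costs $O(T \log_2 T + LT \log_2 T)$. With a band of width $D_l \le T$ at scale $l$, the approximate CWT costs $O(T \log_2 T + \sum_{l=0}^{L-1} D_l \log_2 D_l)$. [...] Irino [6] [...] CWT [...]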
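Steps (i) and (ii) of (16) and (17) can be sketched as follows. The function names are hypothetical, NumPy's `ifft` normalisation (a factor $1/D$) is assumed for $F_D^{-1}$, and the operation counts ignore constant factors. Under these assumptions, when $D$ divides $T$ and the wavelet is exactly zero outside the band, the $D$-point result equals the full-length CWT row sampled every $T/D$ samples, up to the factor $T/D$.

```python
import numpy as np

def band_limited_cwt_row(f_hat, psi_hat_l, B, D):
    """Approximate one CWT scale from the D DFT bins k in [B, B+D-1]; cf. (16)-(17)."""
    T = f_hat.shape[0]
    if not (0 <= B and 0 < D <= T and B + D <= T):
        raise ValueError("band [B, B+D-1] must lie inside [0, T-1]")
    band = psi_hat_l[B:B + D] * f_hat[B:B + D]  # step (i): keep the D in-band bins
    B0 = B % D                                   # B - (n-1)D with n = floor((B+D)/D)
    shifted = np.roll(band, B0)                  # step (ii): circular shift by B0
    return np.fft.ifft(shifted)                  # D-point inverse DFT

def matches_full_ifft(T=64, B=20, D=16, seed=0):
    """Compare against the T-point inverse DFT for a wavelet that is zero off-band."""
    rng = np.random.default_rng(seed)
    f_hat = np.fft.fft(rng.standard_normal(T))
    psi_hat_l = np.zeros(T)
    psi_hat_l[B:B + D] = np.hanning(D) + 0.1     # arbitrary non-zero in-band response
    full = np.fft.ifft(psi_hat_l * f_hat)        # exact row, length T
    fast = band_limited_cwt_row(f_hat, psi_hat_l, B, D)
    return np.allclose(fast * D / T, full[:: T // D])

def fft_cost(T, widths):
    """Operation-count proxies from Section 4.3 for band widths D_l (constants ignored)."""
    exact = T * np.log2(T) + len(widths) * T * np.log2(T)
    fast = T * np.log2(T) + sum(D * np.log2(D) for D in widths)
    return exact, fast
```

`np.roll(band, B0)` moves the last `B0` in-band bins to the front, which is what the left factor of $K$ does. The right factor of $K$ corresponds to the slice `[B:B + D]`.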
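[...] Irino [...] $LT$ [...] $l$ [...] $D_l$ [...]

5.

5.1 [...] 1: [...]

5.1.1

[...] Irino [6] [...] CWT [...] ATR [...] A [12] [...] faf 115 [...] mht 113 [...] CWT [...] FFT [...] 2 [...] 2 [...] 1000 [...]. The wavelet [...] [3] [...] is defined in the Fourier domain as

$$ \hat\psi(\omega) := \begin{cases} \exp\left( -\dfrac{(\log \omega)^2}{4\sigma^2} \right) & (\omega > 0) \\ 0 & (\omega \le 0) \end{cases} \tag{18} $$

[...] $\omega$ [...] $\sigma$ [...] $\sigma$ [...] 0.02 [...] 20 cent [...] 27.5-7040 Hz [...] $\pm 3\sigma$ [...] Intel Xeon CPU E31245 (3.3 GHz), 32 GB RAM [...] perceptual evaluation of speech quality (PESQ) [13] [...] PESQ [...] -0.5 [...] 4.5 [...]

5.1.2

[...] PESQ [...] Irino [...] 4.20 ± 0.08 [...] 4.1 ± 0.1 [...] *1. [...] 5 [...] Irino [...] 100 [...] Irino [...] 15 s [...] 10 s/iteration [...] 0.1 s/iteration [...]

5.2 [...] 2: [...]

5.2.1

[...] RWC [14] [...] 102 [...] 16 kHz [...]

*1 http://hil.t.u-tokyo.ac.jp/ nakamura/ demo/fastcwt.html

Fig. 5: [...] Vertical axis: [s/iteration], 0.001 to 1000. Horizontal axis: [s], 0 to 18.

Fig. 6: [...] Vertical axis: objective difference grade, -4 to -0.5. Horizontal axis: 0 to 500. Curves: [...] ($[-P\sigma, P\sigma]$ ($P = 1, 2, 3, 5$)) and Irino [6] ([Irino1993]). Perceptual evaluation of audio quality, objective difference grades.

Fig. 7: [...] Vertical axis: objective difference grade, -4 to -0.5. Horizontal axis: [s], 0 to 50. Curves: [...] ($[-P\sigma, P\sigma]$ ($P = 1, 2, 3, 5$)) and Irino [6] ([Irino1993]). Perceptual evaluation of audio quality, objective difference grades.

[...] 30 s [...] 35 s [...] CWT [...] $\sigma = 0.02$ [...] (i) [...] $\pm P\sigma$ ($P = 1, 2, 3, 5$) [...] 500 [...] Irino [...]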
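The wavelet (18) and the $\pm P\sigma$ truncation can be illustrated with a short NumPy sketch. Several points here are assumptions rather than statements from the text: the truncation is read as keeping the DFT bins whose log-frequency distance from the centre frequency is at most $P\sigma$, the wavelet at a given scale is obtained by evaluating (18) at frequency divided by centre frequency, and the names `log_normal_wavelet` and `band_indices` and their parameters are hypothetical.

```python
import numpy as np

def log_normal_wavelet(omega, sigma=0.02):
    """Equation (18): exp(-(log omega)^2 / (4 sigma^2)) for omega > 0, else 0."""
    omega = np.asarray(omega, dtype=float)
    safe = np.where(omega > 0, omega, 1.0)       # avoid log of non-positive values
    value = np.exp(-np.log(safe) ** 2 / (4 * sigma ** 2))
    return np.where(omega > 0, value, 0.0)

def band_indices(centre_hz, fs, T, sigma=0.02, P=3):
    """First bin B and width D of the bins with |log(f / centre_hz)| <= P * sigma."""
    lo = centre_hz * np.exp(-P * sigma)          # lower band edge in Hz
    hi = centre_hz * np.exp(P * sigma)           # upper band edge in Hz
    B = int(np.ceil(lo * T / fs))                # first DFT bin inside the band
    k_hi = int(np.floor(hi * T / fs))            # last DFT bin inside the band
    return B, k_hi - B + 1

def truncated_response(centre_hz, fs, T, sigma=0.02, P=3):
    """In-band samples of the scaled wavelet on the DFT grid, plus (B, D)."""
    B, D = band_indices(centre_hz, fs, T, sigma, P)
    freqs_hz = np.arange(B, B + D) * fs / T
    return log_normal_wavelet(freqs_hz / centre_hz, sigma), B, D
```

Under this reading, the response at the band edge is about $\exp(-P^2/4)$ of the peak, so a small $P$ discards a noticeable part of the wavelet while a larger $P$ widens $D$ and raises the cost of each inverse FFT.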
Fig. 8: [...] Vertical axis: perceptual evaluation of speech quality, 1.5 to 4.5. Horizontal axis: [s], 0 to 100. Curves: [...] ($[-P\sigma, P\sigma]$ ($P = 1, 2, 3, 5$)) and Irino [6]. Perceptual evaluation of speech quality.

[...] 100 [...] Intel Core i3-2120 CPU (3.30 GHz), 8 GB RAM [...] 5.1.1 [...] perceptual evaluation of audio quality (PEAQ) [15] [...] objective differential grade (ODG) [...] -4 [...] 0 [...] ODG [...]

5.2.2

[...] 6 [...] ODG [...] $P = 3, 5$ [...] ODG [...] 100 [...] 2.0 *2 [...] Irino [...] $P$ [...] 3 [...] Irino [...] $P$ [...] 7 [...] RWC-MDB-G-2001 No. 1 [...] ODG [...] ODG [...] ATR [12] [...] A [...] fafsc110 [...] 7 s [...] 8 [...] $P = 3$ [...]

6.

[...] Irino [6] [...] CWT [...] [10] [...] Irino [...] 100 [...]

*2 c.f.) MPEG-3 160 kbps [...] ODG 3.68 ± 0.03

[...] JSPS 26730100 [...]

[1] [ ] Vol. 39, No. 6, pp. 413-418 (2009).
[2] Schmidt, M. N. and Mørup, M.: Nonnegative matrix factor 2-D deconvolution for blind single channel source separation, Independent Component Analysis and Blind Signal Separation, Springer, pp. 700-707 (2006).
[3] Kameoka, H.: Statistical Approach to Multipitch Analysis, PhD Thesis, The University of Tokyo (2007).
[4] Müller, M., Ellis, D. P. W., Klapuri, A. and Richard, G.: Signal processing for music analysis, IEEE J. Sel. Topics Signal Process., Vol. 5, No. 6, pp. 1088-1110 (2011).
[5] de León, J. P., Beltrán, F. and Beltrán, J. R.: A complex wavelet based fundamental frequency estimator in single-channel polyphonic signals, Proc. Digital Audio Effects (2013).
[6] Irino, T. and Kawahara, H.: Signal reconstruction from modified auditory wavelet transform, IEEE Trans. Signal Process., Vol. 41, No. 12, pp. 3549-3554 (1993).
[7] Le Roux, J., Kameoka, H., Ono, N. and Sagayama, S.: Fast Signal Reconstruction from Magnitude STFT Spectrogram Based on Spectrogram Consistency, Proc. Int. Conf. Digital Audio Effects, pp. 397-403 (2010).
[8] Griffin, D. and Lim, J.: Signal estimation from modified short-time Fourier transform, IEEE Trans. Acoust., Speech, Signal Process., Vol. 32, No. 2, pp. 236-243 (1984).
[9] Lopes, D. M. and White, P. R.: Signal reconstruction from the magnitude or phase of a generalised wavelet transform, Proc. Eur. Signal Process. Conf., pp. 2029-2032 (2000).
[10] (2008). 2008-281898.
[11] Ortega, J. M. and Rheinboldt, W. C.: Iterative solution of nonlinear equations in several variables, No. 30 (2000).
[12] Kurematsu, A., Takeda, K., Sagisaka, Y., Katagiri, S., Kuwabara, H. and Shikano, K.: ATR Japanese Speech Database as a Tool of Speech Recognition and Synthesis, Speech Commun., Vol. 9, No. 4, pp. 357-363 (1990).
[13] ITU-T: Recommendation P.862, Perceptual Evaluation of Speech Quality (PESQ): An Objective Method for End-To-End Speech Quality Assessment of Narrow-Band Telephone Networks and Speech Codecs (2001).
[14] Goto, M.: Development of the RWC Music Database, Proc. Int. Congress Acoust., pp. I-553-556 (2004).
[15] ITU-R: Recommendation BS.1387-1, Perceptual Evaluation of Audio Quality (PEAQ): Method for Objective Measurements of Perceived Audio Quality (2001).