

Transcript:

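1,a) 1,2,b)

Abstract: This report treats signal reconstruction from the magnitude spectrogram of the continuous wavelet transform (CWT) and reports a speed-up of about 100 times.

1.

The continuous wavelet transform (CWT) [1] is used for signal analysis in [2-5]. Irino et al. [6] treat signal reconstruction from a modified CWT. For the short-time Fourier transform (STFT), a fast reconstruction method based on spectrogram consistency is given in [7], and Griffin et al. [8] estimate a signal from a modified STFT. Reconstruction from the magnitude or phase of a generalised wavelet transform is treated in [9].

1 7-3-1, 113-0033
2 NTT, 3-1, 243-0198
a) nakamura@hil.t.u-tokyo.ac.jp
b) kameoka@hil.t.u-tokyo.ac.jp

© 1959 Information Processing Society of Japan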

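A related CWT computation is given in [10].

2.

2.1 CWT

Let $l \in [0, L-1]$ be the scale index and $t \in [0, T-1]$ the time index. The CWT coefficients at scale $l$ are $s_l := [s_{l,0}, s_{l,1}, \dots, s_{l,T-1}]^\top \in \mathbb{C}^T$, and the whole spectrogram is $s := [s_0^\top, s_1^\top, \dots, s_{L-1}^\top]^\top \in \mathbb{C}^{LT}$. For a signal $f = [f_0, f_1, \dots, f_{T-1}]^\top \in \mathcal{F}$, with $\mathcal{F} := \{ f \in \mathbb{C}^T ;\ \sum_t f_t = 0 \}$, the CWT is written with a matrix $W \in \mathbb{C}^{LT \times T}$:

\[
s = W f, \qquad
W := \begin{bmatrix} W_0 \\ W_1 \\ \vdots \\ W_{L-1} \end{bmatrix}
\tag{1}
\]

\[
W_l := \begin{bmatrix}
\psi_{l,0} & \psi_{l,1} & \cdots & \psi_{l,T-1} \\
\psi_{l,T-1} & \psi_{l,0} & \cdots & \psi_{l,T-2} \\
\vdots & \vdots & \ddots & \vdots \\
\psi_{l,1} & \psi_{l,2} & \cdots & \psi_{l,0}
\end{bmatrix}
\tag{2}
\]

Here $W_l \in \mathbb{C}^{T \times T}$ is the CWT matrix for scale $l$, $\psi_{l,t} := \psi(t/a_l)/a_l$, $a_l$ is the scale parameter, and $\psi(\cdot) \in \mathbb{C}$ is the mother wavelet. The inverse CWT is given by the pseudo-inverse $W^+$ of $W$:

\[
f = W^+ s, \qquad W^+ := (W^H W)^{-1} W^H
\tag{3}
\]

where $^H$ denotes the Hermitian transpose. $W^+ s$ is the least-squares solution

\[
W^+ s = \operatorname*{argmin}_{f \in \mathcal{F}} \| s - W f \|_2^2
\tag{4}
\]

where $\| \cdot \|_2$ is the $\ell_2$ norm. By (1), the CWT maps a $T$-dimensional signal to an $LT$-dimensional vector, so not every vector in $\mathbb{C}^{LT}$ is the CWT spectrogram of some signal. A vector $s$ is a CWT spectrogram exactly when it is unchanged by an inverse CWT followed by a CWT:

\[
0_{LT} = s - W W^+ s
\tag{5}
\]

where $0_{LT}$ is the $LT$-dimensional zero vector. The corresponding condition on $s$ for the STFT is given in [7].

2.2 CWT

2.3 CWT

Applying the discrete Fourier transform (DFT) to (5) gives

\[
0 = \hat{s} - \hat{W} \hat{W}^+ \hat{s}, \qquad
\hat{W} := \begin{bmatrix} \hat{W}_0 \\ \hat{W}_1 \\ \vdots \\ \hat{W}_{L-1} \end{bmatrix}
\tag{6}
\]

\[
\hat{W}_l = F_T W_l F_T^{-1}, \qquad
\hat{W}^+ = (\hat{W}^H \hat{W})^{-1} \hat{W}^H
\tag{7}
\]

where $F_T \in \mathbb{C}^{T \times T}$ is the DFT matrix and $\hat{\ }$ denotes the DFT. Since $W_l$ is circulant, $F_T W_l F_T^{-1}$ is diagonal, so (6) can be written element by element for each scale $l$ and frequency index $k \in [0, T-1]$: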
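The matrix form in Eqs. (1)-(7) can be tried out numerically. The following is a minimal sketch, not the authors' implementation: it applies each circulant block $W_l$ as a per-bin multiplication in the DFT domain and evaluates the left-over term of the consistency condition (5). The Gaussian filter bank, the function names and the threshold used to skip frequency bins that no wavelet covers are all assumptions made for illustration.

```python
import numpy as np

def make_filter_bank(T, L):
    """Hypothetical filter bank; row l holds the diagonal of W_hat_l."""
    k = np.arange(T)
    centers = np.linspace(0.05, 0.4, L) * T      # centre bins, all below Nyquist
    width = 0.03 * T
    psi_hat = np.exp(-0.5 * ((k[None, :] - centers[:, None]) / width) ** 2)
    psi_hat[:, 0] = 0.0                           # no response at DC (f has zero mean)
    return psi_hat

def cwt(f, psi_hat):
    """s = W f, Eq. (1), evaluated as a product in the DFT domain."""
    return np.fft.ifft(psi_hat * np.fft.fft(f)[None, :], axis=1)

def icwt(s, psi_hat):
    """f = W^+ s, Eq. (3); W_hat^H W_hat is diagonal with entries C_k."""
    s_hat = np.fft.fft(s, axis=1)
    C = np.sum(np.abs(psi_hat) ** 2, axis=0)
    num = np.sum(np.conj(psi_hat) * s_hat, axis=0)
    f_hat = np.zeros_like(num)
    ok = C > 1e-12 * C.max()                      # assumption: ignore uncovered bins
    f_hat[ok] = num[ok] / C[ok]
    return np.fft.ifft(f_hat)

def inconsistency(s, psi_hat):
    """Squared norm of s - W W^+ s, the residual of Eq. (5)."""
    return float(np.linalg.norm(s - cwt(icwt(s, psi_hat), psi_hat)) ** 2)

rng = np.random.default_rng(0)
T, L = 256, 12
psi_hat = make_filter_bank(T, L)
f = rng.standard_normal(T)
f -= f.mean()                                     # put f in the zero-mean set F
s = cwt(f, psi_hat)
s_random_phase = np.abs(s) * np.exp(1j * rng.uniform(-np.pi, np.pi, s.shape))

print(inconsistency(s, psi_hat))                  # a true spectrogram: close to zero
print(inconsistency(s_random_phase, psi_hat))     # same magnitudes, random phase: positive
```

The second value shows why phase matters: keeping the magnitudes and randomising the phase generally produces a vector that is no longer the CWT of any signal.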

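\[
0 = \hat{s}_{l,k} - \frac{1}{C_k} \sum_{l'} \hat{\psi}_{l,k} \hat{\psi}^*_{l',k} \hat{s}_{l',k}
\tag{8}
\]

where $\hat{\psi}_{l,k}$ is the $k$-th diagonal element of $\hat{W}_l$ and, from (7), $C_k := \sum_{l'} \hat{\psi}^*_{l',k} \hat{\psi}_{l',k}$. Condition (8) couples the scales $l \neq l'$ whose wavelets overlap at frequency $k$, that is, $\hat{\psi}_{l,k} \hat{\psi}^*_{l',k} \neq 0$. Condition (5) covers CWTs such as those with the Morlet wavelet [3] and the auditory wavelet transform [6]; the STFT counterpart is given in [7].

3.

3.1

Given a magnitude CWT spectrogram $a \in [0, \infty)^{LT}$, the task is to find a phase $\phi \in [-\pi, \pi)^{LT}$ for which the complex vector built from $a$ and $\phi$ is a CWT spectrogram. The objective function $I(\phi) \ge 0$ is

\[
I(\phi) := \| s(a, \phi) - W W^+ s(a, \phi) \|_2^2
\tag{9}
\]

\[
s(a, \phi) := a \odot \begin{bmatrix} e^{j\phi_0} \\ e^{j\phi_1} \\ \vdots \\ e^{j\phi_{LT-1}} \end{bmatrix}
\tag{10}
\]

where $\odot$ is the element-wise product. $I(\phi) = 0$ holds exactly when $s(a, \phi)$ satisfies (5).

3.2

$I(\phi)$ is minimised with respect to $\phi$ by the auxiliary function method [11], which yields the update rules (14) and (15). Introduce an auxiliary variable $\tilde{s}$ and an auxiliary function $I^+(\phi, \tilde{s})$ that bounds $I(\phi)$ from above. Let $\mathcal{W}$ denote the set of vectors $W f$ with $f \in \mathcal{F}$. From (4),

\[
I(\phi) = \min_{f \in \mathcal{F}} \| s(a, \phi) - W f \|_2^2
\tag{11}
\]

\[
= \min_{\tilde{s} \in \mathcal{W}} \| s(a, \phi) - \tilde{s} \|_2^2
\tag{12}
\]

\[
\le \| s(a, \phi) - \tilde{s} \|_2^2 =: I^+(\phi, \tilde{s})
\tag{13}
\]

Equality in (13) holds for $\tilde{s} = W W^+ s(a, \phi)$, and solving $\partial I^+(\phi, \tilde{s}) / \partial \phi = 0_{LT}$ gives the phase update. The update rules are

\[
\tilde{s} \leftarrow W W^+ s(a, \phi)
\tag{14}
\]

\[
\phi \leftarrow \angle \tilde{s}
\tag{15}
\]

with $\phi$ taken in $[-\pi, \pi)^{LT}$.

3.3

The algorithm alternates the two updates (14) and (15): an inverse CWT followed by a CWT of $s(a, \phi)$, then replacement of the phase by that of $\tilde{s}$. Related methods are that of Irino et al. [6] for the CWT, the STFT method of [7], and that of Lopes et al. [9].

4.

4.1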
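The alternating updates (14) and (15) can be sketched as a short loop. This is an illustrative sketch under assumptions, not the authors' code: the projection $W W^+$ is computed bin by bin as in Eq. (8), the filter bank is a hypothetical Gaussian bank, and the phase is initialised at random. `np.angle` returns values in $(-\pi, \pi]$, which differs from $[-\pi, \pi)$ only at the endpoint.

```python
import numpy as np

def project(s, psi_hat):
    """Compute W W^+ s bin by bin in the DFT domain, as in Eq. (8)."""
    s_hat = np.fft.fft(s, axis=1)
    C = np.sum(np.abs(psi_hat) ** 2, axis=0)             # C_k
    num = np.sum(np.conj(psi_hat) * s_hat, axis=0)
    coef = np.zeros(C.shape, dtype=complex)
    ok = C > 1e-12 * C.max()                              # assumption: skip uncovered bins
    coef[ok] = num[ok] / C[ok]
    return np.fft.ifft(psi_hat * coef[None, :], axis=1)

def estimate_phase(a, psi_hat, n_iter=100, seed=0):
    """Alternate Eq. (14) and Eq. (15); return the phase and the values of I(phi)."""
    rng = np.random.default_rng(seed)
    phi = rng.uniform(-np.pi, np.pi, a.shape)             # random initial phase (assumption)
    history = []
    for _ in range(n_iter):
        s = a * np.exp(1j * phi)                          # Eq. (10)
        s_tilde = project(s, psi_hat)                     # Eq. (14)
        history.append(float(np.sum(np.abs(s - s_tilde) ** 2)))  # I(phi), Eq. (9)
        phi = np.angle(s_tilde)                           # Eq. (15)
    return phi, history

# Toy problem: magnitudes of the CWT of a random signal (hypothetical filter bank).
T, L = 256, 12
k = np.arange(T)
centers = np.linspace(0.05, 0.4, L) * T
psi_hat = np.exp(-0.5 * ((k[None, :] - centers[:, None]) / (0.03 * T)) ** 2)
psi_hat[:, 0] = 0.0
f = np.random.default_rng(1).standard_normal(T)
a = np.abs(np.fft.ifft(psi_hat * np.fft.fft(f)[None, :], axis=1))

phi, history = estimate_phase(a, psi_hat)
print(history[0], history[-1])
```

Because each update minimises the auxiliary function of Eq. (13) in one of its arguments, the recorded values of $I(\phi)$ should not increase from one iteration to the next, up to rounding.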

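Fig. 3: Fast approximate CWT in the Fourier domain (cf. [3]). (i) From the FFT-domain product $\hat{W}_l \hat{f}$, keep only the elements $k \in [B, B+D-1]$, which shortens the vector from $T$ to $D$ elements; on a $D$-point grid the kept band covers the angular frequencies $[2\pi B/D,\ 2\pi B/D + 2\pi]$. (ii) Circularly shift the elements so that the band matches the range $[0, 2\pi]$ of the $D$-point DFT.

Fig. 4: The circular shift of step (ii), with $n := \lfloor (B + D)/D \rfloor$. The part of the band in $[2\pi B/D,\ 2\pi n]$ and the remaining part are exchanged.

For wavelets such as the Morlet wavelet [3], the coefficients $\{\hat{\psi}_{l,k}\}_k$ at scale $l$ are concentrated in a band $k \in [B, B+D-1]$ ($0 \le B$, $0 < D \le T$), where $B$ and $D$ depend on $l$. Treating $\hat{\psi}_{l,k}$ as zero outside this band, the CWT at scale $l$ can be computed with a $D$-point FFT instead of a $T$-point one, which gives a time-decimated CWT. The computation builds on [10] and on the Fourier-domain form of Section 2.3. The signal $f$ is transformed once by the fast Fourier transform (FFT), and the two steps (i) and (ii) are then applied for each scale $l$.

4.2

Steps (i) and (ii) are expressed by the matrix

\[
K := \underbrace{\begin{bmatrix} 0_{B_0 \times (D - B_0)} & I_{B_0} \\ I_{D - B_0} & 0_{(D - B_0) \times B_0} \end{bmatrix}}_{\text{(ii)}}
\underbrace{\begin{bmatrix} 0_{D \times B} & I_D & 0_{D \times (T - D - B)} \end{bmatrix}}_{\text{(i)}}
\tag{16}
\]

where $B_0 := B - (n-1) D$, so that $B_0 = B \bmod D$, $I_D$ is the $D \times D$ identity matrix, and $0_{D \times B}$ is the $D \times B$ zero matrix. The fast approximate CWT at scale $l$, $\check{s}_l \in \mathbb{C}^D$, is computed with $K$ as

\[
\check{s}_l = F_D^{-1} K \hat{W}_l F_T f
\tag{17}
\]

4.3

Let $T$ be the signal length. One FFT costs $O(T \log_2 T)$, so the CWT costs $O(T \log_2 T + L T \log_2 T)$. With a band of $D_l \le T$ elements at scale $l$, the fast approximate CWT costs $O(T \log_2 T + \sum_{l=0}^{L-1} D_l \log_2 D_l)$. The method of Irino et al. [6] serves as the reference for comparison.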
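Steps (i) and (ii) of Eqs. (16) and (17) amount to a slice, a circular shift and a $D$-point inverse FFT. The sketch below is an illustration under assumptions rather than the authors' code: the wavelet response is a hypothetical Gaussian made exactly zero outside the band, and $D$ is chosen to divide $T$ so that the result can be compared sample by sample with the full-length computation.

```python
import numpy as np

def fast_scale(f_hat, psi_hat_l, B, D):
    """Eq. (17) for one scale: F_D^{-1} K W_hat_l F_T f, with f_hat = F_T f."""
    band = (psi_hat_l * f_hat)[B:B + D]     # step (i): keep bins B .. B+D-1
    B0 = B % D                              # B0 = B - (n-1) D with n = floor((B+D)/D)
    shifted = np.roll(band, B0)             # step (ii): bin k lands at index k mod D
    return np.fft.ifft(shifted)             # D-point inverse DFT

T, D, B = 1024, 64, 100                     # hypothetical sizes, D divides T
k = np.arange(T)
psi_hat_l = np.exp(-0.5 * ((k - (B + D / 2)) / 8.0) ** 2)
psi_hat_l[(k < B) | (k >= B + D)] = 0.0     # exactly band-limited (assumption)

f = np.random.default_rng(0).standard_normal(T)
f_hat = np.fft.fft(f)

full = np.fft.ifft(psi_hat_l * f_hat)       # T-point reference for this scale
fast = fast_scale(f_hat, psi_hat_l, B, D)   # D-point result

# With an exactly band-limited wavelet, the fast result equals the reference
# sampled every T/D samples, scaled by T/D (from the 1/T versus 1/D normalisation).
print(np.allclose(fast, (T / D) * full[:: T // D]))
```

When the wavelet response is only approximately zero outside the band, the two results differ by the discarded out-of-band energy, which is the approximation error of the fast method.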

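Compared with the method of Irino et al. [6], the number of elements handled at scale $l$ drops from $T$ (a total of $LT$) to $D_l$.

5.

5.1 Experiment 1

5.1.1

The proposed method is compared with the method of Irino et al. [6] for the CWT. The data are speech from set A of the ATR database [12], speakers faf and mht (115 and 113 items, respectively). The CWT is computed with the FFT. The wavelet is the one of [3], defined in the Fourier domain by

\[
\hat{\psi}(\omega) :=
\begin{cases}
\exp\!\left( -\dfrac{(\log \omega)^2}{4 \sigma^2} \right) & (\omega > 0) \\
0 & (\omega \le 0)
\end{cases}
\tag{18}
\]

where $\omega$ is the frequency and $\sigma$ controls the bandwidth. The settings are $\sigma = 0.02$, scales spaced 20 cents apart from 27.5 to 7040 Hz, and a band of $\pm 3\sigma$. The computer has an Intel Xeon CPU E31245 (3.3 GHz) and 32 GB of RAM. Quality is measured with the perceptual evaluation of speech quality (PESQ) [13], which ranges from -0.5 to 4.5.

5.1.2

PESQ scores of 4.20 ± 0.08 and 4.1 ± 0.1 are reported for the two methods (*1). Fig. 5 shows the computation time: the proposed method is about 100 times faster than the method of Irino et al. For a 15 s signal, the time per iteration is 10 s/iteration against 0.1 s/iteration.

*1 http://hil.t.u-tokyo.ac.jp/ nakamura/ demo/fastcwt.html

5.2 Experiment 2

5.2.1

The data are music signals from the RWC database [14], at a sampling rate of 16 kHz. The CWT uses $\sigma = 0.02$, and step (i) keeps the band $\pm P\sigma$ ($P = 1, 2, 3, 5$). Up to 500 iterations are run, and the method of Irino et al. is used for comparison.

Fig. 5: Computation time per iteration [s/iteration] against signal length [s].

Fig. 6: Perceptual evaluation of audio quality: objective difference grade against the number of iterations, for bands $[-P\sigma, P\sigma]$ ($P = 1, 2, 3, 5$) and for the method of Irino et al. [6] ([Irino1993]).

Fig. 7: Perceptual evaluation of audio quality: objective difference grade against computation time [s], for bands $[-P\sigma, P\sigma]$ ($P = 1, 2, 3, 5$) and for the method of Irino et al. [6] ([Irino1993]).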
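The wavelet of Eq. (18) and the scale grid of Section 5.1.1 can be written down directly. The sketch below is illustrative only: the 16 kHz sampling rate and 15 s length are taken as example values, including both end points of the 27.5 to 7040 Hz grid is an assumption, and the band edges $B_l$, $D_l$ are derived here from the $\pm P\sigma$ band under the assumption that scale $l$ has response $\hat{\psi}(f / f_l)$ for centre frequency $f_l$. The two cost figures are the operation-count expressions of Section 4.3 with constants ignored.

```python
import numpy as np

def lognormal_hat(omega, sigma):
    """Eq. (18): exp(-(log w)^2 / (4 sigma^2)) for w > 0, and 0 otherwise."""
    omega = np.asarray(omega, dtype=float)
    out = np.zeros_like(omega)
    pos = omega > 0
    out[pos] = np.exp(-np.log(omega[pos]) ** 2 / (4.0 * sigma ** 2))
    return out

sigma, P = 0.02, 3
fs = 16000.0                     # sampling rate in Hz (example value)
T = int(fs * 15)                 # 15 s signal (example value)

# Centre frequencies every 20 cents from 27.5 Hz to 7040 Hz (end points included).
n_scales = int(round(1200 * np.log2(7040 / 27.5) / 20)) + 1
centers = 27.5 * 2.0 ** (20.0 * np.arange(n_scales) / 1200.0)

# Band of each scale: |log(f / f_l)| <= P * sigma, converted to DFT bins.
lo = centers * np.exp(-P * sigma)
hi = centers * np.exp(P * sigma)
B = np.ceil(lo * T / fs).astype(int)                       # first kept bin B_l
D = np.maximum(np.floor(hi * T / fs).astype(int) - B + 1, 1)   # band width D_l

cost_full = T * np.log2(T) + n_scales * T * np.log2(T)     # O(T log T + L T log T)
cost_fast = T * np.log2(T) + np.sum(D * np.log2(D))        # O(T log T + sum D_l log D_l)
print(n_scales, int(D.min()), int(D.max()), cost_full / cost_fast)
```

The response equals 1 at $\omega = 1$ and falls to $e^{-9/4}$ at the $\pm 3\sigma$ band edges, so the truncation is an approximation whose size depends on $P$.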

Fig. 8: Perceptual evaluation of speech quality against computation time [s], for bands $[-P\sigma, P\sigma]$ ($P = 1, 2, 3, 5$) and for the method of Irino et al. [6].

The computer has an Intel Core i3-2120 CPU (3.30 GHz) and 8 GB of RAM. The other conditions follow Section 5.1.1. Quality is measured with the perceptual evaluation of audio quality (PEAQ) [15], whose objective difference grade (ODG) ranges from -4 to 0; a higher ODG means better quality.

5.2.2

Fig. 6 shows the ODG. For $P = 3$ and $P = 5$, the ODG is about -2.0 at 100 iterations (*2). The method of Irino et al. is compared with the proposed method for $P \ge 3$. Fig. 7 shows the ODG for RWC-MDB-G-2001 No. 1. Fig. 8 shows the PESQ for the 7 s utterance fafsc110 from set A of the ATR database [12], with $P = 3$.

*2 cf. MPEG-3 at 160 kbps: ODG -3.68 ± 0.03.

6.

This report treated signal reconstruction from the magnitude CWT spectrogram, in comparison with the method of Irino et al. [6] and using the fast approximate CWT [10]. The proposed method is about 100 times faster than the method of Irino et al.

This work was supported by JSPS 26730100.

[1] Vol. 39, No. 6, pp. 413-418 (2009).
[2] Schmidt, M. N. and Mørup, M.: Nonnegative matrix factor 2-D deconvolution for blind single channel source separation, Independent Component Analysis and Blind Signal Separation, Springer, pp. 700-707 (2006).
[3] Kameoka, H.: Statistical Approach to Multipitch Analysis, PhD Thesis, The University of Tokyo (2007).
[4] Müller, M., Ellis, D. P. W., Klapuri, A. and Richard, G.: Signal processing for music analysis, IEEE J. Sel. Topics Signal Process., Vol. 5, No. 6, pp. 1088-1110 (2011).
[5] de León, J. P., Beltrán, F. and Beltrán, J. R.: A complex wavelet based fundamental frequency estimator in single-channel polyphonic signals, Proc. Digital Audio Effects (2013).
[6] Irino, T. and Kawahara, H.: Signal reconstruction from modified auditory wavelet transform, IEEE Trans. Signal Process., Vol. 41, No. 12, pp. 3549-3554 (1993).
[7] Le Roux, J., Kameoka, H., Ono, N. and Sagayama, S.: Fast Signal Reconstruction from Magnitude STFT Spectrogram Based on Spectrogram Consistency, Proc. Int. Conf. Digital Audio Effects, pp. 397-403 (2010).
[8] Griffin, D. and Lim, J.: Signal estimation from modified short-time Fourier transform, IEEE Trans. Acoust., Speech, Signal Process., Vol. 32, No. 2, pp. 236-243 (1984).
[9] Lopes, D. M. and White, P. R.: Signal reconstruction from the magnitude or phase of a generalised wavelet transform, Proc. Eur. Signal Process. Conf., pp. 2029-2032 (2000).
[10] (2008). 2008-281898.
[11] Ortega, J. M. and Rheinboldt, W. C.: Iterative solution of nonlinear equations in several variables, No. 30 (2000).
[12] Kurematsu, A., Takeda, K., Sagisaka, Y., Katagiri, S., Kuwabara, H. and Shikano, K.: ATR Japanese Speech Database as a Tool of Speech Recognition and Synthesis, Speech Commun., Vol. 9, No. 4, pp. 357-363 (1990).
[13] ITU-T: Recommendation P.862, Perceptual Evaluation of Speech Quality (PESQ): An Objective Method for End-to-End Speech Quality Assessment of Narrow-Band Telephone Networks and Speech Codecs (2001).
[14] Goto, M.: Development of the RWC Music Database, Proc. Int. Congress Acoust., pp. I-553-556 (2004).
[15] ITU-R: Recommendation BS.1387-1, Perceptual Evaluation of Audio Quality (PEAQ): Method for Objective Measurements of Perceived Audio Quality (2001).