F0 Estimation of Melody and Bass Lines in Real-world Musical Audio Signals



Σχετικά έγγραφα
MIDI [8] MIDI. [9] Hsu [1], [2] [10] Salamon [11] [5] Song [6] Sony, Minato, Tokyo , Japan a) b)

SNR F0 [2], [3], [4] F0 F0 F0 F0 F0 TUSK F0 TUSK F0 6 TUSK 6 F0 2. F0 F0 [5] [6] [7] p[8] Cepstrum [9], [10] [11] [12] [13] F0 [14] F0 [15] DIO[16] [1

Vol.4-DCC-8 No.8 Vol.4-MUS-5 No.8 4// 3 3 Hanning (T ) 3 Hanning 3T (y(t)w(t)) dt =.5 T y (t)dt. () STRAIGHT F 3 TANDEM-STRAIGHT[] 3 F F 3 [] F []. :

Fourier transform, STFT 5. Continuous wavelet transform, CWT STFT STFT STFT STFT [1] CWT CWT CWT STFT [2 5] CWT STFT STFT CWT CWT. Griffin [8] CWT CWT

VOCODER VOCODER Vocal

: Monte Carlo EM 313, Louis (1982) EM, EM Newton-Raphson, /. EM, 2 Monte Carlo EM Newton-Raphson, Monte Carlo EM, Monte Carlo EM, /. 3, Monte Carlo EM

: TANDEM-STRAIGHT. Make singing voice tangible: TANDEM-STRAIGHT and temporally variable morphing as substrate. Hideki Kawahara 1 and Masanori Morise 2

Parameter Estimation of Mixture Model of Multiple Instruments and Application to Musical Instrument Identification

Signal processing for handling singing voice texture

Non-negative Matrix Factorization, NMF [5] NMF. [1 3] Bregman [4] Harmonic-Temporal Clustering, HTC [2,3] 1,2,b) NTT

Bundle Adjustment for 3-D Reconstruction: Implementation and Evaluation

6.003: Signals and Systems. Modulation

GUI

Ψηφιακή Επεξεργασία Φωνής

CSJ. Speaker clustering based on non-negative matrix factorization using i-vector-based speaker similarity

Main source: "Discrete-time systems and computer control" by Α. ΣΚΟΔΡΑΣ ΨΗΦΙΑΚΟΣ ΕΛΕΓΧΟΣ ΔΙΑΛΕΞΗ 4 ΔΙΑΦΑΝΕΙΑ 1

An Automatic Modulation Classifier using a Frequency Discriminator for Intelligent Software Defined Radio

Retrieval of Seismic Data Recorded on Open-reel-type Magnetic Tapes (MT) by Using Existing Devices

1,a) 1,b) 2 3 Sakriani Sakti 1 Graham Neubig 1 1. A Study on HMM-Based Speech Synthesis Using Rich Context Models

Acoustic Signal Adjustment by Considering Musical Expressive Intention Using a Performance Intension Function

Quick algorithm f or computing core attribute

EM Baum-Welch. Step by Step the Baum-Welch Algorithm and its Application 2. HMM Baum-Welch. Baum-Welch. Baum-Welch Baum-Welch.

A Method of Trajectory Tracking Control for Nonminimum Phase Continuous Time Systems

ΣΧΟΛΗ Σχολή Τεχνολογικών Εφαρμογών ΤΜΗΜΑ Ηλεκτρονικών Μηχανικών Τ.Ε. ΕΠΙΠΕΔΟ ΣΠΟΥΔΩΝ Προπτυχιακό ΚΩΔΙΚΟΣ ΜΑΘΗΜΑΤΟΣ ΕΞΑΜΗΝΟ ΣΠΟΥΔΩΝ 5

Feasible Regions Defined by Stability Constraints Based on the Argument Principle

Detection and Recognition of Traffic Signal Using Machine Learning

Applying Markov Decision Processes to Role-playing Game

Query by Phrase (QBP) (Music Information Retrieval, MIR) QBH QBP / [1, 2] [3, 4] Query-by-Humming (QBH) QBP MIDI [5, 6] [8 10] [7]

Second Order RLC Filters

Study of In-vehicle Sound Field Creation by Simultaneous Equation Method

3: A convolution-pooling layer in PS-CNN 1: Partially Shared Deep Neural Network 2.2 Partially Shared Convolutional Neural Network 2: A hidden layer o

( ) (Harmonic-Temporal Clustering; HTC) [1], [2] ( ) ( ) [4] HTC. (Non-negative Matrix Factorization; NMF) [3] [5], [6] [7], [8]

Study on Re-adhesion control by monitoring excessive angular momentum in electric railway traction

[5] F 16.1% MFCC NMF D-CASE 17 [5] NMF NMF 3. [5] 1 NMF Deep Neural Network(DNN) FUSION 3.1 NMF NMF [12] S W H 1 Fig. 1 Our aoustic event detect

A Method for Creating Shortcut Links by Considering Popularity of Contents in Structured P2P Networks

Nov Journal of Zhengzhou University Engineering Science Vol. 36 No FCM. A doi /j. issn

Study of urban housing development projects: The general planning of Alexandria City

ΕΥΡΕΣΗ ΤΟΥ ΔΙΑΝΥΣΜΑΤΟΣ ΘΕΣΗΣ ΚΙΝΟΥΜΕΝΟΥ ΡΟΜΠΟΤ ΜΕ ΜΟΝΟΦΘΑΛΜΟ ΣΥΣΤΗΜΑ ΟΡΑΣΗΣ

n 1 n 3 choice node (shelf) choice node (rough group) choice node (representative candidate)

Schedulability Analysis Algorithm for Timing Constraint Workflow Models

Διπλωματική Εργασία. του φοιτητή του Τμήματος Ηλεκτρολόγων Μηχανικών και Τεχνολογίας Υπολογιστών της Πολυτεχνικής Σχολής του Πανεπιστημίου Πατρών

Vol. 31,No JOURNAL OF CHINA UNIVERSITY OF SCIENCE AND TECHNOLOGY Feb

Sampling Basics (1B) Young Won Lim 9/21/13

Buried Markov Model Pairwise

Anomaly Detection with Neighborhood Preservation Principle

2 ICA. (ICA, Independent Component Analysis) (PCA, Principal Compoenent Analysis) x(t) =(x 1 (t),...,x m (t)) T t =0, 1, 2,... PCA 2 ICA.

ΔΙΠΛΩΜΑΤΙΚΗ ΕΡΓΑΣΙΑ. του φοιτητή του Τμήματος Ηλεκτρολόγων Μηχανικών και. Τεχνολογίας Υπολογιστών της Πολυτεχνικής Σχολής του. Πανεπιστημίου Πατρών

Development of a basic motion analysis system using a sensor KINECT

Supplementary Materials for Evolutionary Multiobjective Optimization Based Multimodal Optimization: Fitness Landscape Approximation and Peak Detection

VSC STEADY2STATE MOD EL AND ITS NONL INEAR CONTROL OF VSC2HVDC SYSTEM VSC (1. , ; 2. , )


Indexing Methods for Encrypted Vector Databases

Development of a Seismic Data Analysis System for a Short-term Training for Researchers from Developing Countries

Yoshifumi Moriyama 1,a) Ichiro Iimura 2,b) Tomotsugu Ohno 1,c) Shigeru Nakayama 3,d)

Design and Fabrication of Water Heater with Electromagnetic Induction Heating

Elements of Information Theory

IPSJ SIG Technical Report Vol.2014-CE-127 No /12/6 CS Activity 1,a) CS Computer Science Activity Activity Actvity Activity Dining Eight-He

476,,. : 4. 7, MML. 4 6,.,. : ; Wishart ; MML Wishart ; CEM 2 ; ;,. 2. EM 2.1 Y = Y 1,, Y d T d, y = y 1,, y d T Y. k : p(y θ) = k α m p(y θ m ), (2.1

CHAPTER 25 SOLVING EQUATIONS BY ITERATIVE METHODS

ΜΕΛΕΤΗ ΚΑΙ ΠΡΟΣΟΜΟΙΩΣΗ ΙΑΜΟΡΦΩΣΕΩΝ ΣΕ ΨΗΦΙΑΚΑ ΣΥΣΤΗΜΑΤΑ ΕΠΙΚΟΙΝΩΝΙΩΝ.

Queensland University of Technology Transport Data Analysis and Modeling Methodologies

Additional Results for the Pareto/NBD Model

GPU. CUDA GPU GeForce GTX 580 GPU 2.67GHz Intel Core 2 Duo CPU E7300 CUDA. Parallelizing the Number Partitioning Problem for GPUs

Research on Economics and Management

y = f(x)+ffl x 2.2 x 2X f(x) x x p T (x) = 1 Z T exp( f(x)=t ) (2) x 1 exp Z T Z T = X x2x exp( f(x)=t ) (3) Z T T > 0 T 0 x p T (x) x f(x) (MAP = Max

Matrices and Determinants

Statistical Inference I Locally most powerful tests

BCI On Feature Extraction from Multi-Channel Brain Waves Used for Brain Computer Interface

Scrub Nurse Robot: SNR. C++ SNR Uppaal TA SNR SNR. Vain SNR. Uppaal TA. TA state Uppaal TA location. Uppaal

Αρχιτεκτονική Σχεδίαση Ασαφούς Ελεγκτή σε VHDL και Υλοποίηση σε FPGA ΙΠΛΩΜΑΤΙΚΗ ΕΡΓΑΣΙΑ

ΠΟΛΥΤΕΧΝΕΙΟ ΚΡΗΤΗΣ ΣΧΟΛΗ ΜΗΧΑΝΙΚΩΝ ΠΕΡΙΒΑΛΛΟΝΤΟΣ

Identifying Scenes with the Same Person in Video Content on the Basis of Scene Continuity and Face Similarity Measurement

LUNGOO R. Control Engineering for Development of a Mechanical Ventilator for ICU Use Spontaneous Breathing Lung Simulator LUNGOO

The ε-pseudospectrum of a Matrix

IF(Ingerchange Format) [7] IF C-STAR(Consortium for speech translation advanced research ) [8] IF 2 IF

ES440/ES911: CFD. Chapter 5. Solution of Linear Equation Systems

Chapter 1 Introduction to Observational Studies Part 2 Cross-Sectional Selection Bias Adjustment

Approximation of distance between locations on earth given by latitude and longitude

HW 3 Solutions 1. a) I use the auto.arima R function to search over models using AIC and decide on an ARMA(3,1)

ΠΑΝΕΠΙΣΤΗΜΙΟ ΠΑΤΡΩΝ ΤΜΗΜΑ ΗΛΕΚΤΡΟΛΟΓΩΝ ΜΗΧΑΝΙΚΩΝ ΚΑΙ ΤΕΧΝΟΛΟΓΙΑΣ ΥΠΟΛΟΓΙΣΤΩΝ ΤΟΜΕΑΣ ΣΥΣΤΗΜΑΤΩΝ ΗΛΕΚΤΡΙΚΗΣ ΕΝΕΡΓΕΙΑΣ

Outline. Detection Theory. Background. Background (Cont.)

Content. Introduction... 1

Congruence Classes of Invertible Matrices of Order 3 over F 2

Speeding up the Detection of Scale-Space Extrema in SIFT Based on the Complex First Order System

Assignment 1 Solutions Complex Sinusoids

ΕΘΝΙΚΟ ΚΑΙ ΚΑΠΟΔΙΣΤΡΙΑΚΟ ΠΑΝΕΠΙΣΤΗΜΙΟ ΑΘΗΝΩΝ ΣΧΟΛΗ ΘΕΤΙΚΩΝ ΕΠΙΣΤΗΜΩΝ ΤΜΗΜΑ ΠΛΗΡΟΦΟΡΙΚΗΣ ΚΑΙ ΤΗΛΕΠΙΚΟΙΝΩΝΙΩΝ

The Spiral of Theodorus, Numerical Analysis, and Special Functions

Measurement and Analysis of the Vibrations of Buldings and Technical Constructions

* ** *** *** Jun S HIMADA*, Kyoko O HSUMI**, Kazuhiko O HBA*** and Atsushi M ARUYAMA***

Problem Set 3: Solutions

Prey-Taxis Holling-Tanner

Test Data Management in Practice

SOLUTIONS TO MATH38181 EXTREME VALUES AND FINANCIAL RISK EXAM

GPGPU. Grover. On Large Scale Simulation of Grover s Algorithm by Using GPGPU

Correction of chromatic aberration for human eyes with diffractive-refractive hybrid elements

DERIVATION OF MILES EQUATION FOR AN APPLIED FORCE Revision C

Phys460.nb Solution for the t-dependent Schrodinger s equation How did we find the solution? (not required)

Ανάλυση, Περιγραφή και Ανάκτηση Μουσικών Δεδομένων: το έργο ΠΟΛΥΜΝΙΑ*

HDD. Point to Point. Fig. 1: Difference of time response by location of zero.

Technical Specifications

Transcript:

99-MUS-31-16, Vol.99, No.68, August 1999. 31 16 goto@etl.go.jp CD EM F0 Estimation o Melody and Bass Lines in Real-world Musical Audio Signals Masataka Goto Electrotechnical Laboratory 1-1-4 Umezono, Tsukuba, Ibaraki 305-8568 Japan Abstract This paper describes a method or estimating the undamental requency (F0) o melody and bass lines in monaural audio signals containing sounds o various instruments. Most previous methods premised mixtures o a ew sounds and had great diiculty dealing with audio signals sampled rom compact discs. Our method does not rely on the unreliable F0 s component and obtains the most predominant F0 supported by harmonics within an intentionally limited requency range. It estimates relative dominance o every possible F0 by using the Expectation- Maximization algorithm and considers F0 s temporal continuity by introducing a multi-agent model. Experimental results show that our real-time system can detect the melody and bass lines in real-world audio signals. 1 1) 5) 4),5) CD (compact disc) ( ) CD 91

6) 10) 11) 18) CD CD 19) ( ) DP () EM (Expectation-Maximization) 20) CD 10 2 D m (t) D b (t) t(f0) F i (t) (i =m, b) A i (t) D m (t) ={F m (t),a m (t)} (1) D b (t) ={F b (t),a b (t)} (2) () ( ) (missing undamental) 1 (1) (2) (3) ( ) 1 () 92

(1) (2) () EM (Expectation-Maximization) 20) EM (3) (singing ormant) 2.8 2 450 Hz 21) 22) 2 Interaction 1: 23) 1.4 2 3 1 2 () ( ) 3.1 24),25) Flanagan 24) 93

16 2 8 4 = (0.45 s) + FFT FFT FFT FFT 2 1 FFT 2: (STFT) x(t) h(t) STFT X(ω, t) = x(τ)h(τ t)e jωτ dτ (3) = a + jb (4) 3.6-7.2 1.8-3.6 0.9-1.8 0.45-0.9 0-0.45 λ(ω, t) 24) λ(ω,t) =ω + a b t b a t a 2 + b 2 (5) h(t) 2 B- 10) 10) STFT h(t) STFT 26) 2 (FIR ) 1/2 (decimator) 0.45 s ( s ) 16 16 bit A/D 1 STFT 512 (FFT) FFT 16 160 (1 ) 10 msec 1 3.2 8) 10) STFT ω λ(ω, t) ψ ψ 10) (t) 27) (t) = { ψ λ(ψ, t) ψ =0, (λ(ψ, t) ψ) < 0} (6) ψ (t) STFT (t) p (ω) { (t) X(ω, t) i ω (t) p (ω) = (7) 0 otherwise 3.3 (BPF) BPF BPF BPF 3 cent ( ( ))Hz Hz cent cent cent = 1200 log Hz 2 REF Hz (8) REF Hz = 440 2 3 12 5 (9) 100 cent 1 1200 cent 0 cent 1200 cent 2400 cent 3600 cent 4800 cent 6000 cent 7200 cent 8400 cent 9600 cent 16.35Hz 32.70Hz 65.41Hz 130.8Hz 261.6Hz 523.3Hz 1047 Hz 2093 Hz 4186 Hz 3: (BPF) 94

x cent BPF BPF i (x) (i = m, b) (t) p (x) BPF BPF i (x) (t) p (x) (t) p (x) cent (t) p (ω) BPF (x) (x) =BPF i(x) (t) p (x) (10) Pow (t) Pow (t) BPF Pow (t) = BPF i (x) (t) p (x) dx (11) 3.4 BPF (x) ( ) () F p(x F ) p(x; θ (t) ) Fhi p(x; θ (t) )= w (t) (F ) p(x F ) df (12) Fl i θ (t) = {w (t) (F ) Fl i F Fh i } (13) Fh i Fl i w (t) (F ) p(x F ) Fhi w (t) (F ) df = 1 (14) Fl i CD (x) p(x; θ (t) ) θ (t) (x) w (t) (F ) F 0 (F ) F 0 (F )=w(t) (F ) (Fl i F Fh i ) (15) p(x F ) (w (t) (F ) ) F 0 (F ) F (x) p(x; θ (t) ) θ (t) θ (t) (x) log p(x; θ(t) ) dx (16) EM (Expectation-Maximization) 20) θ (t) EM E (expectation step) M (maximization step) ( (x)) θ (t) θ (t) ( ) θ (t) θ (t) t 1 x ( ) F EM 1. (E ) Q(θ (t) θ (t) ) Q(θ (t) θ (t) ) = (x) E F [log p(x, F ; θ (t) ) x; θ (t) ] dx (17) E F [a b] b F a 2. (M ) Q(θ (t) θ (t) ) θ (t) θ (t) θ (t) = argmax Q(θ (t) θ (t) ) (18) θ (t) E (17) Q(θ (t) θ (t) ) Fhi = (x)p(f x; θ (t) ) log p(x, F ; θ (t) )df dx Fl i (19) log p(x, F ; θ (t) ) = log(w (t) (F ) p(x F )) = log w (t) (F ) + log p(x F ) (20) M (18) (14) Lagrange λ Euler-Lagrange 95

( w (t) (x) p(f x; θ (t) ) (log w (t) (F )+ ) log p(x F )) dx λ(w (t) 1 (F ) Fh i Fl i ) =0 (21) w (t) (F )= 1 λ (x) p(f x; θ (t) ) dx (22) λ (14) λ =1 p(f x; θ (t) ) p(f x; θ (t) w (t) (F ) p(x F ) )= Fhi w (t) (η) p(x η) dη Fl i (23) w (t) (F ) (θ (t) = {w (t) (F )}) w (t) (F ) w (t) (F )= (x) w (t) (F ) p(x F ) Fhi Fl i w (t) (η) p(x η) dη dx (24) (24) p(x F ) F (i = m) (i = b) N i p(x F )=α c(h) G(x; F + 1200 log 2 h, W i ) (25) h=1 G(x; m, σ) = 1 (x m)2 e 2σ 2 (26) 2πσ 2 α N i () W i 2 G(x; m, σ) c(h) h c(h) =G(h;1, H i ) (H i ) F i (t) F 0 (F )( (15) (24) ) F i (t) = argmax F 0 (F ) (27) F Interaction 4: 3.5 3 28) 16) ( 4) (1) ( ) Pow (t) (2) 3 96

(3) (4) (5) (6) (7) t F i (t) A i (t) F i (t) (t) p (ω) 4 ( 1) () 1: Fh m = 9600 cent (4186 Hz) Fh b = 4800 cent (261.6 Hz) Fl m = 3600 cent (130.8 Hz) Fl b = 1000 cent (29.14 Hz) N m =16 N b =6 W m = 17 cent W b = 17 cent H m = 5.5 H b = 2.7 2: CD My Heart Will GoOn (Celine Dion) Vision o Love (Mariah Carey) Always (Bon Jovi) Time Goes By (Every Little Thing) Spirit o Love (Sing Like Talking) (Misia) Scarborough Fair (Herbie Hancock) jazz Autumn Leaves (Julian Cannonball Adderley) jazz On Green Dolphin Street (Miles Davis) jazz Violin Concerto in D, Op. 35 (Tchaikovsky) classical 5: : ( ) ( ) ( 5) D i (t) 2 AR 29) 3 LAN (Ethernet) RACP (Remote Audio Control Protocol) RACP RMCP (Remote Music Control Protocol) 30) (Pentium II 450 MHzCPU x 2, Linux 2.2) (SGI Octane R10000 250 MHz CPU, Irix 6.4) 2 10 CD 97

5 () EM CD [1], : AP1000,, Vol. 37, No. 7, pp. 1460 1468 (1996). [2], :, D-II, Vol. J81-D-II, No. 2, pp. 227 237 (1998). [3] Goto, M. and Muraoka, Y.: Music Understanding At The Beat Level Real-time Beat Tracking For AudioSignals, Computational Auditory Scene Analysis, Lawrence Erlbaum Associates, Publishers, pp. 157 176 (1998). [4] Goto, M. and Muraoka, Y.: Real-time Beat Tracking or Drumless Audio Signals: Chord Change Detection or Musical Decisions, Speech Communication, Vol. 27, No. 3 4, pp. 311 335 (1999). [5] :,, (1998). [6] Rabiner, L. R., Cheng, M. J., Rosenberg, A. E. and McGonegal, C. A.: A Comparative Perormance Study o Several Pitch Detection Algorithms, IEEE Trans. on ASSP, Vol. ASSP-24, No. 5, pp. 399 418 (1976). [7] Nehorai, A. and Porat, B.: Adaptive Comb Filtering or Harmonic Signal Enhancement, IEEE Trans. on ASSP, Vol. ASSP-34, No. 5, pp. 1124 1138 (1986). [8] Charpentier, F. J.: Pitch detection using the shortterm phase spectrum, Proc. o ICASSP 86, pp. 113 116 (1986). [9],, :, D-II, Vol. J79-D-II, No. 11, pp. 1771 1781 (1996). [10],, Patterson, R. D., de Cheveigné, A.:, H-98-116, pp. 31 38 (1998). [11] Chae, C. and Jae, D.: Source separation and note identiication in polyphonic music, Proc. o ICASSP 86, pp. 1289 1292 (1986). [12] :,, (1991). [13] Brown, G. J. and Cooke, M.: Perceptual Grouping o Musical Sounds: A Computational Model, J. o New Music Research, Vol. 23, pp. 107 132 (1994). [14] :,, (1994). [15], :,, Vol. 38, No. 1, pp. 146 157 (1997). [16],,, :,, Vol. 12, No. 1, pp. 111 120 (1997). [17], :, 98-MUS-24-5, pp. 33 40 (1998). [18] :,, Vol. 54, No. 10, pp. 715 719 (1998). [19], :, 2-9-1 (1998). [20] Dempster, A. P., Laird, N. M. and Rubin, D. B.: Maximum likelihood rom incomplete data via the EM algorithm, J. Roy. Stat. Soc. B, Vol. 39, No. 1, pp. 1 38 (1977). [21] Richards, W.(ed.): Natural Computation, The MIT Press (1988). [22] Ritsma, R. J.: Frequencies Dominant in the Perception o the Pitch o Complex Sounds, J. Acoust. Soc. Am., Vol. 42, No. 1, pp. 191 198 (1967). [23] Plomp, R.: Pitch o Complex Tones, J. Acoust. Soc. Am., Vol. 41, No. 6, pp. 1526 1533 (1967). [24] Flanagan, J. L. and Golden, R. M.: Phase Vocoder, The Bell System Technical J., Vol. 45, pp. 1493 1509 (1966). [25] Boashash, B.: Estimating and Interpreting The Instantaneous Frequency o a Signal, Proc. o the IEEE, Vol. 80, No. 4, pp. 520 568 (1992). [26] Vetterli, M.: A Theory o Multirate Filter Banks, IEEE Trans. on ASSP, Vol. ASSP-35, No. 3, pp. 356 372 (1987). [27] Abe, T., Kobayashi, T. and Imai, S.: The IF spectrogram: a new spectral representation, Proc.oASVA97, pp. 423 430 (1997). [28] Goto, M. and Muraoka, Y.: Beat Tracking based on Multiple-agent Architecture A Real-time Beat Tracking System or Audio Signals, Proc. o the Second Intl. Con. on Multiagent Systems, pp. 103 110 (1996). [29] Aikawa, K., Kawahara, H. and Tsuzaki, M.: A neural matrix model or active tracking o requency-modulated tones, Proc. o ICSLP 96, pp. 578 581 (1996). [30],, : RMCP:,, Vol. 40, No. 3, pp. 1335 1345 (1999). 98