MIDI [8] MIDI. [9] Hsu [1], [2] [10] Salamon [11] [5] Song [6] Sony, Minato, Tokyo , Japan a) b)

Σχετικά έγγραφα
Query by Phrase (QBP) (Music Information Retrieval, MIR) QBH QBP / [1, 2] [3, 4] Query-by-Humming (QBH) QBP MIDI [5, 6] [8 10] [7]

Fourier transform, STFT 5. Continuous wavelet transform, CWT STFT STFT STFT STFT [1] CWT CWT CWT STFT [2 5] CWT STFT STFT CWT CWT. Griffin [8] CWT CWT

Non-negative Matrix Factorization, NMF [5] NMF. [1 3] Bregman [4] Harmonic-Temporal Clustering, HTC [2,3] 1,2,b) NTT

Singing Information Processing: Music Information Processing for Singing Voices

Vol.4-DCC-8 No.8 Vol.4-MUS-5 No.8 4// 3 3 Hanning (T ) 3 Hanning 3T (y(t)w(t)) dt =.5 T y (t)dt. () STRAIGHT F 3 TANDEM-STRAIGHT[] 3 F F 3 [] F []. :

SNR F0 [2], [3], [4] F0 F0 F0 F0 F0 TUSK F0 TUSK F0 6 TUSK 6 F0 2. F0 F0 [5] [6] [7] p[8] Cepstrum [9], [10] [11] [12] [13] F0 [14] F0 [15] DIO[16] [1

Parameter Estimation of Mixture Model of Multiple Instruments and Application to Musical Instrument Identification

( ) (Harmonic-Temporal Clustering; HTC) [1], [2] ( ) ( ) [4] HTC. (Non-negative Matrix Factorization; NMF) [3] [5], [6] [7], [8]

Buried Markov Model Pairwise

The Algorithm to Extract Characteristic Chord Progression Extended the Sequential Pattern Mining

Ανάλυση, Περιγραφή και Ανάκτηση Μουσικών Δεδομένων: το έργο ΠΟΛΥΜΝΙΑ*

GUI

(Υπογραϕή) (Υπογραϕή) (Υπογραϕή)

3: A convolution-pooling layer in PS-CNN 1: Partially Shared Deep Neural Network 2.2 Partially Shared Convolutional Neural Network 2: A hidden layer o

CSJ. Speaker clustering based on non-negative matrix factorization using i-vector-based speaker similarity

Gemini, FastMap, Applications. Εαρινό Εξάμηνο Τμήμα Μηχανικών Η/Υ και Πληροϕορικής Πολυτεχνική Σχολή, Πανεπιστήμιο Πατρών

Anomaly Detection with Neighborhood Preservation Principle

Estimation, Evaluation and Guarantee of the Reverberant Speech Recognition Performance based on Room Acoustic Parameters

F0 Estimation of Melody and Bass Lines in Real-world Musical Audio Signals

VOCODER VOCODER Vocal

Re-Pair n. Re-Pair. Re-Pair. Re-Pair. Re-Pair. (Re-Merge) Re-Merge. Sekine [4, 5, 8] (highly repetitive text) [2] Re-Pair. Blocked-Repair-VF [7]


Ψηφιακή Επεξεργασία Φωνής

Quick algorithm f or computing core attribute

Yoshifumi Moriyama 1,a) Ichiro Iimura 2,b) Tomotsugu Ohno 1,c) Shigeru Nakayama 3,d)

Japanese Fuzzy String Matching in Cooking Recipes

ER-Tree (Extended R*-Tree)

Vol. 31,No JOURNAL OF CHINA UNIVERSITY OF SCIENCE AND TECHNOLOGY Feb

Identifying Scenes with the Same Person in Video Content on the Basis of Scene Continuity and Face Similarity Measurement

Study on Re-adhesion control by monitoring excessive angular momentum in electric railway traction

[5] F 16.1% MFCC NMF D-CASE 17 [5] NMF NMF 3. [5] 1 NMF Deep Neural Network(DNN) FUSION 3.1 NMF NMF [12] S W H 1 Fig. 1 Our aoustic event detect

Signal processing for handling singing voice texture

(1) (2) MFA/SFA: material flow analysis/ substance flow analysis MFA/SFA

Area Location and Recognition of Video Text Based on Depth Learning Method

An Automatic Modulation Classifier using a Frequency Discriminator for Intelligent Software Defined Radio

[1] DNA ATM [2] c 2013 Information Processing Society of Japan. Gait motion descriptors. Osaka University 2. Drexel University a)

Reading Order Detection for Text Layout Excluded by Image

Acoustic Signal Adjustment by Considering Musical Expressive Intention Using a Performance Intension Function

Retrieval of Seismic Data Recorded on Open-reel-type Magnetic Tapes (MT) by Using Existing Devices

Εξάλειψη αντήχησης από ηχητικά σήματα με υποκειμενικά / ψυχοακουστικά κριτήρια

Stabilization of stock price prediction by cross entropy optimization

Content. Introduction... 1

GPU. CUDA GPU GeForce GTX 580 GPU 2.67GHz Intel Core 2 Duo CPU E7300 CUDA. Parallelizing the Number Partitioning Problem for GPUs

ΔΙΠΛΩΜΑΤΙΚΕΣ ΕΡΓΑΣΙΕΣ

Echo path identification for stereophonic acoustic echo cancellation without pre-processing

{takasu, Conditional Random Field

Ερευνητική+Ομάδα+Τεχνολογιών+ Διαδικτύου+

1,a) 1,b) 2 3 Sakriani Sakti 1 Graham Neubig 1 1. A Study on HMM-Based Speech Synthesis Using Rich Context Models

ΓΙΑΝΝΟΥΛΑ Σ. ΦΛΩΡΟΥ Ι ΑΚΤΟΡΑΣ ΤΟΥ ΤΜΗΜΑΤΟΣ ΕΦΑΡΜΟΣΜΕΝΗΣ ΠΛΗΡΟΦΟΡΙΚΗΣ ΤΟΥ ΠΑΝΕΠΙΣΤΗΜΙΟΥ ΜΑΚΕ ΟΝΙΑΣ ΒΙΟΓΡΑΦΙΚΟ ΣΗΜΕΙΩΜΑ

[4] 1.2 [5] Bayesian Approach min-max min-max [6] UCB(Upper Confidence Bound ) UCT [7] [1] ( ) Amazons[8] Lines of Action(LOA)[4] Winands [4] 1

ITU-R BT.1908 (2012/01) !" # $ %& '( ) * +, - ( )

Detection and Recognition of Traffic Signal Using Machine Learning

Ανάκτηση Εικόνας βάσει Υφής με χρήση Eye Tracker

Ανάλυση σχημάτων βασισμένη σε μεθόδους αναζήτησης ομοιότητας υποακολουθιών (C589)


2016 IEEE/ACM International Conference on Mobile Software Engineering and Systems

Τμήμα Επιστήμης Υπολογιστών ΗΥ-474. Ψηφιακός ήχος. Χαρακτηριστικά σήματος ήχου Ψηφιοποίηση ήχου Συνθετικοί ήχοι MIDI

Speech Recognition using Phase Information based on Long-Term Analysis

Χαρακτηρισµός Νεοπλασµάτων στη Μαστογραφία από το Σχήµα της Παρυφής µε χρήση Νευρωνικών ικτύων

Razor. [1], [2] (typical) LSI V/F. Razor. (Timing Fault: TF) [7] Razor [3], [4], [5] DVFS - Dynamic Voltage and Frequency Scaling [6]

Nov Journal of Zhengzhou University Engineering Science Vol. 36 No FCM. A doi /j. issn

1181 (real-timespeechdriven) 1 1 ( ) D FAP FAP (voiceactivationdetectionvad) D FaceGen 3- D XfaceEd MPEG-4 1 FAP 66 FAP ( ) FAP 84

Δράσεις για την ενίσχυση της Δημιουργικότητας μέσω της Μουσικής Πληροφόρησης και της Τηλεκπαίδευσης στη Φιλαρμονική Ένωση Κέρκυρας «Καποδίστριας»

A Method for Creating Shortcut Links by Considering Popularity of Contents in Structured P2P Networks

ΜΕΤΑΠΤΥΧΙΑΚΟ ΠΡΟΓΡΑΜΜΑ ΣΠΟΥΔΩΝ

HOSVD. Higher Order Data Classification Method with Autocorrelation Matrix Correcting on HOSVD. Junichi MORIGAKI and Kaoru KATAYAMA

Σύστημα ψηφιακής επεξεργασίας ακουστικών σημάτων με χρήση προγραμματιζόμενων διατάξεων πυλών. Πτυχιακή Εργασία. Φοιτητής: ΤΣΟΥΛΑΣ ΧΡΗΣΤΟΣ

Καταλογοποίηση ακουστικών μουσικών δεδομένων

Supplementary Materials for Evolutionary Multiobjective Optimization Based Multimodal Optimization: Fitness Landscape Approximation and Peak Detection

BCI On Feature Extraction from Multi-Channel Brain Waves Used for Brain Computer Interface

k A = [k, k]( )[a 1, a 2 ] = [ka 1,ka 2 ] 4For the division of two intervals of confidence in R +

Dynamic types, Lambda calculus machines Section and Practice Problems Apr 21 22, 2016

Schedulability Analysis Algorithm for Timing Constraint Workflow Models

Indexing Methods for Encrypted Vector Databases

Development of a Seismic Data Analysis System for a Short-term Training for Researchers from Developing Countries

1 n-gram n-gram n-gram [11], [15] n-best [16] n-gram. n-gram. 1,a) Graham Neubig 1,b) Sakriani Sakti 1,c) 1,d) 1,e)

Applying Markov Decision Processes to Role-playing Game

Feasible Regions Defined by Stability Constraints Based on the Argument Principle

A Vocabulary-Free Infinity-Gram Model for Chord Progression Analysis

Discriminative Language Modeling Based on Risk Minimization Training

Maxima SCORM. Algebraic Manipulations and Visualizing Graphs in SCORM contents by Maxima and Mashup Approach. Jia Yunpeng, 1 Takayuki Nagai, 2, 1

Bundle Adjustment for 3-D Reconstruction: Implementation and Evaluation


Kenta OKU and Fumio HATTORI

ΔΙΠΛΩΜΑΤΙΚΕΣ ΕΡΓΑΣΙΕΣ ΠΜΣ «ΠΛΗΡΟΦΟΡΙΚΗ & ΕΠΙΚΟΙΝΩΝΙΕΣ» OSWINDS RESEARCH GROUP

Company. Patras, Greece

Scrub Nurse Robot: SNR. C++ SNR Uppaal TA SNR SNR. Vain SNR. Uppaal TA. TA state Uppaal TA location. Uppaal

情報処理学会研究報告 IPSJ SIG Technical Report Vol.2014-MUS-104 No /8/26 1,a) Music Structure and Composition with Sound Directivity in 3D Space

ΣΤΟΧΟΙ ΤΟΥ ΜΑΘΗΜΑΤΟΣ ΠΕΡΙΕΧΟΜΕΝΟ

ΕΥΡΕΣΗ ΤΟΥ ΔΙΑΝΥΣΜΑΤΟΣ ΘΕΣΗΣ ΚΙΝΟΥΜΕΝΟΥ ΡΟΜΠΟΤ ΜΕ ΜΟΝΟΦΘΑΛΜΟ ΣΥΣΤΗΜΑ ΟΡΑΣΗΣ

Automatic extraction of bibliography with machine learning

2 ~ 8 Hz Hz. Blondet 1 Trombetti 2-4 Symans 5. = - M p. M p. s 2 x p. s 2 x t x t. + C p. sx p. + K p. x p. C p. s 2. x tp x t.

Web 論 文. Performance Evaluation and Renewal of Department s Official Web Site. Akira TAKAHASHI and Kenji KAMIMURA

Maude 6. Maude [1] UIUC J. Meseguer. Maude. Maude SRI SRI. Maude. AC (Associative-Commutative) Maude. Maude Meseguer OBJ LTL SPIN

LAN. On the Dynamic Performance of Indoor Positioning System using Wireless LAN on Smartphone

v.connect 2 v.connect : A Singing Synthesis System Enabling Users to Control Vocal Tones Makoto Ogawa, 1 Syunji Yazaki 1 and Kôki Abe 1 VOCALOID

ΕΘΝΙΚΟ ΜΕΤΣΟΒΙΟ ΠΟΛΥΤΕΧΝΕΙΟ

Current Status and Future Prospects of Camera-Based Character Recognition and Document Image Analysis

Analysis of prosodic features in native and non-native Japanese using generation process model of fundamental frequency contours

Τιμοκατάλογος λιανικής

Transcript:

1,a) 1,b) 1,c) 1. MIDI [1], [2] U/D/S 3 [3], [4] 1 [5] Song [6] 1 Sony, Minato, Tokyo 108 0075, Japan a) Emiru.Tsunoo@jp.sony.com b) AkiraB.Inoue@jp.sony.com c) Masayuki.Nishiguchi@jp.sony.com MIDI [7] Li HMM Viterbi [8] [9] Hsu [10] Salamon [11] 2 3 4 5 c 2013 Information Processing Society of Japan 1

1 2. 2.1 3 1 [9] 2 2 3 2 3 1 2 3 1 1 2 3 2.2 2.2.1 1 2-(a) 2-(a) 2-(b) 0 1 x y Y (x, y) 1 x X 1 y Y {a 0, a 1,..., a K 1 } l(x, y) = K 1 k=0 a k log Y (x, y) (1) v(x, y) = 1 (µ(x)<u Y (x,y)<l(x,y)) 0 (U Y (x,y) µ(x)) l(x, y) µ(x) U Y (x, y) µ(x) (otherwise) (2) µ(x) x log Y (x, y) U Y (x, y) y p (y) p + (y) P (x, y) = log Y (x, p (y)) P + (x, y) = log Y (x, p + (y)) U Y (x, y) = (p +(y) y)p (x, y) + (y p (y))p + (x, y) p + (y) p (y) (3) p Y (x, p) = max Y (x, j) (4) p H j p+h c 2013 Information Processing Society of Japan 2

2 : (a) (b) (a) (c) U(x, y) (d) v(x, y) (e) S(x, y) H U Y (x, y) 2-(c) v(x, y) 2-(d) v(x, y) 1 2.2.2 2 Barry 2 [12] [7], [11] Barry A i,c (x, y) N v(x, ny) Y L (x, ny) βiy R (x, ny) A i,0 (x, y) = A i,1 (x, y) = n=1 N α (5) N v(x, ny) Y R (x, ny) βiy L (x, ny) n=1 N α (6) N α Y L (x, y) Y R (x, y) 3 The Beatles Hey Jude S(x, y) : (a) S(x, y) (b) T (x) (c) S(x, y) (d) β i 0 β i 1 S(x, y) = max A i,c(x, y) min A i,c(x, y) (7) i,c i,c 2 S(x, y) 2-(e) S(x, y) The Beatles Hey Jude S(x, y) 3-(a) Y R (x, y) 0 S (x, y) = 1 N α N v(x, ny) Y L (x, ny) (8) n=1 c 2013 Information Processing Society of Japan 3

4 2.3 S(x, y) 3 (4) S(x, y) 4 S(x, y) T 1 F 1 F 2 2 1/12 F 2/F 1 2 1/12 5 T 2 3-(a) 3-(c) 2.4 3 Viterbi [8], [10] [8] T (x) S(x, y) 5 c t 1 c t 4 [8] 3-(a) 3-(b) U m = {(x m,1, y m,1 ), (x m,2, y m,2 ),...} x m,1 = x m,2 1 = x m,3 2... m U m m M U m x m,first = x m,1 x m,last x m,last < x n,first, (m < n, m, n M) D M D M = ( m M (x,y) U m S(x, y) γ 1 (x,y) U m log y log T (x) γ 2 log y m,last log y m,first ) (9) m m γ 1 γ 2 (9) 3 5 4 c t 1 c t 4 M c t 1 c t 2 2 D L Y X d L (x, y) D E Y 1 d E (y) d E (0) = 0 (10) d L (0, y) = S(0, y) + γ 1 log y log T (0) (11) if (x.y) U m and x = x m,first then { de (j) + S(x, y) γ 1 log y log T (x) d L (x, y) max j γ 2 log j log y } c 2013 Information Processing Society of Japan 4

(12) if (x, y) U m and x = x m,last then { de (y) d E (y) max d L (x 1, y) + S(x, y) γ1 log y log T (x) 1 ADC2004 1/2 Pop Daisy Opera Jazz MIDI Proposed 0.726 0.942 0.549 0.741 0.556 MIDI 0.692 0.907 0.647 0.862 0.661 (13) if (x, y) U m and x m,first < x < x m,last then d L (x m,n, y m,n ) d L(x m,n 1, y m,n 1 ) + S(x m,n, y m,n ) γ 1 log(y m,n ) log T (x m,n ) D M D Mmax (14) = max d E (j), (15) j D Mmax 3-(c) 3-(d) 3. Zhu [2] [2] 10 4. 2 2048 512 8000Hz (4) H 16 (9) γ 1 γ 2 0.52 5.2 (5) α 0.5 a k 0.15π 2 ADC2004 1 Pop Daisy Opera Jazz MIDI Proposed 0.791 0.971 0.695 0.767 0.564 MIDI 0.728 0.911 0.673 0.874 0.675 4.1 1 MIREX *1 MIREX Raw Pitch Accuracy 1/2 50 MIREX ADC2004 5 20 Pop Daisy Opera Jazz, MIDI 5 MIREX MELODIA[11] 1 Jazz MIDI Pop Daisy Opera 1/2 1 2 Opera *1 Music Information Retrieval Evaluation exchange (MIREX) [Online]. http:www.music-ir.org/mirex/wiki/audio Melody Extraction c 2013 Information Processing Society of Japan 5

3 MIDI 1000 Reference Melody Top 1 Top 5 Top 10 MIDI 100 % 100 % 100 % MELODIA 48.78 % 51.22 % 51.22 % Proposed (Monaural) 64.63 % 75.61 % 78.05 % Proposed (Stereo) 80.49 % 85.37 % 89.02 % 4 14 1000 Reference Melody Top 1 Top 5 Top 10 MIDI 78.99 % 86.23 % 89.86 % MELODIA 47.83 % 50.72 % 53.62 % Proposed (Monaural) 65.85 % 73.17 % 76.83 % Proposed (Stereo) 71.74 % 80.43 % 84.06 % 4.2 MIDI MELODIA MIDI 1000 MIDI 82 15 3 Top 1 Top 5 5 Top 10 10 14 1000 15 138 MELODIA MIDI 4 MIDI 5. 3 [1] S. Pauws, CubyHum: a fully operational query by humming system, in Proc. ISMIR, pp. 187 196, 2002. [2] Y. Zhu and D. Shasha, Warping indexes with envelope transform for query by humming, in ACM SIGMOD Int. Conf., pp. 181 192, 2003. [3] A. Ghias, J. Logan, D. Chamberlin, and B. Smith, Query by humming: musical information retrieval in an audio database, in Proc. ACM Multimedia, pp. 231 236, 1995. [4] L. Lu, H. You, and H.-J. Zhang, A new approach to query by humming in music retrieval, in Proc. ICME, pp.22-25, 2001. [5] T. Nishimura, H. Hashiguchi, J. Takita, J. X. Zhang, M. Goto and R. Oka, Music signal spotting retrieval by a humming query using start frame feature dependent continuous dynamic programming, in Proc. ISMIR, pp. 211-218, 2001. [6] J. Song, S. Y. Bae, and K. Yoon, Mid-level music melody representation of polyphonic audio for query-byhumming system, in Proc. ISMIR, pp. 133 139, 2002. [7] M. Goto, A real-time music-scene-description system: predominant-f0 estimation for detecting melody and bass line in real-world audio signals, in Speech Communication, vol. 43, no. 4, pp. 311 329, 2004. [8] Y. Li, D. L. Wang, Separation of singing voice from music accompaniment for monaural recordings, in Trans. Audio, Speech and Language Processing, vol. 15, no. 4, pp. 1475 1487, 2007. [9] H. Tachibana, T. Ono, N. Ono, and S. Sagayama, Melody line estimation in homophonic music audio signals based on temporal-variablity of melody source, in Proc. ICASSP, pp. 425 428, 2010. [10] C. L. Hsu, D. L. Wang, and J.-S. R. Jang, A trend estimation algorithm for singing pitch detection in musical recordings, in Proc. ICASSP, pp. 393 396, 2011. [11] J. Salamon and E. Goméz, Melody extraction from polyphonic music signals using pitch contour characteristic, in Trans. Audio, Speech and Language Processing, vol. 20, no. 6, pp. 1759 1770, 2012. [12] D. Barry, B. Lawlor, and E. Coyle, Sound source separation: Azimuth discrimination and resynthesis, in Proc. Int. Conf. Digital Audio Effects, 2004. [13] R. B. Dannenberg and N. Hu, Understanding search performance in query-by-humming systems, in Proc. ISMIR, pp.232 237, 2004. c 2013 Information Processing Society of Japan 6