Acoustic Signal Adjustment by Considering Musical Expressive Intention Using a Performance Intension Function

Σχετικά έγγραφα
Fourier transform, STFT 5. Continuous wavelet transform, CWT STFT STFT STFT STFT [1] CWT CWT CWT STFT [2 5] CWT STFT STFT CWT CWT. Griffin [8] CWT CWT

MIDI [8] MIDI. [9] Hsu [1], [2] [10] Salamon [11] [5] Song [6] Sony, Minato, Tokyo , Japan a) b)

Ψηφιακή Επεξεργασία Φωνής

VOCODER VOCODER Vocal

GUI

Detection and Recognition of Traffic Signal Using Machine Learning

Study on Re-adhesion control by monitoring excessive angular momentum in electric railway traction

CSJ. Speaker clustering based on non-negative matrix factorization using i-vector-based speaker similarity

: Monte Carlo EM 313, Louis (1982) EM, EM Newton-Raphson, /. EM, 2 Monte Carlo EM Newton-Raphson, Monte Carlo EM, Monte Carlo EM, /. 3, Monte Carlo EM

CHAPTER 25 SOLVING EQUATIONS BY ITERATIVE METHODS

DERIVATION OF MILES EQUATION FOR AN APPLIED FORCE Revision C

An Automatic Modulation Classifier using a Frequency Discriminator for Intelligent Software Defined Radio

Second Order RLC Filters

Estimation, Evaluation and Guarantee of the Reverberant Speech Recognition Performance based on Room Acoustic Parameters

Vol.4-DCC-8 No.8 Vol.4-MUS-5 No.8 4// 3 3 Hanning (T ) 3 Hanning 3T (y(t)w(t)) dt =.5 T y (t)dt. () STRAIGHT F 3 TANDEM-STRAIGHT[] 3 F F 3 [] F []. :

Query by Phrase (QBP) (Music Information Retrieval, MIR) QBH QBP / [1, 2] [3, 4] Query-by-Humming (QBH) QBP MIDI [5, 6] [8 10] [7]

6.003: Signals and Systems. Modulation

Feasible Regions Defined by Stability Constraints Based on the Argument Principle

Πτυχιακή Εργασι α «Εκτι μήσή τής ποιο τήτας εικο νων με τήν χρή σή τεχνήτων νευρωνικων δικτυ ων»

Appendix to On the stability of a compressible axisymmetric rotating flow in a pipe. By Z. Rusak & J. H. Lee

SNR F0 [2], [3], [4] F0 F0 F0 F0 F0 TUSK F0 TUSK F0 6 TUSK 6 F0 2. F0 F0 [5] [6] [7] p[8] Cepstrum [9], [10] [11] [12] [13] F0 [14] F0 [15] DIO[16] [1

Retrieval of Seismic Data Recorded on Open-reel-type Magnetic Tapes (MT) by Using Existing Devices

n 1 n 3 choice node (shelf) choice node (rough group) choice node (representative candidate)

3: A convolution-pooling layer in PS-CNN 1: Partially Shared Deep Neural Network 2.2 Partially Shared Convolutional Neural Network 2: A hidden layer o

Study of In-vehicle Sound Field Creation by Simultaneous Equation Method

( ) (Harmonic-Temporal Clustering; HTC) [1], [2] ( ) ( ) [4] HTC. (Non-negative Matrix Factorization; NMF) [3] [5], [6] [7], [8]

Main source: "Discrete-time systems and computer control" by Α. ΣΚΟΔΡΑΣ ΨΗΦΙΑΚΟΣ ΕΛΕΓΧΟΣ ΔΙΑΛΕΞΗ 4 ΔΙΑΦΑΝΕΙΑ 1

Jesse Maassen and Mark Lundstrom Purdue University November 25, 2013

Buried Markov Model Pairwise

Partial Differential Equations in Biology The boundary element method. March 26, 2013

Sampling Basics (1B) Young Won Lim 9/21/13

Molecular evolutionary dynamics of respiratory syncytial virus group A in

Motion analysis and simulation of a stratospheric airship

Διερεύνηση ακουστικών ιδιοτήτων Νεκρομαντείου Αχέροντα

1 (forward modeling) 2 (data-driven modeling) e- Quest EnergyPlus DeST 1.1. {X t } ARMA. S.Sp. Pappas [4]

C.S. 430 Assignment 6, Sample Solutions

Srednicki Chapter 55

Math 6 SL Probability Distributions Practice Test Mark Scheme

Figure A.2: MPC and MPCP Age Profiles (estimating ρ, ρ = 2, φ = 0.03)..

Estimation of stability region for a class of switched linear systems with multiple equilibrium points

Non-negative Matrix Factorization, NMF [5] NMF. [1 3] Bregman [4] Harmonic-Temporal Clustering, HTC [2,3] 1,2,b) NTT

Resurvey of Possible Seismic Fissures in the Old-Edo River in Tokyo

[1] DNA ATM [2] c 2013 Information Processing Society of Japan. Gait motion descriptors. Osaka University 2. Drexel University a)

Διπλωματική Εργασία. του φοιτητή του Τμήματος Ηλεκτρολόγων Μηχανικών και Τεχνολογίας Υπολογιστών της Πολυτεχνικής Σχολής του Πανεπιστημίου Πατρών

MIA MONTE CARLO ΜΕΛΕΤΗ ΤΩΝ ΕΚΤΙΜΗΤΩΝ RIDGE ΚΑΙ ΕΛΑΧΙΣΤΩΝ ΤΕΤΡΑΓΩΝΩΝ

High order interpolation function for surface contact problem

University of Illinois at Urbana-Champaign ECE 310: Digital Signal Processing

Dynamic types, Lambda calculus machines Section and Practice Problems Apr 21 22, 2016

ΖΩΝΟΠΟΙΗΣΗ ΤΗΣ ΚΑΤΟΛΙΣΘΗΤΙΚΗΣ ΕΠΙΚΙΝΔΥΝΟΤΗΤΑΣ ΣΤΟ ΟΡΟΣ ΠΗΛΙΟ ΜΕ ΤΗ ΣΥΜΒΟΛΗ ΔΕΔΟΜΕΝΩΝ ΣΥΜΒΟΛΟΜΕΤΡΙΑΣ ΜΟΝΙΜΩΝ ΣΚΕΔΑΣΤΩΝ

Design and Fabrication of Water Heater with Electromagnetic Induction Heating

Spectrum Representation (5A) Young Won Lim 11/3/16

情報処理学会研究報告 IPSJ SIG Technical Report Vol.2014-MUS-104 No /8/26 1,a) Music Structure and Composition with Sound Directivity in 3D Space

HW 3 Solutions 1. a) I use the auto.arima R function to search over models using AIC and decide on an ARMA(3,1)

Parameter Estimation of Mixture Model of Multiple Instruments and Application to Musical Instrument Identification

Research on Economics and Management

Monetary Policy Design in the Basic New Keynesian Model

Partial Trace and Partial Transpose

Conjoint. The Problems of Price Attribute by Conjoint Analysis. Akihiko SHIMAZAKI * Nobuyuki OTAKE

ΑΝΙΧΝΕΥΣΗ ΓΕΓΟΝΟΤΩΝ ΒΗΜΑΤΙΣΜΟΥ ΜΕ ΧΡΗΣΗ ΕΠΙΤΑΧΥΝΣΙΟΜΕΤΡΩΝ ΔΙΠΛΩΜΑΤΙΚΗ ΕΡΓΑΣΙΑ

Tridiagonal matrices. Gérard MEURANT. October, 2008

Probability and Random Processes (Part II)

[5] F 16.1% MFCC NMF D-CASE 17 [5] NMF NMF 3. [5] 1 NMF Deep Neural Network(DNN) FUSION 3.1 NMF NMF [12] S W H 1 Fig. 1 Our aoustic event detect

[1] P Q. Fig. 3.1

SOLUTIONS TO MATH38181 EXTREME VALUES AND FINANCIAL RISK EXAM

Assignment 1 Solutions Complex Sinusoids

2 ~ 8 Hz Hz. Blondet 1 Trombetti 2-4 Symans 5. = - M p. M p. s 2 x p. s 2 x t x t. + C p. sx p. + K p. x p. C p. s 2. x tp x t.

EE512: Error Control Coding

Μειέηε, θαηαζθεπή θαη πξνζνκνίσζε ηεο ιεηηνπξγίαο κηθξήο αλεκνγελλήηξηαο αμνληθήο ξνήο ΓΗΠΛΩΜΑΣΗΚΖ ΔΡΓΑΗΑ

[4] 1.2 [5] Bayesian Approach min-max min-max [6] UCB(Upper Confidence Bound ) UCT [7] [1] ( ) Amazons[8] Lines of Action(LOA)[4] Winands [4] 1

Section 9.2 Polar Equations and Graphs

Automatic extraction of bibliography with machine learning


Applying Markov Decision Processes to Role-playing Game

ECE 468: Digital Image Processing. Lecture 8

Analysis of prosodic features in native and non-native Japanese using generation process model of fundamental frequency contours

1,a) 1,b) 2 3 Sakriani Sakti 1 Graham Neubig 1 1. A Study on HMM-Based Speech Synthesis Using Rich Context Models

1 h, , CaCl 2. pelamis) 58.1%, (Headspace solid -phase microextraction and gas chromatography -mass spectrometry,hs -SPME - Vol. 15 No.

Stabilization of stock price prediction by cross entropy optimization

Section 8.3 Trigonometric Equations

ECE Spring Prof. David R. Jackson ECE Dept. Notes 2

ΜΕΛΕΤΗ ΚΑΙ ΠΡΟΣΟΜΟΙΩΣΗ ΙΑΜΟΡΦΩΣΕΩΝ ΣΕ ΨΗΦΙΑΚΑ ΣΥΣΤΗΜΑΤΑ ΕΠΙΚΟΙΝΩΝΙΩΝ.

Paper Reference. Paper Reference(s) 6665/01 Edexcel GCE Core Mathematics C3 Advanced. Thursday 11 June 2009 Morning Time: 1 hour 30 minutes

Ψηφιακή Επεξεργασία Φωνής

SOLUTIONS TO MATH38181 EXTREME VALUES AND FINANCIAL RISK EXAM

D Alembert s Solution to the Wave Equation

Approximation of distance between locations on earth given by latitude and longitude

SCITECH Volume 13, Issue 2 RESEARCH ORGANISATION Published online: March 29, 2018

ΕΘΝΙΚΟ ΜΕΤΣΟΒΙΟ ΠΟΛΥΤΕΧΝΕΙΟ ΣΧΟΛΗ ΠΟΛΙΤΙΚΩΝ ΜΗΧΑΝΙΚΩΝ. «Θεσμικό Πλαίσιο Φωτοβολταïκών Συστημάτων- Βέλτιστη Απόδοση Μέσω Τρόπων Στήριξης»

VBA Microsoft Excel. J. Comput. Chem. Jpn., Vol. 5, No. 1, pp (2006)

ITU-R BT ITU-R BT ( ) ITU-T J.61 (

ΓΕΩΠΟΝΙΚΟ ΠΑΝΕΠΙΣΤΗΜΙO ΑΘΗΝΩΝ ΤΜΗΜΑ ΑΞΙΟΠΟΙΗΣΗΣ ΦΥΣΙΚΩΝ ΠΟΡΩΝ & ΓΕΩΡΓΙΚΗΣ ΜΗΧΑΝΙΚΗΣ

Yoshifumi Moriyama 1,a) Ichiro Iimura 2,b) Tomotsugu Ohno 1,c) Shigeru Nakayama 3,d)

ΠΩΣ ΕΠΗΡΕΑΖΕΙ Η ΜΕΡΑ ΤΗΣ ΕΒΔΟΜΑΔΑΣ ΤΙΣ ΑΠΟΔΟΣΕΙΣ ΤΩΝ ΜΕΤΟΧΩΝ ΠΡΙΝ ΚΑΙ ΜΕΤΑ ΤΗΝ ΟΙΚΟΝΟΜΙΚΗ ΚΡΙΣΗ

ΗΜΥ 220: ΣΗΜΑΤΑ ΚΑΙ ΣΥΣΤΗΜΑΤΑ Ι Ακαδημαϊκό έτος Εαρινό Εξάμηνο Κατ οίκον εργασία αρ. 2

Bayesian statistics. DS GA 1002 Probability and Statistics for Data Science.

6.1. Dirac Equation. Hamiltonian. Dirac Eq.

«ΑΝΑΠΣΤΞΖ ΓΠ ΚΑΗ ΥΩΡΗΚΖ ΑΝΑΛΤΖ ΜΔΣΔΩΡΟΛΟΓΗΚΩΝ ΓΔΓΟΜΔΝΩΝ ΣΟΝ ΔΛΛΑΓΗΚΟ ΥΩΡΟ»

Διπλωματική Εργασία της φοιτήτριας του Τμήματος Ηλεκτρολόγων Μηχανικών και Τεχνολογίας Υπολογιστών της Πολυτεχνικής Σχολής του Πανεπιστημίου Πατρών

Supporting Information

! " # $ &,-" " (.* & -" " ( /* 0 (1 1* 0 - (* 0 #! - (#* 2 3( 4* 2 (* 2 5!! 3 ( * (7 4* 2 #8 (# * 9 : (* 9

Transcript:

1,a) 2 MOS Acoustic Signal Adjustment by Considering Musical Expressive Intention Using a Performance Intension Function Yuma Koizumi 1,a) Katunobu Itou 2 Abstract: We propose an estimation method for a performance intention function, which is a function of fluctuation of a musician s intent shaped as a smooth curve, from the acoustic signal. In addition, we propose a method to remediate acoustic signals based on a musician s intent by using intention information obtained by performance intention functions. In this paper, discussion of estimation of the function is focused on onset time. The function is modeled as a polynomial regression with respect to time. Onset times are detected by a new onset detection method that considers aural characteristics, and the function is estimated by ridge regression by using onset times and score information. Power spectrogram of observed musical signal is remediated by using the function with respect to each note and musical images, and intentions of the player are clarified. In addition, in terms of closeness of rhythm, MOS of remediated sounds by using performance intention function were higher than the original recorded sound, and significant differences in all instruments were observed. Keywords: Musical expressive intentions, Onset detection, Regression analysis, Sound remediation 1. 1 Graduate School of Computer and Information Sciences, Hosei University 2 Faculty of Computer and Information Sciences, Hosei University a) 12t0005@cis.k.hosei.ac.jp e.g., 1

*1 *2 [1 4] [5] [6] [7] MIDI 2. 2.1 n y n = (y[1], y[2],..., y[n]) T y n = t n + x n + e n (1) *1 *2 t n x n e n x n f(t) t n x n = f (t n ) (2) f(t) 2.2 i y[i + 1] y[i] α α[i] = (t[i + 1] + x[i + 1]) (t[i] + x[i]) y[i + 1] y[i] 3. (3) 3.1 y n t n 3.2 3.1 Onset Detection [8] [9] [10] 3.1.1 2

[k T/2, k + T/2] KLD d k ( ( d k = D k T 2 ),, D ( k + T 2 )) T (6) 1 Fig. 1 y [db] Procedure of generating sub-band mel-frequency normalized spectrum. (The y-axis of all figures are logarithmic scale [db].) τ Kullback-Leibler Divergence(KLD) KLD S(k, ω) 1-I k ω (STFT) 1ms STFT 10ms S(k, ω) J = 7 B j (ω) 1-IIKLD k j 1-III S j (k, ω) = S(k, ω)b j(k, ω) ω S(k, ω)b j(ω) (4) k D(k) KLD D(k) = J ( ) Sj (k τ, ω) S j (k τ, ω) log S j (k, ω) j=1 ω (5) STFT τ = 10(i.e., 10ms) D(k) Y k σ d T = 100(i.e., 100ms) δ(k) [10] KLD Y δ(k) = λ(σ d + Median(d k )) + Median(D) 2 (7) σ d d k λ Y n λ = 0.9 1 3.1.2 Y y n y n t n KLD y[i] = argmin i y[i] t[i] D (k i ) (8) k i y[i] y n t n t n t n t n = ξh n + ψ1 n (9) h n 4 1 ξ 4 ψ 0 1 1 n 1 n (9) ξ ψ BPM t n y n G(y n, t n ) = n (y[i] t[i]) 2 (10) i=1 (8) (10) 1: ξ ψ 3

ξ = L s f s h[n], ψ = 0 (11) f s L s 2: ξ ψ (9) t n (8) y n 3: (10) ξψ 2 ξ = yt h n ψ1 T n h n h T n h n (12) ψ = yt 1 n ξ1 T n h n n (13) y n t n 3.2 1 t[1]... t[1] M w 0 1 t[2]... t[2] M w 1 f(t n ) =......... = Bw 1 t[n]... t[n] M w M 1 (14) y n = t n + Bw + e n (15) z n = y n t n 15 z n = Bw + e n. (16) (14) w (16) w w w 1 L 2 [11] R γ (w) = (z n Bw) T (z n Bw) + γw T 1 w 1 (17) M = 10 γ = 0.1 w 2 2 Fig. 2 KLD z n A estimation result of a performance intention function. Top figure shows musical score, second figure shows spectrogram of performance sound (linear frequency), third figure shows KLD (blue line) and candidate set of onset times (red circle), and bottom figure shows deviances z n (blue circle) and an estimated performance intention function (red line). 4. 4.1 α (3) [12] Griffin [13] α[i] α[i] 4.2 3 4

1 Table 1 Recording conditions. 3 Roland, EDIROL R-09 Roland, UA-25EX 48kHz, 16bit 2 Table 2 Used phrases in this evaluations. 3 RMSE Fig. 3 RMSE of onset detection. A. Dvorak, Symphony No. 8 1. 1 244-250 1st Violin R.Wagner, Tannhauser Act.II Grand March 2. 40-44 1st Violin 3. 64-68 1st Violin A. Dvorak, Symphony No. 8 1. 1 1-6 2. 1 165-169 3. 4 26-33 1. LUNKHEAD ENTRANCE 5-12 2. MONKEY MAJIK 52-56 3. Thousand Dreams 2-9 30 2 3 IC 48kHz bit 16bit 1 2 4.2.1 RMSE KLDKLD PHA[8]COM[9] 3.1.2 3 RMSE KLD RMSE KLD RMSE [10] 50ms [10] 4 MAE Fig. 4 MAE of values of performance intention functions. Pitched-Percussive [10] 3.1.2 3 4.2.2 (MAE) MAE = 1 6 3 2 1 n s n s s=1 q=1 i=1 x p s[i] x a s,q[i] (18) s q n s s x p s[i] i x a s,q[i] q i 4 MAE MAE 15ms 5

Fig. 5 5 Result of the subjective evaluation. 70ms[14] 4.2.3 5 5 ORGPRO mean opinion score (MOS) 1 5 MOS 5 Dunnett [15] 5% 5. 15ms MOS [1] Hirokazu Kameoka, et.al., Complex NMF: A New Sparse Representation for Acoustic Signals, In Proc. ICASSP 2009, pp. 3437-3440, Apr. 2009. [2] Katsutoshi Itoyama, et.al., Integration and Adaptation of Harmonic and Inharmonic Models for Separating Polyphonic Musical Signals, in Proc. ICASSP 2007, pp. 57-60, April 2007. [3],, GMM NMF,, Vol. 52, pp. 3839-3852, 2011. [4],,,, Vol. 50, pp. 1054-1066, 2009. [5] Yasunori Ohishi, et.al., A Stochastic Model of Singing Voice F0 Contours for Characterizing Expressive Dynamic Components, In Proc. INTERSPEECH 2012, Sep. 2012. [6] Yuma Koizumi, et.al., Performance expression synthesis for bowed-string instruments using Expression Mark Functions Proceedings of Meetings on Acoustics (POMA). Vol. 15, pp. 035003, Nov. 2012. [7],, 1 66(5), pp. 203-212, 2010. [8] Bello, J.P., et.al., Phase-based note onset detection for music signals, In Proc. ICASSP 2003, vol.5, 6-10 Apr. 2003. [9] Bello, J.P., et.al., On the use of phase and energy for musical onset detection in the complex domain, Signal Processing Letters, IEEE, vol.11, no.6, pp. 553-556, June 2004 [10] Bello, J.P., et.al., A Tutorial on Onset Detection in Music Signals, Speech and Audio Processing, IEEE Transactions on, vol.13, no.5, pp.1035-1047, Sept. 2005 [11] Arthur E. Hoerl, et.al., Ridge Regression: Biased Estimation for Nonorthogonal Problems, Technometrics, Vol.12, No.1., pp.55-67, Feb., 1970 [12],,,, vol.39, no.6, pp.447-452, Oct. 2009. [13] D.W.Griffin, et.al., Signal estimation from modified short-time Fourier transform In Proc. ICASSP 1984, vol.32, no.2, pp.236-243, Apr. 1984. [14] Simon Dixon, Automatic extraction of tempo and beat from expressive performances, Journal of New Music Research, vol. 30, pp. 39-58, 2001. [15] C.W.Dunnett, New Tables for Multiple Comparisons with a Control, Biometrics, Vol.20, No.3, pp.482-491, 1964. 6