A Vocabulary-Free Infinity-Gram Model for Chord Progression Analysis

Kazuyoshi Yoshii and Masataka Goto
National Institute of Advanced Industrial Science and Technology (AIST)

This paper presents a novel nonparametric Bayesian n-gram model as a statistical language model for symbolic chord sequences. Standard n-gram models based on heuristic smoothing have three fundamental problems: they have no theoretical foundation, the value of n must be fixed in advance, and a vocabulary of chord types has to be defined in an arbitrary way. To solve these problems, we propose a vocabulary-free infinity-gram model. It accepts any combination of notes as a chord type and allows each chord appearing in a sequence to have an unbounded, variable-length context. Our experiments showed that the perplexity obtained by the proposed model is significantly lower than that obtained by state-of-the-art models.

1. Introduction

Statistical language models of symbolic chord sequences play an important role in music information processing, e.g., composer style representation 1), genre classification 2), chord progression modeling 3),6), and automatic chord recognition 5),6),7)-9). The standard approach is to use n-gram models with heuristic smoothing 1),2),4),6),9). Such models, however, have three fundamental problems. First, the smoothing heuristics used to avoid assigning zero probability to unseen n-grams 10) have no theoretical foundation. Second, the order n must be fixed uniquely in advance, although the appropriate context length varies from chord to chord. Third, a vocabulary of chord types (major, minor, augmented, diminished, seventh, ninth, and so on) must be defined in an arbitrary way, and chords outside the vocabulary cannot be handled.

To solve these problems, this paper proposes a vocabulary-free infinity-gram model that builds on the nonparametric Bayesian language models developed in natural language processing 11)-14). The proposed model accepts any combination of notes as a chord type and allows each chord appearing in a sequence to have an unbounded, variable-length context.
For example, consider predicting the chord that follows G:7 (Fig. 1). The prediction can condition on no context, P(· | φ), on the single preceding chord, P(· | G:7), or on a longer context ending in G:7; the appropriate context length differs from position to position, so it should be estimated rather than fixed.

[Fig. 1: Predicting the chord that follows G:7 with contexts of different lengths: P(· | φ), P(· | G:7), and longer contexts ending in G:7.]

The rest of this paper is organized as follows: Section 2 describes the two chord representations, Section 3 reviews n-gram models and their nonparametric Bayesian extensions, Section 4 presents the proposed model, Section 5 reports experiments, and Section 6 concludes the paper.

Table 1  The 17 chord types and their pitch-class templates (12 bits, relative to the root).

  Chord type              Shorthand  Template
  Major                   maj        100010010000
  Minor                   min        100100010000
  Diminished              dim        100100100000
  Augmented               aug        100010001000
  Major Seventh           maj7       100010010001
  Minor Seventh           min7       100100010010
  Seventh                 7          100010010010
  Dim. Seventh            dim7       100100100100
  Half Dim. Seventh       hdim7      100100100010
  Minor (Major Seventh)   minmaj7    100100010001
  Major Sixth             maj6       100010010100
  Minor Sixth             min6       100100010100
  Ninth                   9          101010010010
  Major Ninth             maj9       101010010001
  Minor Ninth             min9       101100010010
  Suspended Second        sus2       101000010000
  Suspended Fourth        sus4       100001010000

2. Chord Representation

This section describes the vocabulary-based representation used in conventional studies (Section 2.1) and the vocabulary-free representation used by the proposed model (Section 2.2).

2.1 Vocabulary-Based Representation
Following the chord syntax proposed by Harte et al. 5), and as in conventional studies 1),4), each chord label consists of one of the 17 chord types listed in Table 1 and one of the 12 root pitch classes {C, C#, D, D#, E, F, F#, G, G#, A, A#, B}, where enharmonic equivalents such as C# and Db are unified. For example, "C major" is written as C:maj and "Gb diminished seventh" as F#:dim7. Together with the no-chord symbol N, the vocabulary size is 205 (= 17 × 12 + 1).
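To make this vocabulary concrete, the following Python sketch (ours, not part of the paper; all identifiers are hypothetical) enumerates the 205 symbols from the templates of Table 1:

```python
# Minimal sketch: enumerating the vocabulary of Section 2.1 from the
# 17 chord types of Table 1 and the 12 root pitch classes.
ROOTS = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Shorthand -> 12-bit pitch-class template relative to the root (Table 1).
CHORD_TYPES = {
    "maj":     "100010010000", "min":     "100100010000",
    "dim":     "100100100000", "aug":     "100010001000",
    "maj7":    "100010010001", "min7":    "100100010010",
    "7":       "100010010010", "dim7":    "100100100100",
    "hdim7":   "100100100010", "minmaj7": "100100010001",
    "maj6":    "100010010100", "min6":    "100100010100",
    "9":       "101010010010", "maj9":    "101010010001",
    "min9":    "101100010010", "sus2":    "101000010000",
    "sus4":    "100001010000",
}

# 17 types x 12 roots, plus the no-chord symbol N: 205 symbols in total.
vocabulary = [f"{r}:{t}" for r in ROOTS for t in CHORD_TYPES] + ["N"]
assert len(vocabulary) == 17 * 12 + 1 == 205
```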
2.2 Vocabulary-Free Representation
In the vocabulary-free representation, each chord is written as its root followed by a 12-bit vector indicating which of the 12 pitch classes, counted upward from the root, the chord contains. For example, "C major" consists of the notes C, E, and G and is written as C:100010010000. "D major" consists of D, F#, and A; since the bit vector is taken relative to the root, it is written not as D:001000100100 but as D:100010010000, sharing the template of Table 1 with C major. Any combination of notes can be expressed; for example, a C major chord with an added F is written as C:100011010000. Together with the no-chord symbol N, the vocabulary size is 49,153 (= 12 × 2^12 + 1). How this huge vocabulary is handled without enumerating it is described in Section 4.1.

3. N-gram Models

This section formulates n-gram models (Section 3.1) and reviews smoothing-based models (Section 3.2) and nonparametric Bayesian models (Sections 3.3 and 3.4).

3.1 Formulation
Let W be the vocabulary (of size V = 205 or V = 49,153), w ∈ W a chord symbol, and u ∈ W^{n−1} a context of n−1 chords. Let X = x_1 ... x_M be a chord sequence of length M, where x_m ∈ W (1 ≤ m ≤ M). An n-gram model assumes that each chord depends only on the n−1 preceding chords:

  P(x_m | x_1 ... x_{m−1}) = P(x_m | x_{m−(n−1)} ... x_{m−1}).   (1)

Let P_u(w|X) be the probability that chord w follows context u, estimated from X, and let c_{uw} be the number of times w follows u in X. The maximum-likelihood estimate is

  P_u^{ML}(w|X) = c_{uw} / c_u,   (2)

where c_u = ∑_w c_{uw}. This estimate suffers from the zero-frequency problem: any n-gram uw with c_{uw} = 0 is assigned probability 0.

3.2 Kneser-Ney Smoothing
Kneser-Ney (KN) smoothing 15) is among the most effective smoothing methods for n-gram models 10). It subtracts a discount d_u from each nonzero count c_{uw} and redistributes the subtracted probability mass to a lower-order model. The interpolated KN (I-KN) estimate is defined by the recursion

  P_u^{IKN}(w|X) = max(c_{uw} − d_u, 0)/c_u + (d_u t_u / c_u) · P_{π(u)}^{IKN}(w|X),   (3)

where t_u is the number of distinct chord types that follow context u in X and π(u) is the context of length n−2 obtained by removing the most distant chord from u. The recursion is applied down to the empty context, below which the uniform distribution 1/V is used. The discounts d_0, ..., d_{n−1}, one per context length, are set heuristically 15).
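As an illustration, the following Python sketch (ours; `p_ikn`, `counts`, and `discounts` are hypothetical names) implements the I-KN recursion of (3) over chord sequences:

```python
# Minimal sketch of interpolated Kneser-Ney, Eq. (3). `counts` maps a
# context tuple u (oldest chord first) to a Counter of successor counts
# c_{uw}; `discounts[k]` is the discount d for contexts of length k.
from collections import Counter
from typing import Dict, Tuple

def p_ikn(w: str, u: Tuple[str, ...],
          counts: Dict[Tuple[str, ...], Counter],
          discounts: Dict[int, float], V: int) -> float:
    """P_u^{IKN}(w|X); backs off to the uniform distribution 1/V."""
    succ = counts.get(u, Counter())          # c_{uw} for this context
    c_u = sum(succ.values())
    lower = p_ikn(w, u[1:], counts, discounts, V) if u else 1.0 / V
    if c_u == 0:                             # unseen context: pure backoff
        return lower
    d = discounts[len(u)]
    t_u = len(succ)                          # distinct successors of u
    return max(succ[w] - d, 0.0) / c_u + (d * t_u / c_u) * lower

# Toy usage: counts of orders 1..3 from a short chord sequence.
seq = ["C:maj", "F:maj", "G:7", "C:maj", "F:maj", "G:7", "C:maj"]
counts: Dict[Tuple[str, ...], Counter] = {}
for n in (1, 2, 3):
    for i in range(len(seq) - n + 1):
        u, w = tuple(seq[i:i + n - 1]), seq[i + n - 1]
        counts.setdefault(u, Counter())[w] += 1
discounts = {0: 0.5, 1: 0.5, 2: 0.5}         # heuristic per-order discounts
print(p_ikn("C:maj", ("F:maj", "G:7"), counts, discounts, V=205))
```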
3.3 Hierarchical Pitman-Yor Language Model
Teh 11) showed that I-KN smoothing can be interpreted as an approximation to inference in a hierarchical Bayesian model, the hierarchical Pitman-Yor language model (HPYLM). The HPYLM thus gives smoothed n-gram models the theoretical foundation they lacked, resolving the first of the three problems raised in Section 1. This section reviews its generative model (Section 3.3.1) and its training (Section 3.3.2).

3.3.1 Generative Model Based on Pitman-Yor Processes
The Pitman-Yor (PY) process 16) is a generalization of the Dirichlet process (DP). A PY process is a distribution over distributions on the vocabulary W: given a discount parameter d, a concentration parameter θ, and a base measure G_0 over W, a draw

  G ~ PY(d, θ, G_0)   (4)

is a random distribution over W that resembles G_0, where d and θ control how far G can deviate from G_0.

The HPYLM organizes the distributions over W into a hierarchy. Let G_φ be the distribution of chords under the empty context φ, and let G_u be the distribution of chords following context u. The distribution of a length-1 context u is drawn from a PY process whose base measure is G_φ, i.e., G_u ~ PY(d_1, θ_1, G_φ), and in general the distribution of a context u is drawn with the distribution of its shortened context π(u) as the base measure:

  G_u ~ PY(d_{|u|}, θ_{|u|}, G_{π(u)}),   (5)

where the hyperparameters d_{|u|} and θ_{|u|} are shared among all contexts of the same length |u|. At the root of the hierarchy,

  G_φ ~ PY(d_0, θ_0, G_0),   (6)

where the global base measure is the uniform distribution over the vocabulary, G_0(w) = 1/V. The contexts thus form a suffix tree rooted at φ; an n-gram HPYLM corresponds to a tree of depth n−1. Fig. 3 illustrates the case n = 3, in which, e.g., the depth-2 node "D:min G:7" hangs under the depth-1 node "D:min".

[Fig. 3: The suffix tree of contexts in an HPYLM with n = 3, e.g., φ → D:min → D:min G:7. Each node u holds a distribution G_u drawn from a Pitman-Yor process whose base measure is its parent's distribution (CRF: Chinese restaurant franchise representation).]
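To make the roles of d, θ, and G_0 concrete, the following sketch (ours, not the paper's algorithm) draws samples from G ~ PY(d, θ, G_0) with G marginalized out, using the Chinese restaurant representation that also underlies the training procedure of Section 3.3.2:

```python
# Minimal sketch: sampling from PY(d, theta, G0) via the Chinese
# restaurant representation. A new customer joins table k with
# probability proportional to (c_k - d) and opens a new table with
# probability proportional to (theta + d * #tables); dishes at new
# tables are drawn from the base measure G_0.
import random

def py_crp_samples(n, d, theta, g0_sample):
    tables = []                              # tables[k] = (dish, count)
    out = []
    for _ in range(n):
        total = sum(c for _, c in tables)
        r = random.uniform(0.0, total + theta)
        acc = 0.0
        for k, (dish, c) in enumerate(tables):
            acc += c - d
            if r < acc:                      # join an existing table
                tables[k] = (dish, c + 1)
                out.append(dish)
                break
        else:                                # open a new table
            dish = g0_sample()               # draw its dish from G_0
            tables.append((dish, 1))
            out.append(dish)
    return out

vocab = ["C:maj", "F:maj", "G:7", "A:min"]   # toy base measure support
print(py_crp_samples(10, d=0.5, theta=1.0,
                     g0_sample=lambda: random.choice(vocab)))
```

Larger d and θ make new tables more likely, pulling the samples toward G_0; small values concentrate the samples on a few dishes.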
3.3.2 Training Based on the Chinese Restaurant Franchise
Because the distributions G_u cannot be maintained explicitly, training marginalizes them out using the Chinese restaurant franchise (CRF) representation 11). Each context u corresponds to a restaurant with an unbounded number of tables, and each of the M chords of X is a customer: chord x_m enters the restaurant of its context u = x_{m−(n−1)} ... x_{m−1} and sits at one of its tables. Every table serves a single dish, i.e., a chord type w ∈ W. Let t_{uw} be the number of tables serving w in restaurant u, t_u = ∑_w t_{uw} the total number of tables, and c_{uwk} the number of customers at the k-th table serving w (c_{uwk} = 0 if no such table exists).

A customer x_m = w entering restaurant u sits
(i) at an existing table k serving w (1 ≤ k ≤ t_{uw}) with probability proportional to c_{uwk} − d_u, or
(ii) at a new table with probability proportional to d_u t_u + θ_u.

When a new table is opened in case (ii), its dish w must be drawn from the base measure, so a phantom customer for w is sent to the parent restaurant π(u), where the same procedure is applied recursively; at the root restaurant φ, dishes are drawn from G_0. Given a seating arrangement S of all customers, the predictive probability of chord w after context u is

  P_u^{HPY}(w|S) = (c_{uw} − d_u t_{uw})/(c_u + θ_u) + ((θ_u + d_u t_u)/(c_u + θ_u)) · P_{π(u)}^{HPY}(w|S),   (7)

where c_{uw} = ∑_k c_{uwk} and c_u = ∑_w c_{uw}. Here (7) is applied recursively at π(u) with the counts induced by the phantom customers; for an unseen n-gram (c_{uw} = 0 and t_{uw} = 0) the first term vanishes and the probability is inherited from the parent. Note that (7) reduces to the I-KN estimate (3) when θ_u = 0 and every dish has exactly one table (t_{uw} = 1 whenever c_{uw} > 0); the HPYLM is thus a Bayesian generalization of I-KN.

3.3.3 Prediction
The predictive probability given the training data X is obtained by marginalizing over the seating arrangement:

  P_u^{HPY}(w|X) = ∑_S P_u^{HPY}(w|S) P(S|X).   (8)

Since this sum is intractable, it is approximated with samples S_1, ..., S_L drawn from p(S|X) by Markov chain Monte Carlo (MCMC):

  P_u^{HPY}(w|X) ≈ (1/L) ∑_{l=1}^{L} P_u^{HPY}(w|S_l).   (9)

Gibbs sampling of S exploits the exchangeability of the CRF: any customer x_m may be treated as the last to arrive, so one sweep removes customer x_m from its seat and re-seats it according to rules (i) and (ii), for m = 1, ..., M (Fig. 4).

[Fig. 4: Gibbs sampling for the HPYLM. for i = 1, 2, ...: for m = 1, ..., M: remove customer x_m from the CRF; re-seat x_m by the Pitman-Yor seating rules.]
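A minimal sketch of the predictive recursion (7), assuming the seating statistics have already been collected (the `Restaurant` container and all names are ours):

```python
# Minimal sketch of Eq. (7). `rest` maps a context tuple u (oldest
# chord first) to its seating statistics; `d[k]` and `theta[k]` are the
# shared hyperparameters for contexts of length k.
from typing import Dict, Tuple

class Restaurant:
    """Seating statistics of one context u in the CRF."""
    def __init__(self):
        self.c_w: Dict[str, int] = {}   # customers per dish, c_{uw}
        self.t_w: Dict[str, int] = {}   # tables per dish, t_{uw}
        self.c = 0                      # total customers, c_u
        self.t = 0                      # total tables, t_u

def p_hpy(w: str, u: Tuple[str, ...],
          rest: Dict[Tuple[str, ...], Restaurant],
          d: Dict[int, float], theta: Dict[int, float], V: int) -> float:
    """P_u^{HPY}(w|S), Eq. (7); bottoms out at G_0(w) = 1/V below the root."""
    parent = p_hpy(w, u[1:], rest, d, theta, V) if u else 1.0 / V
    r = rest.get(u)
    if r is None or r.c == 0:
        return parent                   # unseen context: inherit from parent
    du, th = d[len(u)], theta[len(u)]
    discounted = (r.c_w.get(w, 0) - du * r.t_w.get(w, 0)) / (r.c + th)
    backoff = (th + du * r.t) / (r.c + th)
    return discounted + backoff * parent

# With an empty CRF the prediction falls back to the uniform base measure.
print(p_hpy("C:maj", ("F:maj", "G:7"), {}, {}, {}, V=205))  # -> 1/205
```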
The per-depth hyperparameters d_0, ..., d_{n−1} and θ_0, ..., θ_{n−1} need not be set by hand: they can be sampled jointly with S under priors satisfying 0 < d_u < 1 and 0 < θ_u 11).

3.4 Variable-Order Pitman-Yor Language Model
The HPYLM still requires the order n to be fixed in advance; every chord is predicted from exactly n−1 preceding chords. The variable-order Pitman-Yor language model (VPYLM) 12) resolves this second problem by introducing a latent order z_m for each chord x_m: chord x_m is generated from the node at depth z_m − 1 of the suffix tree, and z_m is inferred from the data. The suffix tree is therefore not truncated at a fixed depth and can grow as deep as the data support, which is why the model is also called an infinity-gram model.

3.4.1 Generative Model
The latent order z_m of chord x_m is generated by a mechanism analogous to the stick-breaking process (SBP) 17). Let u_i denote the length-i context of x_m, consisting of its i preceding chords x_{m−1}, ..., x_{m−i}, with u_0 = φ being the root. Starting from the root and descending along the path u_0, u_1, u_2, ..., the process stops at node u_i with probability η_{u_i} and continues with probability 1 − η_{u_i}, where each node u has its own stopping probability η_u ∈ (0, 1). Stopping at depth n−1 means z_m = n; hence

  P_u(n|η) = η_{u_{n−1}} · ∏_{i=1}^{n−1} (1 − η_{u_{i−1}}).   (10)

The stopping probabilities are given Beta priors:

  p(η) = ∏_u Beta(η_u | α, β).   (11)

3.4.2 Training
Training infers the seating arrangement S and the latent orders jointly. Under context u (the full, unbounded context of x_m), the joint probability of generating chord w with order n is

  P_u^{VPY}(w, n | S, η) = P_u^{VPY}(w | n, S) P_u(n | η),   (12)

where P_u^{VPY}(w | n, S) is the HPYLM predictive probability (7) evaluated at the depth-(n−1) node and P_u(n | η) is given by (10). Marginalizing out n and η yields the predictive probability

  P_u^{VPY}(w | S) = ∑_n P_u^{VPY}(w | n, S) P_u(n | S),   (13)

where P_u(n | S) = ∫ P_u(n | η) p(η | S) dη. Thanks to the conjugacy of (10) and (11), this integral has the closed form

  P_u(n | S) = (a_{u_{n−1}} + α)/(a_{u_{n−1}} + b_{u_{n−1}} + α + β) · ∏_{i=1}^{n−1} (b_{u_{i−1}} + β)/(a_{u_{i−1}} + b_{u_{i−1}} + α + β),   (14)

where a_u and b_u are, respectively, the numbers of times the descent stopped at and passed through node u in S.
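Equation (14) is cheap to evaluate once the stop/pass counts a_u and b_u are tabulated; the following sketch (ours; all names hypothetical) does exactly that:

```python
# Minimal sketch of Eq. (14). `stops[u]` (a_u) and `passes[u]` (b_u)
# count how often the stick-breaking descent stopped at / passed
# through node u in the seating arrangement S.
def p_order(n, context, stops, passes, alpha, beta):
    """P_u(n|S); `context` lists preceding chords, most recent first."""
    p = 1.0
    for i in range(n):                   # visit nodes u_0 (root), ..., u_{n-1}
        u = tuple(context[:i])           # the i most recent chords
        a, b = stops.get(u, 0), passes.get(u, 0)
        denom = a + b + alpha + beta
        p *= (a + alpha) / denom if i == n - 1 else (b + beta) / denom
    return p

# With no data and alpha = beta = 1, the prior halves at each depth:
# P(n=1) = 0.5, P(n=2) = 0.25, P(n=3) = 0.125, ...
print([round(p_order(n, ("G:7", "F:maj"), {}, {}, 1.0, 1.0), 3)
       for n in (1, 2, 3)])
```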
3.4.3 Prediction
Prediction is performed as in Section 3.3.3 by averaging (13) over samples of S drawn by Gibbs sampling. In each sweep, chord x_m is removed from the CRF together with its order z_m, and the pair is re-sampled: z_m = n is drawn with probability

  P_u(n | S, w) ∝ P_u^{VPY}(w | n, S) P_u(n | S),   (15)

where w = x_m, and the customer is then re-seated at the depth-(z_m − 1) node.

3.5 Nested Pitman-Yor Language Model
In the models above, the global base measure has been the uniform distribution G_0(w) = 1/V, which presupposes a fixed, enumerable vocabulary of size V. The nested Pitman-Yor language model (NPYLM) 13),14) removes this presupposition by using another language model as G_0: in 14), a character-level VPYLM serves as the base measure of a word-level VPYLM, so that G_0(w) > 0 for every possible character string w and no word vocabulary needs to be defined in advance. In the CRF representation, G_0 is consulted only when a new dish w is drawn at the root restaurant, so the two models can be trained jointly.

4. Proposed Model

The proposed vocabulary-free infinity-gram model applies the idea of the NPYLM 14) to chord sequences, resolving the third problem of Section 1.

4.1 Vocabulary-Free Base Measure
The proposed model is a VPYLM over the representation of Section 2.2. Instead of the uniform base measure G_0(w) = 1/V with V = 49,153, we define G_0 as a generative model of chord symbols. A chord w is written as w_0:w_1 ... w_12, where the root w_0 is one of 13 symbols (the 12 pitch classes and N) and w_1, ..., w_12 ∈ {0, 1} are the bits of the pitch-class vector; for the no-chord symbol, w = N and w_0 = N. The base measure is

  G_0(w) = p(w | π, τ) = π_{w_0} · ∏_{i=1}^{12} τ_i^{w_i} (1 − τ_i)^{1−w_i},   (16)

where π = {π_C, π_C#, ..., π_B, π_N} is a distribution over the 13 root symbols and τ = {τ_1, ..., τ_12} collects the probabilities τ_i that the i-th bit is 1; for w = N we set G_0(w) = π_N. The parameters are given conjugate priors

  p(π, τ) = Dir(π | a_0) ∏_{i=1}^{12} Beta(τ_i | b_0, c_0),   (17)

where a_0 is a 13-dimensional hyperparameter vector and b_0 and c_0 are scalar hyperparameters.
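A sketch of the base measure (16) follows (ours; the flat initialization of π and τ is for illustration only):

```python
# Minimal sketch of Eq. (16). `pi` maps each of the 13 root symbols to
# its probability; `tau[i]` is the probability that pitch class i
# (counted upward from the root) is present in the chord.
def g0(chord, pi, tau):
    """G_0(w) = p(w | pi, tau) for a chord 'root:bits' or the symbol 'N'."""
    if chord == "N":
        return pi["N"]
    root, bits = chord.split(":")
    p = pi[root]
    for i, bit in enumerate(bits):        # 12 Bernoulli factors of Eq. (16)
        p *= tau[i] if bit == "1" else (1.0 - tau[i])
    return p

ROOTS = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B", "N"]
pi = {r: 1.0 / 13 for r in ROOTS}         # flat initialization
tau = [0.5] * 12
print(g0("C:100010010000", pi, tau))      # a C major triad
```

Unlike the uniform 1/V, this base measure assigns every possible chord a probability that reflects how plausible its root and pitch-class pattern are, and its 25 parameters can be learned from the data (Section 4.2).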
4.2 Learning
The parameters π and τ are inferred jointly with the seating arrangement S of the VPYLM. Thanks to the conjugate priors (17), the posterior is

  p(π, τ | S) = Dir(π | a_0 + n) ∏_{i=1}^{12} Beta(τ_i | b_0 + n_i, c_0 + n̄_i),   (18)

where n = (n_v) for v ∈ {C, C#, ..., B, N} is the 13-dimensional count vector whose element n_v is the number of dishes (tables) in the root restaurant φ whose root is v (w_0 = v), n_i is the number of those dishes whose i-th bit is 1 (w_i = 1), and n̄_i is the number whose i-th bit is 0 (w_i = 0). During Gibbs sampling of the VPYLM, these counts change only when tables are created or removed in the root restaurant φ, so π and τ can be re-sampled from (18) at little additional cost.

5. Experiments

This section reports comparative experiments on the two chord representations of Section 2.

5.1 Experimental Conditions
We used the chord annotations of the Beatles songs provided by Harte 5); of the 180 annotated pieces, 137 were used, containing 10,761 chords. With the vocabulary-based representation of Section 2.1 (V = 205), we compared nine models: Good-Turing (GT), Witten-Bell (WB) and its interpolated variant (I-WB), Kneser-Ney (KN) and I-KN, modified Kneser-Ney (MKN) and I-MKN, the HPYLM, and the VPYLM. GT and WB are classical smoothing methods 10); MKN is a variant of KN in which the discount d_u of (3) depends on the count c_{uw} 10). The seven smoothing-based models were trained with the SRI Language Modeling Toolkit (SRILM) 18). Table 2 lists the test-set perplexities for V = 205 and n = 1, ..., 10. With the vocabulary-free representation of Section 2.2 (V = 49,153), the same nine models were compared together with the proposed vocabulary-free variants of Section 4, VF-HPYLM and VF-VPYLM, at n = 10 (Table 3).

Table 2  Test-set perplexities with the vocabulary-based representation (V = 205).

   n    GT    WB   I-WB    KN   I-KN   MKN  I-MKN   HPYLM
   1   16.8  15.6  15.7   15.8  16.0  15.6  15.7    15.8 (±0.03)
   2   20.3  14.2  15.5   14.5  15.2  14.5  15.8    14.5 (±0.10)
   3   23.5  15.4  18.5   16.1  16.0  16.1  16.3    16.0 (±0.18)
   4   25.5  16.8  22.5   16.8  17.7  16.5  15.5    13.9 (±0.25)
   5   26.3  17.5  25.4   15.5  16.2  16.0  14.1    13.7 (±0.23)
   6   27.0  17.8  27.2   14.6  15.1  16.2  13.5    13.6 (±0.23)
   7   27.3  18.0  28.3   14.3  14.5  16.4  13.3    13.6 (±0.23)
   8   27.3  18.0  28.8   14.1  14.2  16.6  13.2    13.6 (±0.22)
   9   27.3  18.0  29.0   14.0  14.1  16.8  13.1    13.5 (±0.23)
  10   27.3  18.0  29.2   14.0  14.0  16.7  13.1    13.5 (±0.23)

  VPYLM (latent order n): 13.4 (±0.33), 12.9 (±0.35), and 11.9 (±0.22) under three schemes for determining n at prediction time (the second uses the MAP order).
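For reference, the test-set perplexity reported in Tables 2 and 3 follows the standard definition and can be computed from any of the predictive distributions sketched above (this snippet is ours, not the paper's evaluation script):

```python
# Minimal sketch: test-set perplexity, 2 raised to the mean negative
# log2 predictive probability over the test chords.
import math

def perplexity(test_seq, prob, order):
    """`prob(w, u)` is any predictive distribution, e.g. p_ikn or p_hpy."""
    nll = 0.0
    for m, w in enumerate(test_seq):
        u = tuple(test_seq[max(0, m - order + 1):m])   # oldest-first context
        nll -= math.log2(prob(w, u))
    return 2.0 ** (nll / len(test_seq))

# e.g. with the I-KN sketch above:
# perplexity(test_chords, lambda w, u: p_ikn(w, u, counts, discounts, 205),
#            order=3)
```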
5.2 Experimental Results
Table 2 shows that the VPYLM achieved the lowest perplexities of all models on the vocabulary-based representation, and that the HPYLM performed comparably to I-KN and I-MKN, consistent with the interpretation of I-KN as an approximation to HPYLM inference 11). The Bayesian models and the interpolated KN variants remained effective at large n, whereas GT and I-WB degraded as n grew.

Table 3 lists the results for the vocabulary-free representation (V = 49,153). The proposed vocabulary-free base measure reduced perplexity for both hierarchical models, and the VF-VPYLM achieved the lowest perplexity overall.

Table 3  Test-set perplexities with the vocabulary-free representation (V = 49,153), n = 10.

  GT     WB    I-WB   KN    I-KN   MKN   I-MKN
  38.3   24.4  39.1   18.4  18.5   21.7  17.5

  HPYLM          VF-HPYLM       VPYLM          VF-VPYLM
  18.0 (±0.29)   16.5 (±0.60)   15.8 (±0.29)   14.6 (±0.55)

Fig. 5 shows the orders n estimated by the VF-VPYLM for each chord of "Let It Be", obtained from the joint posterior P_u(w, n | X) = ∑_S P_u(w, n | S) P(S | X). The estimated orders vary from chord to chord and from section to section, which supports treating n as a latent variable rather than fixing it.

[Fig. 5: Orders n (vertical axis, n = 1, ..., 10) estimated by the VF-VPYLM for each chord (horizontal axis) of "Let It Be"; panels: Intro, Verse A, Verse B, and Chorus.]

Table 4 lists examples of high-probability trigrams (n = 3) estimated by the VF-VPYLM from the 137 pieces; progressions of seventh chords such as C:7 → F:7 → C:7 received high probability.

[Table 4: High-probability trigrams (n = 3) under the VF-VPYLM, e.g., 0.701 for C:7 F:7 C:7; further entries with probabilities 0.682, 0.656, 0.647, 0.645, 0.632, 0.627, 0.623, 0.622, and 0.620 involve B:maj, C:7, F:min, E:min, D:min7, E:min7, G:sus4, and D:min.]
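The per-chord order estimates of Fig. 5 correspond to the posterior over the latent order z_m; the sketch below (ours) combines `p_hpy` and `p_order` from the earlier sketches via (15):

```python
# Minimal sketch of Eq. (15): the normalized posterior over the latent
# order z_m of an observed chord w, which underlies Fig. 5. Reuses
# p_hpy and p_order defined in the sketches above; the truncation at
# max_n stands in for the unbounded suffix tree.
def order_posterior(w, context, rest, d, theta, V,
                    stops, passes, alpha, beta, max_n=10):
    """P(z_m = n | S, w) for n = 1..max_n. `context` lists the chords
    preceding w, most recent first, and is assumed to hold at least
    max_n - 1 chords."""
    scores = []
    for n in range(1, max_n + 1):
        u = tuple(reversed(context[:n - 1]))   # depth-(n-1) node, oldest-first
        scores.append(p_hpy(w, u, rest, d, theta, V)
                      * p_order(n, context, stops, passes, alpha, beta))
    z = sum(scores)
    return [s / z for s in scores]
```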
6. Conclusion

This paper presented a vocabulary-free infinity-gram model for chord progression analysis. Applying the nested Pitman-Yor construction to chord sequences, the model accepts any combination of notes as a chord type by generating chord symbols from a root and a pitch-class vector, and it infers an unbounded, variable-length context for each chord in a nonparametric Bayesian manner. Experiments on Beatles chord annotations showed that the proposed VF-VPYLM attains significantly lower perplexity than conventional smoothed n-gram models and than the HPYLM and VPYLM with a fixed vocabulary.

Acknowledgments: This research was supported in part by a Grant-in-Aid (B) (No. 23700184).

References
1) Ogihara, M. and Li, T.: N-gram Chord Profiles for Composer Style Representation, Proc. 9th International Conference on Music Information Retrieval (ISMIR), pp. 671-676 (2008).
2) Pérez-Sancho, C., Rizo, D. and Iñesta, J. M.: Genre Classification Using Chords and Stochastic Language Models, Connection Science, Vol. 21, No. 2-3, pp. 145-159 (2009).
3) Paiement, J.-F., Eck, D. and Bengio, S.: A Probabilistic Model for Chord Progressions, Proc. 6th International Conference on Music Information Retrieval (ISMIR), pp. 312-319 (2005).
4) Scholz, R., Vincent, E. and Bimbot, F.: Robust Modeling of Musical Chord Sequences Using Probabilistic N-grams, Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 53-56 (2009).
5) Harte, C., Sandler, M., Abdallah, S. and Gómez, E.: Symbolic Representation of Musical Chords: A Proposed Syntax for Text Annotations, Proc. 6th International Conference on Music Information Retrieval (ISMIR), pp. 66-71 (2005).
6) IPSJ SIG Technical Report, No. 2010-MUS-85, pp. 1-8 (2010) (in Japanese).
7) Cheng, H.-T., Yang, Y.-H., Lin, Y.-C., Liao, I.-B. and Chen, H. H.: Automatic Chord Recognition for Music Classification and Retrieval, Proc. IEEE International Conference on Multimedia & Expo (ICME), pp. 1505-1508 (2008).
8) IPSJ Journal, Vol. 52, No. 4, pp. 1803-1812 (2011) (in Japanese).
9) Khadkevich, M. and Omologo, M.: Use of Hidden Markov Models and Factored Language Models for Automatic Chord Recognition, Proc. 10th International Conference on Music Information Retrieval (ISMIR), pp. 561-566 (2009).
10) Chen, S. and Goodman, J.: An Empirical Study of Smoothing Techniques for Language Modeling, Technical Report TR-10-98, Computer Science Group, Harvard University (1998).
11) Teh, Y. W.: A Bayesian Interpretation of Interpolated Kneser-Ney, Technical Report TRA2/06, NUS School of Computing (2006).
12) Mochihashi, D. and Sumita, E.: A Variable-Order n-gram Language Model Based on the Pitman-Yor Process, IPSJ Journal, Vol. 48, No. 12, pp. 4023-4032 (2007) (in Japanese).
13) Mochihashi, D., Yamada, T. and Ueda, N.: Unsupervised Morphological Analysis Based on a Bayesian Hierarchical Language Model, IPSJ SIG Technical Report, No. 2009-NL-190, p. 49 (2009) (in Japanese). http://chasen.org/daiti-m/paper/nl190segment.pdf
14) Mochihashi, D., Yamada, T. and Ueda, N.: Bayesian Unsupervised Word Segmentation with Nested Pitman-Yor Language Modeling, Proc. ACL-IJCNLP, pp. 100-108 (2009).
15) Kneser, R. and Ney, H.: Improved Backing-Off for M-gram Language Modeling, Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vol. 1, pp. 181-184 (1995).
16) Pitman, J. and Yor, M.: The Two-Parameter Poisson-Dirichlet Distribution Derived from a Stable Subordinator, Annals of Probability, Vol. 25, No. 2, pp. 855-900 (1997).
17) Sethuraman, J.: A Constructive Definition of Dirichlet Priors, Statistica Sinica, Vol. 4, pp. 639-650 (1994).
18) Stolcke, A.: SRILM - An Extensible Language Modeling Toolkit, Proc. International Conference on Spoken Language Processing (ICSLP), pp. 901-904 (2002).