a) Higher-Order Correlation Analysis of Pitch Fluctuations in Sustained Normal Vowels by the Method of Surrogate Data Isao TOKUDA a), Takaya MIYANO, and Kazuyuki AIHARA 2 3 3 2 3 1. [1] [3] Department of Computer Science and Systems Engineering, Muroran Institute of Technology, Muroran-shi, 050 0071 Japan COE Center for Promotion of the COE Program, Ritsumeikan University, Kusatsu-shi, 525 8577 Japan Department of Mathematical Informatics, The University of Tokyo, Bunkyo-ku, Tokyo, 113 8656 Japan a) E-mail: tokuda@agnld.uni-potsdam.de AR [2] [4] [5] [6] [10] [11] [12] [14] [15] [3] [2] [21], [22] 3 [18], [19] Schreiber [23] A Vol. J87 A No. 3 pp. 355 363 2004 3 355
2004/3 Vol. J87 A No. 3 Schreiber 2. 3. 4. 5. 1 4 oo, ao, ka, yo /a/ Table 1 Data length, number of extracted pitches, mean pitch period, and coefficient of variation of a sustained vowel /a/ recorded from 4 subjects (oo, ao, ka, yo). oo 800 msec 137 5.86 msec 0.873 % ao 450 msec 71 6.26 msec 0.778 % ka 900 msec 223 4.06 msec 0.773 % yo 900 msec 254 3.56 msec 1.10 % 2. 2. 1 2 oo, ao 2 ka, yo /a/ 4 10 khz 16 22.05 khz 1 2. 2 [20] {T i : i =1, 2,.., N} (1) 1 4 oo, ao, ka, yo N = 137, 71, 223, 254 2 (a) oo coefficient of variation [16] α = 100 T 1 N N (T i T ) 2 [%] (2) i=1 1 {T 1,T 2,...} Fig.1 Pitch extraction via the zero-crossing method and generation of the pitch period sequence {T 1,T 2,...}. T = 1 N Ti 4 N i=1 oo, ao, ka, yo 1 [17] 1.05±0.40% 4 4 3. [18], [19] method of surrogate [21], [22] statistical hypothesis testing 356
(a) (b) (c) (d) (e) 2 (a) oo {T 1,T 2,...,T 137} (b) (a) (c) 2 (d) 3 (e) 4 Fig.2 (a) Original pitch period sequence {T 1,T 2,..., T 137} extracted from the subject oo.(b) Surrogate data generated by random shuffling of the original data (a).(c) Surrogate data that preserves the 2nd order correlation function.(d) Surrogate data that preserves the 3rd order correlation function.(e) Surrogate data that preserves the 4th order correlation function. FT [21] Schreiber [23] {T i} 357
2004/3 Vol. J87 A No. 3 3. 1 {T i} T σ τ µ } rst (τ ) {{} m times 1 = N (m 1)τ N (m 1)τ t=1 1 σ r+s+t+ (T i T ) r (T i+τ T ) s (T i+2τ T ) t (3) 2 κ 2(τ ) κ 2(τ )=µ 11(τ ) = 1 σ 1 2 N τ N τ i=1 (T i T ) (T i+τ T ) (4) 2 κ 2(τ )=µ 11(τ ) (5) 3 4 [18], [24] 3 d κ d d (C1) [21] {T i} {T i} (C2) {T i} {T i} d κ k, κ k (k =2,...,d) E({T })= d k=2 τ {κ k (τ ) κ k(τ )} 2 (8) SA (8) {T } SA (T i,t j)(i j) {...,T i,...,t j,...} {..., T j,...,t i,...} SA l t =1/log(l) E(T i T j) 1/{1 + exp( E/t)} oo {T i} 4 {T i} 3 l 2.5 10 4 SA κ 3(τ )=µ 111(τ ) (6) 4 κ 4(τ )=µ 1111(τ ) µ 0011(τ ) µ 1100(τ ) µ 1010(τ ) µ 0101(τ ) µ 1001(τ ) µ 0110(τ ) (7) 4 [18], [24] 3. 2 {T 1,T 2,...} {T 1,T 2,...} 3 (8) Fig.3 Process of minimizing the error function (8) by the simulated annealing method. 358
2 (e) 2 (a) 4 2 3 4 4. (a) (b) 4. 1 LPC [20] {T i} 5 (S1) (S2) N (c) 4 oo 4 (a) 2 κ 2(τ ) (b) 3 κ 3(τ ) (c) 4 κ 4(τ ) Fig.4 Higher order correlation functions of the subject oo s original pitch sequence (bold line) and the surrogate data that preserves the 4th order correlation function (crosses).(a) The 2nd order correlation functions κ 2(τ ), (b) The 3rd order correlation functions κ 3(τ ), (c) The 4th order correlation functions κ 4(τ ). Fig.5 5 Speech synthesis based on waveform editting. 359
2004/3 Vol. J87 A No. 3 {T 1,T 2,...,T N} (S3) [4] (S3) (S2) [25] 4. 2 5 (a) (b) (c) 2 (d) 3 (e) 4 5 [26] 20 4 oo, ao, yo, ka 5 6 4 2 2 [3] [2] 3 AR ao 4 ao 1 {T 1,...,T 71} {T 1,...,T 71} 2 {T 1,...,T 71,T 1,...,T 71}, {T 1,...,T 71,T 1,...,T 71} 7 5. 1 360
(a) (b) (c) (d) 6 2 3 4 (a) oo (b) ao (c) yo (d) ka Fig.6 Measurement of the naturalness of the synthesized sounds. From left side, random shuffed surrogate data, 2nd order surrogate data, 3rd order surrogate data, 4th order surrogate data, and the original data.(a) subject oo, (b) subject ao, (c) subject yo, (d) subject ka. 7 2 ao Fig.7 Measurement of the naturalness of the synthesized sounds from the subject ao in the case when the pitch sequence was repeated twice. 2 2 [3] [2] 3 3 AR 2 [6] [10] [27] 361
2004/3 Vol. J87 A No. 3 3 I II LPC III AR [1] L.Dolansky and P.Tjernlund, On certain irregularities of voiced-speech waveforms, IEEE Trans.Audio Electroacoust., vol.16, pp.51 56, 1968. [2] vol.47, no.8, pp.539 544, 1991. [3] vol.47, no.12, pp.928 934, 1991. [4] vol.47, no.12, pp.903 910, 1991. [5] N.Aoki and T.Ifukube, Analysis and perception of spectral 1/f characteristics of amplitude and period fluctuations in normal sustained vowels, J.Acoust. Soc. Am., vol.106, no.1, pp.423 433, 1999. [6] H.Herzel, D.Berry, I.R.Titze, and M.Saleh, Analysis of vocal disorders with method from nonlinear dynamics, J. Speech Hear. Res., vol.37, pp.1008 1019, 1994. [7] W.Mende, H.Herzel, and I.R.Titze, Bifurcations and chaos in newborn cries, Phys.Lett.A, vol.145, pp.418 424, 1990. [8] S.S. Narayanan and A.A. Alwan, A nonlinear dynamical systems analysis of fricative consonants, J. Acoust. Soc. Am., vol.97, no.4, pp.2511 2524, 1995. [9] A.Behrman, Global and local dimensions of vocal dynamics, J. Acoust. Soc. Am., vol.105, no.1, pp.432 443, 1999. [10] M.Banbrook, S.McLaughlin, and I.Mann, Speech characterization and synthesis by nonlinear methods, IEEE Trans.Speech Audio.Process.vol.7, no.1, pp.1 17, 1999. [11] M.Sato, K.Joe, and T.Hirahara, APOLONN brings us to the real world, Proc.Int.Joint Conf. Neural Networks, vol.1, pp.581 587, 1990. [12] I.Tokuda, R.Tokunaga, and K.Aihara, A simple geometrical structure underlying speech signals of the Japanese vowel /a/, Int. J. Bif. Chaos, vol.6, no.1, pp.149 160, 1996. [13] I.Tokuda, T.Miyano, and K.Aihara, Surrogate analysis for detecting nonlinear dynamics in normal vowels, J. Acoust. Soc. Am., vol.110, no.6, pp.3207 3217, 2001. [14] NLP99-152, 2000. [15] T.Ikeguchi and K.Aihara, Estimating correlation dimensions of biological time series with a reliable method, J. Int. Fuzzy Sys., vol.5, no.1, pp.33 52, 1997. [16] N.B. Pinto and I.R. Titze, Unification of perturbation measures in speech signals, J.Acoust.Soc.Am., vol.87, pp.1278 1289, 1990. [17] R.C. Scherer, V.J. Vail, and C.G. Guo, Required number of tokens to determine representative voice perturbation values, J. Speech Hear. Res., vol.38, pp.1260 1269, 1995. [18] 2000. [19] 2002. [20] 1985. [21] J.Theiler, S.Eubank, A.Longtin, B.Galdrikian, and J.D. Farmer, Testing for nonlinearity in time series: The method of surrogate data, Physica D, vol.58, pp.77 94, 1992. [22] J.Theiler and D.Prichard, Constrained-realization Monte-Carlo method for hypothesis testing, Physica D, vol.94, pp.221 235, 1996. [23] T.Schreiber, Constrained randomization of time series data, Phys. Rev. Lett., vol.80, no.10, pp.2105 2108, 1998. [24] A.Stuart and J.K.Ord, Kendall s Advanced Theory of Statistics, Fifth Ed., Charles Griffin, 1987. [25] CAS2000-41, 2000. [26] 362
1991. [27] I.Tokuda, T.Riede, J.Neubauer, M.J.Owren, and H.Herzel, Nonlinear analysis of irregular animal vocalizations, J. Acoust. Soc. Am., vol.111, no.6, pp.2908 2919, 2002. 15 6 16 9 21 11 4 4 6 8 15 14 60 11 15 COE IEEE 52 57 ME INNS SMB 363