THE INSTITUTE O ELECTRONICS, INORMATION AND COMMUNICATION ENGINEERS TECHNICAL REPORT O IEICE. 277 8562 5 1 5 113 3 7 3 1 6 817 17 8 E-mail: {hiran,wtgu,hirose,mine}@gavo.t.u-tokyo.ac.jp, goh@kawai.com ( ) 1) 2) 1) 2) 3) 4) ( ) Abstract Analysis of prosodic features in native and non-native Japanese using generation process model of fundamental frequency contours H. HIRANO, W. GU, K. HIROSE, N. MINEMATSU, and G. KAWAI Graduate School of rontier Sciences, University of Tokyo 5 1 5, Kashiwanoha, Kashiwa shi, Chiba, 277 8562 Japan Graduate School of Information Science and Technology, University of Tokyo 7 3 1, Hongo, Bunkyo ku, Tokyo, 113-33 Japan Institute of Language and Culture Studies, Hokkaido University Kita 17, Nishi 8, Kita ku, Sapporo, 6-817 Japan E-mail: {hiran,wtgu,hirose,mine}@gavo.t.u-tokyo.ac.jp, goh@kawai.com We compared patterns of the utterances by native and non-native (Mandarin Chinese) speakers of Japanese. Model-based analysis indicates that non-native speakers tend to transfer the prosodic characteristics of their native language into the utterances of Japanese. They show the following peculiarities as compared with native speakers: 1) a higher baseline frequency; 2) more frequent occurrences of phrase commands and hence shorter prosodic phrases; 3) a tendency of decomposition of a bunsetsu into multiple prosodic words, and hence redundant addition or incorrect location of accent commands; 4) occasional rapider falling (mostly at the end of utterance) and hence introduction of negative accent commands there. These observed peculiarities can be ascribed to interference by tonal and syllable-timed natures of Mandarin Chinese, difficulty in prosodic planning during speaking second language, and lack of adequate teaching of the skills for sentence intonation. Key words L2 learning of Japanese, sentence reading, patterns, generation process model of contours 1
1. 1. 1 25 (4.3) [sec].17.12 (.6) (.4) [sec] 5.1 3.59 (1.26) (1.17) 1.52 1.28 ( ) ( ) 5 2. 2. 1 3 1 1. 2 16bit) 12 [1] 1) 21 2)11 4 27 [2] [3] 2. 2 [4] 7 ( 2) ( ) 8 1 6 2. 3 P RAAT [8] 1ms 1. 3 Hirano [5] [6] 1) 2. 4 1 1 7 [sec].72.77 (.16) (.1) 4m 2 Sony D-38B DAT Sony TCD-D1 PRO 48kHz 3)555 1 3 ( 1 1 25 1 1 7 2) [7] [sec] 66 2 3 4 1 2
Pitch (Hz) 5 3 2 15 1 1 Pitch (Hz) 5 3 2 15 1 Non-native n=7 Native n=7 ( ) ( ) 7 { } 5 2 Pitch (Hz) 3 2 15 silb me ga ne no P ni a u P na o ya no a ni ni P i N do no ryo: ri o na ra i ni i ku sile 1 Time (s) 8.36633 Pitch (Hz) 5 3 2 15 silbme ga ne no ni a u P na o ya no a ni ni P i N do no ryo: ri o na ra i ni i ku sile 1 Time (s) 4.29423 2 Non-native speakers (Chinese) 3 Native speakers (Japanese) ( ) ( ) 1 1 2. / / / / 3 / 8 3. 2 7 (1,138)=13.37, p.1 3. 3. 1 1 1 5 { } 7 (Hz) [9] 4. 4. 1 ( ) ujisaki [9] 7 ( ) LHHLLL(L) 2 1 ( 4). 3
Ap T1 2 Aa T T12 T22 T11T21 T 4 13 23 T3 t T14T24 T t Gp(t) log e b Ga(t) logeb log e (t) 2 t b(hz) 2 5 b (Hz) Number of occurrences 8 6 4 64 JP_male JP_female 76 88 1 112 5 7 CH_male CH_female 124 136 148 16 172 184 IX log e (t) = log e b + A pi Gp(t T i ) i=1 JX + A aj {G a(t T 1j ) G a(t T 2j )} (1) j=1 3 b (Hz) 5 7 CH male CH female JP male JP female mean(hz) 84.5 157.3 78.1 131.7 SD 12.1 8.6 4.6 9.8 b I 14 J ( 1 7 ) A pi i ( ) A aj j ( ) T i i 1kHz 16bit T 1j j T 2j j [11] 1ms G p(t) (2) (2) (3) b 5 3 5 7 35 b = 3/s = 2/s =.9 b (131.7Hz) (157.3Hz) (3) b 78Hz b 1Hz 4. 2 [1] P RAAT [8] 8 < α 2 t exp( αt) (t > = ) G p(t) = (2) : (t < ) G a (t) (3) 3/s 2/s.9 8 < min[1 (1 + β t ) exp( βt), γ] (t > = ) G a (t) = (3) : (t < ) 4. 3 b 4
4 5 5 7 CH male CH female JP male JP femal mean 4.69 4.83 2.69 3.4 SD 1.45 1.22.8.65 5 7 CH male CH female JP male JP female mean 9.69 1.71 6.54 7.83 SD 2.46 3.28 1.48 1.81 Number of occurrences 1 8 6 4 2 bunsetsu CH_female CH_male JP_female JP_male 1 2 3 4 5 6 7 Sentence ID Number of occurrences mora bunsetsu CH_female CH_male JP_female JP_male 3 2 1 1 2 3 4 5 6 7 Sentence ID 6 7 4 ( ) 5 7 7 4 ( bunsetsu ) 5 mora bunsetsu 4. 4 18 JP_male 4 6 7 16 JP_female 4 14 CH_male 12 CH_female 1 8 6 4 2.69 3.4 4.69 4.83 2 6 5 6.5 1 2 3 4 5 4. 5 Number of accent commands in a bunsetsu 8 5 7 Number of occurences 7 4. 6 1 2 8 4 5 3 /-en/ /-ai/.5 ( ) 5
o(t)36 [Hz] 27 18 b Ap.5. Aa.5. o(t)36 [Hz] 27 b 18 Ap.5. Aa.5. 9 a o mo ri no ri N go o ta be ru ma de i e no N wa be re i ri go ta ra na..5 1. 1.5 2. 2.5 3. 3.5 4. 4.5 5. 5.5..5 1. 1.5 2. 2.5 3. 3.5 4. 4.5 5. 5.5..5 1. 1.5 2. 2.5 3. 3.5 4. 4.5 5. 5.5 TIME [s] a o mo ri no ri N go o ta be ru ma de P i e no ri N go wa P ta be ra re na i..5 1. 1.5 2. 2.5 3. 3.5 4. 4.5 5. 5.5..5 1. 1.5 2. 2.5 3. 3.5 4. 4.5 5. 5.5..5 1. 1.5 2. 2.5 3. 3.5 4. 4.5 5. 5.5 TIME [s] ( ) ( ) 1 ( b ) (A p ) (A a) / / / / 7 1 1 5. ( ) 4 ( ) [1], 17 [12] http://www.jasso.go.jp/statistics/intl student/data5.html (accessed 15 Oct, 26) [2], 7 48,, pp.29-37, 1991 4. 7 [3] May 25. [4],,, D1 3, 1991 9 1 1 [5] Hirano, H. and Kawai, G., 25. Pitch patterns of intonational phrases and intonational phrase groups in native and 6 non-native speech, Proc. Interspeech25, Lisbon, Portugal, pp.761-764. [6] 6 4 SP25-188 pp.23-28 26. 4 [7] Hirano, H., Gu, W., and Hirose, K., Model-based Analysis 7 of Contours of Japanese Sentences Uttered by Chinese 1 1 Speakers The 7th Phonetic Conference of China (7th PCC) and International orum on Phonetic rontiers, to appear, 26. [8] Boersma, P. and Weenik, D., 26 Praat: doing phonetics by computer http://www.praat.org/ (accessed 15 Oct, 26). [9] ujisaki, H. and Hirose, K., Analysis of voice fundamental 2 frequency contours for declarative sentences of Japanese, J. Acous. Soc. Japan (E) 5 (4), pp. 233-242, 1984. [1],,, Vol.43, No.7, pp.2155-2168 22. [11] K.Hirose, H.ujisaki, and S.Seto, A scheme for pitch extraction of speech using autocorrelation function with frame / 2 1 length proportional to the time lag, Proc. IEEE ICASSP, Vol.1, pp.149-152, 1992. [12] ujisaki, H., Hirose, K., Hallé, and P., Lei, H., Analysis and modeling of tonal features in polysyllabic words and sentences of the Standard Chinese, Proc. ICSLP 199, Kobe, ( ) Japan, pp.841-844, 199. 6