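Journal of South China University of Technology (Natural Science Edition), Vol. 40, No. 3, March 2012

Article ID: 1000-565X(2012)03-0106-06

Speaker Recognition Algorithm for Abnormal Speech Based on Abnormal Feature Weighting *

He Jun, Li Yan-xiong, He Qian-hua, Li Wei
(School of Electronic and Information Engineering, South China University of Technology, Guangzhou 510640, Guangdong, China)

Abstract: [...] MFCC [...] K-L [...] K-L [...] MFCC [...] K-L [...] 46.61% [...] 42.25% [...] 39.68% [...] 36.36% [...]

Key words: [...] K-L [...]

CLC number: TN912.3
doi: 10.3969/j.issn.1000-565X.2012.03.017

[...] SR [...] PMC [...] [1-2] [...] [7] [...] CMN [...] CMN [...] 2.6% [...] [8] [...] [3-5] [...] Mel [...] 1.3% [...] [9] [...] MFCC [...] CMN [...] RASTA [...] TIMIT [...] WM [...] [3, 8-9] [...]

Received date: 2011-08-14
* Foundation item numbers: 60972132, 61101160, 9351064101000003, 10451064101004651, 2011ZM0029
Author biographies: [...] (1978-) [...] E-mail: hejun_723@126.com; [...] (1980-) [...] E-mail: eeyxli@scut.edu.cn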
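[...] CMN [...] Fig. 1 [...] /ɑ/ [...] /ɑ/ [...] Fig. 1(a) [...] 4.8 kHz [...] 6.0 kHz [...] 1.6~4.8 kHz [...] 0~1.6 kHz [...] Fig. 1(b) [...] 6.0~8.0 kHz [...] MFCC [...] K-L distance (Kullback-Leibler divergence) [10] [...] MFCC [...] GMM [...]

Fig. 1 Spectra of abnormal speech /ɑ/ and normal speech /ɑ/

1 [...]

1.1 [...] MFCC [...]

[...] MFCC [...] NSFT [...] Let s_i denote the i-th speech and n the number of MFCC orders. The MFCC features of s_i are

$$F_i = [F_i^1, \dots, F_i^k, \dots, F_i^n]$$

where F_i^k collects the k-th order MFCC of the i-th speech over its frames:

$$F_i^k = [f_{1k}^i, f_{2k}^i, \dots, f_{rk}^i]^{\mathrm{T}}$$

Here f_{rk}^i is the k-th order MFCC of the r-th frame of the i-th speech. For m speeches, the normal speech feature template F_NSFT is

$$F_{NSFT} = \begin{bmatrix} F_1^1 & F_1^2 & \cdots & F_1^n \\ F_2^1 & F_2^2 & \cdots & F_2^n \\ \vdots & \vdots & & \vdots \\ F_m^1 & F_m^2 & \cdots & F_m^n \end{bmatrix} \quad (1)$$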
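The template construction above can be illustrated with a minimal sketch. It assumes that the probability distribution of each MFCC order is estimated with a fixed-range histogram over all frames pooled from the normal speeches; the source does not state the estimator. The function names, the bin range `lo`/`hi`, the bin count and the probability floor are all hypothetical choices made for illustration.

```python
def order_histogram(values, lo, hi, n_bins, floor=1e-6):
    """Estimate a discrete distribution for one MFCC order (hypothetical helper).

    values: k-th order MFCC values pooled over frames (one column of F_NSFT).
    lo, hi, n_bins, floor: assumed binning parameters, not taken from the source.
    """
    if not values:
        raise ValueError("values must not be empty")
    if hi <= lo or n_bins < 1:
        raise ValueError("need hi > lo and n_bins >= 1")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        idx = int((v - lo) / width)
        idx = min(max(idx, 0), n_bins - 1)  # clamp out-of-range values to edge bins
        counts[idx] += 1
    # Floor empty bins so that later log-ratios stay finite, then renormalise.
    probs = [max(c / len(values), floor) for c in counts]
    total = sum(probs)
    return [p / total for p in probs]


def build_template(feature_columns, lo, hi, n_bins):
    """Return one distribution p_k per MFCC order k (hypothetical helper)."""
    return [order_histogram(col, lo, hi, n_bins) for col in feature_columns]


if __name__ == "__main__":
    # Two toy "orders" with three frame values each.
    template = build_template([[0.1, 0.2, 0.9], [0.5, 0.5, 0.5]], 0.0, 1.0, 2)
    print(template)
```

The same `order_histogram` call can be applied to the features of a test speech to obtain the distribution that is compared against the template.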
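[...] 1.6~4.8 kHz [...] 17 s [...] 826 [...] 0~1.6 kHz [...] 15~20 s [...] MFCC [...]

1.2 [...]

[...] Fig. 2(b) [...] K-L [10] [...] MFCC [...] K-L [...] 12-order MFCC [...] 3 600 [...] [11-12] [...] K-L [...] 3~5 s [...] 826 [...] MFCC [...] Fig. 2(a) [...] MFCC [...] K-L [...] Table 1 [...] orders 1, 6, 9, 10, 11, 12 [...] orders 5, 7, 8 [...]

Fig. 2 12-order MFCC feature probability distribution of normal speech and one abnormal speech

Table 1 K-L distances from one abnormal speech and 826 abnormal speeches to NSFT

MFCC order               1     2     3     4     5     6     7     8     9     10    11    12
One abnormal speech      2.5   15.0  1.8   5.0   7.5   1.0   12.5  11.0  18.0  12.0  8.0   95.0
826 abnormal speeches    6.5   6.0   6.5   6.4   5.0   11.0  2.5   4.0   12.0  11.0  18.0  13.0

2 [...]

2.1 [...] K-L [...]

For two continuous probability distributions p(x) and q(x), the K-L distance is

$$D_{K\text{-}L}(p \,\|\, q) = \int p(x) \lg \frac{p(x)}{q(x)} \, \mathrm{d}x \quad (2)$$

When p(x) and q(x) are discrete distributions over N points, it becomes

$$D_{K\text{-}L}(p \,\|\, q) = \sum_{i=1}^{N} p(x_i) \lg \frac{p(x_i)}{q(x_i)} \quad (3)$$

Let F^k_NSFT be the k-th order MFCC in F_NSFT with probability distribution p_k, and let q_k be the probability distribution of the k-th order MFCC F^k_c of the test speech s_c. The K-L distance of the k-th order MFCC between the test speech and F_NSFT is

$$d^k_{K\text{-}L}(q_k, p_k) = \sum_{i=1}^{N} q_k(x_i) \lg \frac{q_k(x_i)}{p_k(x_i)} \quad (4)$$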
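The discrete K-L distance between the k-th order distribution q_k of a test speech and the template distribution p_k can be sketched as follows. This sketch assumes that `lg` denotes the base-10 logarithm and that both distributions are given as equal-length lists of bin probabilities; the function name and the small floor `eps` that guards against empty template bins are hypothetical additions.

```python
import math


def kl_distance(q, p, eps=1e-12):
    """Discrete K-L distance sum_i q_i * lg(q_i / p_i), with lg taken as log10.

    q: distribution of one MFCC order for the test speech.
    p: distribution of the same order in the normal speech template.
    eps: assumed floor for p_i so the ratio stays finite (not from the source).
    """
    if len(q) != len(p):
        raise ValueError("q and p must have the same number of bins")
    total = 0.0
    for qi, pi in zip(q, p):
        if qi > 0.0:  # terms with q_i = 0 contribute nothing
            total += qi * math.log10(qi / max(pi, eps))
    return total


if __name__ == "__main__":
    print(kl_distance([0.5, 0.5], [0.9, 0.1]))
    print(kl_distance([0.2, 0.8], [0.2, 0.8]))  # identical distributions
```

Evaluating `kl_distance` once per MFCC order gives the per-order distances that the weighting step uses.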
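The K-L distances of all n orders between the test speech s_c and F_NSFT form the vector

$$D^c_{K\text{-}L} = [d^1_{K\text{-}L}, \dots, d^k_{K\text{-}L}, \dots, d^n_{K\text{-}L}] \quad (5)$$

where d^k_{K-L} is the K-L distance of the k-th order MFCC. [...] 863 [...] MEEI [...] The K-L weighting factor of the k-th order MFCC is

$$W^k_{K\text{-}L} = a_{K\text{-}L} + \varepsilon \, \delta(x) / d^k_{K\text{-}L} \quad (6)$$

$$\delta(x) = \begin{cases} 1, & x \ge \mathrm{median}(D^c_{K\text{-}L}) \\ -1, & x < \mathrm{median}(D^c_{K\text{-}L}) \end{cases}$$

where median(D^c_{K-L}) is the median of D^c_{K-L}, a_{K-L} [...] 8~16 [...] ε [...]. The weighted feature is

$$F_N^k = W^k_{K\text{-}L} F_O^k \quad (7)$$

where F_N^k is the weighted k-th order MFCC and F_O^k is the original k-th order MFCC.

2.2 [...]

The Euclidean distance between two M-dimensional vectors x_i and x_j is

$$d_E(x_i, x_j) = \sqrt{\sum_{m'=1}^{M} (x_{im'} - x_{jm'})^2} \quad (8)$$

Let A_i be the i-th order MFCC of the feature F_c of the test speech s_c, and B_i the i-th order MFCC of F_NSFT. The distance from the j-th element A_{ij} of A_i to B_i is

$$d_j(A_{ij}, B_i) = \min_{k = 1, \dots, M'} \| A_{ij} - B_{ik} \| \quad (9)$$

where M' is the number of elements in B_i. The distance between A_i and B_i is

$$d^i_E(A_i, B_i) = \sum_{j=1}^{N'} d_j(A_{ij}, B_i) \quad (10)$$

where N' is the number of elements in A_i. The distances of all orders between s_c and F_NSFT form

$$D^c_E = [d^1_E, \dots, d^i_E, \dots, d^n_E] \quad (11)$$

and the Euclidean weighting factor is

$$W^i_E = a_E + \varepsilon \, \delta(x) / d^i_E \quad (12)$$

where a_E [...]

3 [...]

[...] SNR 37~55 dB [...] yep120 [...] 60 [...] PANSD [...] 3600 [...] PANSD [...] 27 [...] 0~9 [...] 863 [...] 20 [...] 362 [...] 317 [...] 1253 [...] PANSD [...] 2010 [...] 2011 [...] 17 [...] 20~35 [...] Toplux TVP208 [...] 22.05 kHz [...] 16 [...] 3~5 [...] 5~20 cm [...] 20~25 min [...] PANSD [...] 700 min [...] PANSD [...] 400 [...] 3~4 s [...] 557 [...] 826 [...] 15~20 s [...] 1~2 min [...] GMM [...] WAV [...] Cooledit Pro 2.0 [...] 16 kHz [...]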
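The weighting step can be sketched in a few lines. The sketch reads the weighting factor as W = a + ε·δ(x)/d, with δ(x) = 1 when the distance of that order is at or above the median of all order distances and −1 otherwise; taking x to be the order's own distance is an assumption, as is the handling of the boundary case with `>=`. The nearest-element distance treats each feature element as a scalar, so the Euclidean norm reduces to an absolute difference, and it sums over the elements of A_i without normalisation. All function names are hypothetical.

```python
import statistics


def weighting_factors(distances, a, eps):
    """W_k = a + eps * delta_k / d_k for every MFCC order k.

    distances: per-order distances (K-L or Euclidean) to the template.
    a, eps: the constants written a_{K-L} (or a_E) and epsilon in the text.
    """
    if any(d <= 0 for d in distances):
        raise ValueError("distances must be strictly positive")
    med = statistics.median(distances)
    # delta is +1 at or above the median distance, -1 below it (assumed reading).
    return [a + eps * (1 if d >= med else -1) / d for d in distances]


def apply_weights(frames, weights):
    """Scale the k-th order MFCC of every frame by its weighting factor."""
    return [[w * f for w, f in zip(weights, frame)] for frame in frames]


def nearest_distance_sum(a_i, b_i):
    """Sum over elements of A_i of the distance to the closest element of B_i."""
    if not b_i:
        raise ValueError("b_i must not be empty")
    return sum(min(abs(x - y) for y in b_i) for x in a_i)


if __name__ == "__main__":
    w = weighting_factors([1.0, 2.0, 4.0], a=8, eps=0.5)
    print(w)
    print(apply_weights([[1.0, 1.0, 1.0]], w))
    print(nearest_distance_sum([1.0, 5.0], [0.0, 4.5, 10.0]))
```

With this reading, orders whose distance lies below the median receive a weight under `a`, and the remaining orders a weight above it; the weighted frames would then be passed to the GMM stage.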
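[...] 16 [...] 32 ms [...] 16 ms [...] 24 [...] 12-order MFCC [...] 826 [...] 394 [...] 154 [...] 2~3 min [...] 278 [...] Fig. 3 [...]

Fig. 3 Flowchart of the proposed algorithm

[...] K-L-W [...] E-W [...] [9] [...] Table 2 [...]

Table 2 Comparison of speaker recognition rates for abnormal speech (%)

Algorithm    [...]    [...]    [...]    Global
[9]          67.26    10.38    11.87    39.68
[3]          66.24    10.39    9.71     36.36
K-L-W        83.77    17.53    10.07    46.61
E-W          77.91    10.38    10.43    42.25

[...] Fig. 4 [...] When ε = 0.5, the recognition rate of K-L-W reaches 46.61%, which is 10.25% higher than that of [3] and 4.36% and 6.93% higher than those of E-W and [9], respectively. [...] K-L-W [...] E-W [...] E-W 98.56% [...] [9] 98.54% [...] 98.38% [...] K-L 98.02% [...] K-L-W [...] E-W [...]

Fig. 4 Influence of weighting parameter on speaker recognition rate for abnormal speech

[...] K-L [...] GMM [...] K-L-W [...] E-W [...] K-L-W [...] E-W [...] GMM [...] K-L [...]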
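The reported improvements follow directly from the four global recognition rates. The short check below recomputes them; the dictionary keys are hypothetical labels chosen for illustration, while the four rates are the ones quoted in the paper.

```python
# Global recognition rates for abnormal speech, in percent, as reported.
GLOBAL_RATES = {
    "kl_weighting": 46.61,
    "euclidean_weighting": 42.25,
    "contribution_weighting": 39.68,
    "no_weighting": 36.36,
}


def gain(method, baseline, rates=GLOBAL_RATES):
    """Absolute difference in recognition rate, in percentage points."""
    return round(rates[method] - rates[baseline], 2)


if __name__ == "__main__":
    for baseline in ("no_weighting", "euclidean_weighting", "contribution_weighting"):
        print(baseline, gain("kl_weighting", baseline))
```

The three differences are absolute percentage points, not relative improvements.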
[...] 46.61% [...] 42.25% [...] 39.68% [...] 36.36% [...]

References:

[1] Rashid R A, Mahalin N H, Sarijari M A, et al. Security system using biometric technology: design and implementation of voice recognition system [C] // Proceedings of International Conference on Computer and Communication Engineering. Kuala Lumpur: IEEE, 2008: 898-902.
[2] Yang Ji-cheng, He Qian-hua, Pan Wei-qiang. Modified BIC algorithm of speaker change detection [J]. Journal of South China University of Technology (Natural Science Edition), 2009, 37(9): 47-51.
[3] Zhang Lei, Han Ji-qing, Wang Cheng-fa. Research progress of stress speech processing [J]. Acta Electronica Sinica, 2003, 31(3): 411-418.
[4] Alpan A, Maryn Y, Kacha A, et al. Multi-band dysperiodicity analyses of disordered connected speech [J]. Speech Communication, 2011, 53(1): 131-141.
[5] Maciel C D, Pereira J C, Stewart D. Identifying healthy and pathologically affected voice signals [J]. IEEE Signal Processing Magazine, 2010, 27(1): 120-123.
[6] Togneri R, Pullella D. An overview of speaker identification: accuracy and robustness issues [J]. IEEE Circuits and Systems Magazine, 2011, 11(2): 23-61.
[7] Garner Philip N. Cepstral normalisation and the signal to noise ratio spectrum in automatic speech recognition [J]. Speech Communication, 2011, 53(8): 991-1001.
[8] Yang Hong-wu, Liu Ya-li, Huang De-zhi. Speaker recognition based on weighted Mel-cepstrum [C] // Proceedings of the Fourth International Conference on Computer Sciences and Convergence Information Technology. Seoul: IEEE, 2009: 200-203.
[9] Weng Zufeng, Li Lin, Guo Donghui. Speaker recognition using weighted dynamic MFCC based on GMM [C] // Proceedings of International Conference on Anti-Counterfeiting, Security and Identification in Communication. Chengdu: IEEE, 2010: 285-288.
[10] Kullback S, Leibler R. On information and sufficiency [J]. Annals of Mathematical Statistics, 1951, 30(3): 79-86.
[11] You Chang Huai, Lee Kong Aik, Li Haizhou. GMM-SVM kernel with a Bhattacharyya-based distance for speaker recognition [J]. IEEE Transactions on Audio, Speech, and Language Processing, 2010, 18(6): 1300-1312.
[12] Ferrante A, Ramponi F, Ticozzi F. On the convergence of an efficient algorithm for Kullback-Leibler approximation of spectral densities [J]. IEEE Transactions on Automatic Control, 2011, 56(3): 506-515.

Speaker Recognition Algorithm for Abnormal Speech Based on Abnormal Feature Weighting

He Jun, Li Yan-xiong, He Qian-hua, Li Wei
(School of Electronic and Information Engineering, South China University of Technology, Guangzhou 510640, Guangdong, China)

Abstract: As the commonly used weighting algorithms are inefficient in tracking the abnormal features of abnormal speech, a speaker recognition algorithm for abnormal speech is proposed based on abnormal feature weighting. In this algorithm, first, a feature template of normal speech is established by computing the probability distribution of the MFCC features of each order in a large number of normal speech samples. Then, the K-L distance and the Euclidean distance are used to measure the differences between a given test speech and the normal speech template, and to further determine the K-L and the Euclidean weighting factors. Finally, the two weighting factors are used to weight the MFCC features of the test speech, and the weighted MFCC features are input into a Gaussian mixture model for speaker recognition with abnormal speech. Experimental results show that the global recognition rates of the speaker recognition algorithms based on the K-L weighting and the Euclidean weighting are 46.61% and 42.25%, respectively, while those of the algorithms with and without weighting by the speaker recognition contribution of each order feature are only 39.68% and 36.36%, respectively.

Key words: abnormal speech; speaker recognition; abnormal feature weighting; K-L distance; weighting factor