Vol. 39 No. 2
JOURNAL OF SICHUAN UNIVERSITY (ENGINEERING SCIENCE EDITION)
Mar. 2007

Article ID: 1009-3087(2007)02-0116-05
CLC number: TP311    Document code: A

Knowledge Induction Based on Generalization of Multi-benchmark Attribute

LIU Qi-hong 1,2, TANG Chang-jie 1, LI Chuan 1, ZHU Jun 3, LIU Qi-wei 4, QIAO Shao-jie 1, JIANG Yong-guang 5

(1. School of Computer Sci., Sichuan Univ., Chengdu 610065, China; 2. Electrical Eng. and Info., Sichuan Univ., Chengdu 610065, China; 3. Birth Disfigurement Surveillance Center of China, Sichuan Univ., Chengdu 610041, China; 4. School of Public Administration, Sichuan Univ., Chengdu 610064, China; 5. Chengdu Univ. of Traditional Chinese Medicine, Chengdu 610016, China)

Abstract: By analyzing the shortcomings of the basic attribute-oriented induction (AOI) algorithm, a new optimized technique and algorithm based on a generalization hierarchy of relevancy attributes is proposed for data mining of Chinese medicine prescriptions. Relating the relevancy-attribute threshold to the benchmark-attribute threshold improves both the efficiency of induction and the accuracy of the discovered knowledge, and generalization hierarchies and trees for the relevancy attributes are established from Chinese medical knowledge. A new system for knowledge discovery and analysis of Chinese medicine prescriptions was implemented. Four extensive experiments demonstrate the effectiveness of the new algorithm: under the same conditions, it is more than 27% more efficient than the conventional algorithm. With the new system, specialized knowledge and rules consistent with Chinese medicine theory can be found more rapidly.
Key words: data mining; relevancy-attribute-oriented induction; threshold of multi-benchmark-attribute; knowledge discovery of Chinese herbal medicine

Received date: 2006-08-31
Grant numbers: 60473071; 90409007; 2006Z01-027; 2006BAI05A001
Author biography: (1964- ) [...]
[...] [1-2] [...] AOI [3-4] [...] T_AOI, MT_AOI [...]: 1) [...]; 2) [...]; 3) [...]; 4) [...] 27% [...]

[...] [7-8] [...]

1 [...]

[...] (Tab. 1) [...] Fig. 1 [...]: 1) [...] any; 2) [...]

[...] [5-6] [...]: 1) [...] [5-6] [...];

Tab. 1 The primordial data of Chinese medicine prescription
(surviving row numbers: 101, 102, 103, 104, ..., 110; surviving values: 41.25, 27.5, 13.75)

Fig. 1 Concept tree about attribute of medicament efficacy
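3) [...]; 4) [...]

2 [...]

Definition 1 [...] $T$ is a 6-tuple $(V, E, I, O, L, H)$, where $V = \{v_1, v_2, \ldots, v_n\}$ [...], $E = \{e_{ij} \mid v_i, v_j \in V\}$ [...], and $I$, $O$, $L$, $H$ [...]

Fig. 2 Concept tree about attribute of dynasty

[...] $k$ [...] $0, 1, \ldots, n-1$ [...] $H$ [...]

Definition 2 ([...]) [...] $1, 2, \ldots, N$ [...] $k$ [...] $N$; $H$ is a 3-tuple $(\mathit{Concepts}, \preceq, k)$, where $\mathit{Concepts}$ [...], $\preceq$ [...] $\mathit{Concepts}$, and $k$ [...]

Definition 3 [...] $R(A_1, A_2, \ldots, A_n, B_m)$ [...] $B_m$ [...] $A_n$ [...]

Definition 4 ([...]) In $R(A_1, A_2, \ldots, A_n, B_m)$ with $A_n$, $B_m$ [...] $D_i$ [...], let $T_i$ be the threshold of $A_n$ and $T_j$ the threshold of $B_m$; $a_i$ [...], $a_i \in [0, 1]$; $C_i$ [...], $C_i \in [1, 20]$; and let $f$ be defined by $f = 0$ if $n < 0$, $f = n$ if $0 < n < 1$, $f = 1$ if $n > 1$. Then:
1) $T_j = \sum_{i=1}^{n} f(a_i T_i) + C_i$, [...] $B_m$ [...];
2) $T_j = f(a_i T_i) + C_i$, [...] $B_m$ [...]

Definition 5 $MT = (l, f_l, T_i, T_j)$ is a 4-tuple, where $l = (c_1, c_2, \ldots, c_n)$ [...] $R$ [...], $f_l(\cdot) = \{(a_1, a_2, \ldots, a_m) \mid a_i \,\Phi\, c_i,\ i = 1, 2, \ldots, m\}$ [...], $\Phi$ [...] $H$; $T_i$, $T_j$ [...]

[...] [8] [...]

3 [...]

T_AOI (Threshold of single benchmark AOI) and MT_AOI (Multi-benchmark-attribute-threshold AOI) [...]: 1) [...]; 2) [...]; 3) [...]; 4) [...]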
5) [...]; 6) [...]

Algorithm 1 // MT_AOI (Multi-benchmark-threshold AOI)
Input: 1) DB; 2) DMQuery; 3) $A_i$, $T_i$
Output: $R$
Step 1. M ← Get_prescription_data() // M [...];
Step 2. 1) $T_j = \sum_{i=1}^{n} f(a_i T_i) + C_i$ [...] $a_i$, $b_j$, $c_i$, $t_j$; 2) CT $L_i$ $R_i$ [...];
Step 3. Prepare_for_generalition() // M [...]: 1) M, $A_i$ [...] $[A_i]$, $[A_i]$ [...] $T_i$, CT $H_i$; 2) $T_j$, CT $H_i$ [...]
Step 4. R ← Generalition() // M, D [...] M [...] $R_i$, $R$ [...] R

Algorithm 2 // T_AOI (Threshold of single benchmark AOI)
[...]
Input: 1) R; 2) DMQuery; 3) [...]
Output: [...]
Step 1. R ← Sorting() // R [...]
Step 2. [...]

4 [...]

[...]: Intel Pentium 4 CPU, 1.4 GHz, 1 G [...]; VC++ 6.0, MS SQL Server 2000, Windows 2000.
[...]: DB [...] 22 [...] 116 Formula, 201 Syndrome, 845 Symptom [...]

[...] $T_j = a_i T_i + C_i$ [...]

Fig. 3 Part of rules and results of experiments and discovery
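The threshold rule of Definition 4 and the generalization step of Algorithm 1 can be illustrated with a minimal Python sketch. It is not the authors' VC++ implementation. The function names, the `PARENT` concept tree, and the sample rows are hypothetical; the clamp `f` is assumed to return 0 and 1 at the boundary points that Definition 4 leaves unspecified; and the summation is assumed to apply to the clamped terms, as in case 1 of Definition 4.

```python
from collections import Counter

def f(x):
    """Clamp x to [0, 1]: 0 below 0, x in between, 1 above 1 (assumed at the boundaries)."""
    return max(0.0, min(1.0, x))

def derived_threshold(a, T, C):
    """T_j = sum_i f(a_i * T_i) + C, read from case 1 of Definition 4."""
    return sum(f(ai * Ti) for ai, Ti in zip(a, T)) + C

def generalize(rows, attr, parent, threshold):
    """Climb the concept tree of one attribute until it has at most
    `threshold` distinct values. Concepts with no parent stay unchanged."""
    rows = [dict(r) for r in rows]
    while len({r[attr] for r in rows}) > threshold:
        lifted = [{**r, attr: parent.get(r[attr], r[attr])} for r in rows]
        if lifted == rows:  # every value is already at the root; stop
            break
        rows = lifted
    return rows

def merge(rows):
    """Merge identical generalized tuples and count how many rows each covers."""
    return Counter(tuple(sorted(r.items())) for r in rows)

# Hypothetical concept tree for a "dynasty" attribute (child -> parent).
PARENT = {
    "Han": "early", "Tang": "middle", "Song": "middle",
    "Ming": "late", "Qing": "late",
    "early": "any", "middle": "any", "late": "any",
}

# Hypothetical rows; real prescription data would come from the database.
ROWS = [{"dynasty": d} for d in ["Han", "Tang", "Song", "Ming", "Qing"]]

if __name__ == "__main__":
    t_j = derived_threshold([0.5, 0.5], [10, 1], 3)
    print("derived threshold:", t_j)
    print(merge(generalize(ROWS, "dynasty", PARENT, 3)))
```

In this sketch each attribute is generalized on its own; how the paper couples the relevancy-attribute and benchmark-attribute thresholds during generalization is not recoverable from the surviving text, so that coupling is left out.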
[...] MT_AOI [...] AOI [...] 27% [...]

5 [...]

[...] AOI [...] 27% [...] GEP [11-12] [...]

References:
[1] Han J, Fu Y. Exploration of the power of attribute-oriented induction in data mining[C]//Fayyad U, Shapiro G P, Smyth P, et al. Advances in Knowledge Discovery and Data Mining. AAAI/MIT Press, 1996: 399-421.
[2] Kamber M, Winstone L, Gong W, et al. Generalization and decision tree induction: Efficient classification in data mining[C]//Proceeding of 1997 Int Workshop on Research Issues in Data Engineering (RIDE 97). Birmingham, England, 1997.
[3] Sun J P, Biw Y. Directed acyclic concept graph based attribute oriented induction[R]. Fort Lauderdale: Graduate School of Computer and Information Sciences, Nova Southeastern University, 2001: 32-45.
[4] Zhou X, Sha C F, Zhu Y Y, et al. Interest measure another threshold in association rules[J]. Journal of Computer Research and Development, 2000, 37(5): 627-633.
[5] Wu Xiaorong, Xie Lihong. Attribute-oriented induction and conceptual clustering[J]. Computer Engineering, 2003, 29(5): 92-123.
[6] Chen Yan, Zhao Hai, Zhang Degan, et al. Study on classification method induction based on rough set theory used in knowledge rule mining[J]. Mini-micro Systems, 2005, 126(3): 462-465.
[7] Tian Yangge, Bian Fuling. Regional division algorithm based on conceptual clustering and attribute-oriented induction[J]. Geomatics and Information Science of Wuhan University, 2005, 30(1): 86-88.
[8] Sun Huamei, Guo Maozu, Jiao Jie, et al. New concept hierarchy optimization method in attribute-oriented induction[J]. Journal of Management Sciences in China, 2004, 7(1): 65-72.
[9] Peng Jing, Tang Changjie, Zeng Tao, et al. A Chinese traditional medicine prescription effect reduction algorithm based on artificial neural network and property distance matrix[J]. Journal of Sichuan University: Engineering Science Edition, 2006, 38(1): 92-97.
[10] Liu Qihong, Tang Changjie, Hu Jianjun. Gene expression programming based on diversity-guided grading evolution[J]. Journal of Sichuan University: Engineering Science Edition, 2006, 38(6): 108-113.