JOURNAL OF SICHUAN UNIVERSITY (ENGINEERING SCIENCE EDITION)
Vol. 38 No. 5, Sept. 2006

Article ID: 1009-3087(2006)05-0136-07

Mining Multi-dimensional Complex Association Rule Based on Artificial Immune System and Gene Expression Programming

ZENG Tao 1, TANG Chang-jie 1, ZHU Ming-fang 1,2, XIANG Yong 1,3, LIU Yin-tian 1, CHEN Peng 1
(1. School of Computer, Sichuan Univ., Chengdu 610065, China; 2. Dept. of Computer Sci. and Technol., Shaanxi Univ. of Technol., Hanzhong 723003, China; 3. Chengdu Electromechanical College, Chengdu 610031, China)

Abstract: To handle the rich semantics of complex data mining applications, the formal concept of the Multi-dimensional Complex Association Rule (MDCAR) is proposed. To mine such rules, a novel method based on Artificial Immune Gene Expression Programming (AIGEP) is introduced. New structures for the antibody and the immune cell are designed to reduce computational complexity; a special negative selection strategy is presented to eliminate invalid or redundant immune cells according to system requirements; and a heuristic MDCAR reduction criterion is introduced, namely that a strong rule is retained only if its contrapositive is also strong. Experiments show that the new method mines MDCAR with good efficiency and high precision and that, in certain cases, its performance is 10 to 1000 times higher than without the negative selection strategy.

Key words: data mining; multi-dimensional complex association rule; meta-rule; gene expression programming; artificial immune system

CLC number: TP311    Document code: A

Received: 2006-01-17
Foundation items: [...] (60473071; 90409007)
Author biography: [...] (1976- ) [...]

[...] [1-4] [...]
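Example 1: [...] 30 to 40 (a), [...] (b), [...] ($\lnot c$) [...] (d) [...]:

Rule 1: $(a \land b \land (\lnot c)) \rightarrow d$

[...]

Example 2: [...] 40 to 50 (a), [...] (b), [...] 5000 (c) [...] (d) [...]:

Rule 2: $(a \land (b \lor c)) \rightarrow d$
Rule 3: $(a \land b) \rightarrow d$
Rule 4: $(a \land c) \rightarrow d$

[...] Rules 3 and 4 [...] Rule 2 [...] Rules 1 and 2 [...]

1 [...]

[...] [4] Fu [...] (1) [...] (2) [...]

Definition 1 ([...]): Let $Attr = \{A_1, A_2, \ldots, A_n\}$ [...], $X \subset Attr$, $Y \subset Attr$, $X \neq \emptyset$, $Y \neq \emptyset$, $X \cap Y = \emptyset$, and let $P$ and $Q$ be [...] on $X$ and $Y$ [...]. Then:
1) $P \rightarrow Q$ [...], $P$ [...], $Q$ [...];
2) $P \rightarrow Q$ [...];
3) $P \rightarrow Q$ [...].

Definition 2 ([...]): Let $rule_c$ be [...] and $rule_m$ be [...]. If:
1) $rule_c$ and $rule_m$ are unified (Unified), [...] $rule_c$ [...] $rule_m$ [...];
2) $rule_c$ [...];
3) $rule_c$ [...],
then $rule_c$ is a Multi-dimensional Complex Association Rule (MDCAR).

Note: [...] (Unified) [...] [4], Section 2.2, Unified [...].

Example 3: Let $rule_m$: $(A \land B \land (\lnot C)) \rightarrow D$. A $rule_c$ [...] $rule_m$ [...] MDCAR [...] is
$rule_c$: $(A(a) \land B(b) \land (\lnot C(c))) \rightarrow D(d)$,
where $A(x)$ [...] $x$; $B(x)$ [...] $x$; $C(x)$ [...] $x$; $D(x)$ [...] $x$, and $a$, $b$, $c$, $d$ in $rule_c$ [...].

[...] Definition 1 (1) [...] [1] [...] Definition 2 [...] [6] [...] [6] [...] GEP [...] GEP [5-7] [...] [8-10] [...] (AIGEP) [...]

2 AIGEP

Gene Expression Programming (GEP) was proposed by Candida Ferreira [5]. GEP [...] [5-7] [...]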
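[...] Artificial Immune System (AIS) [8-10] [...] [8-10] [...] GEP [5-7] [...] [10] [...] GEP [...] AIGEP [...]: [...] MDCAR [...] GEP [...] 1 to n [...] MDCAR [...] [6] PAGEP [...]

2.1 [...]

[...] GEP [5-7] [...] MDCAR [...] Table 1 [...]

Tab. 1 An example of Antibody

Row 1:  $f_1 = (A(a) \land B(b) \land (\lnot C(c)))$;  $f_2 = D(d)$
Row 2:  $s_1 = A(\text{30 to 40}) \land B([...]) \land (\lnot C([...]))$;  $s_2 = D([...])$
Row 3:  $p_1$ = number of tuples satisfying $s_1$;  $p_2$ = number of tuples satisfying $s_2$
Row 4:  $p_3$ = number of tuples satisfying $s_1 \land s_2$;  $p_{total}$ = total number of tuples
Note: $A(x)$ [...] $x$; $B(x)$ [...] $x$; $C(x)$ [...] $x$; $D(x)$ [...] $x$.

[...] Table 1 [...]:

Definition 3 (Antibody): An antibody is a 3-tuple $(F, S, I)$, where
1) $F = (f_1, f_2)$ [...] GEP [...] 2 [...];
2) $S = (s_1, s_2)$ [...] 2 [...];
3) $I = (p_1, p_2, p_3, p_{total})$ is a 4-tuple, where $p_1$, $p_2$, $p_3$ [...] $s_1$, $s_2$, and $s_1 \land s_2$ [...], and $p_{total}$ [...].

[...] Table 2 [...]

Tab. 2 An example of BCell

Row 1:  $g_1$ = [...] $\lnot$ A B C;  $g_2$ = D
Row 2:  $f_1 = (A \land B \land (\lnot C))$;  $f_2 = D$
Row 3:  $v = 0$

[...] Table 2 [...] GEP [...] GEP [...] [7] [...] GEP K [...] GEP [...]:

Definition 4 (BCell): A BCell is a 3-tuple $(G, F, v)$, where
1) $G = (g_1, g_2)$ [...] 2 [...];
2) $F = (f_1, f_2)$ [...] GEP [...] 2 [...];
3) $v$ [...] $v \in \{-1, 0, 1, 2\}$ [...] 0 [...], -1 [...], 1 [...], 2 [...].

2.2 AIGEP [...]

[...] [10] [...] Algorithm 1 [...] AIGEP [...]

Algorithm 1: AIGEP
Input: TS, supp, conf [...]
Output: [...]

    Initialize parameters;
    WHILE (generation < maxgennum AND bgstate < maxzerocount) {
        DO {
            IF (elitepool != NULL) {
                BCellSet.add(generatebcellsbyelitepool(every_gen_cellnum - BCellSet.size));
            }
            nselect(BCellSet);    // negative selection
            IF (BCellSet.size >= every_gen_cellnum) BREAK;
            BCellSet.add(generatebcell(every_gen_cellnum - BCellSet.size));
            nselect(BCellSet);    // negative selection
        } WHILE (BCellSet.size < every_gen_cellnum AND hfc < z)
        AntiBodySet = generateantibodyset(BCellSet, TS);    // [...]
        addtonrulepool(BCellSet);                           // [...]
        Match(AntiBodySet, TS);                             // [...]
        Select(AntiBodySet, conf, supp);                    // [...]
        OutPut(AntiBodySet);
        addtoelitepool(AntiBodySet);                        // [...]
        clone(BCellSet);                                    // [...]
    }
    STOP;

2.3 [...]

[...] AIGEP [...] [10] [...]:
1) [...] BCell [...];
2) for $BCell.F = (f_1, f_2)$, the 7 forms $(f_2, f_1)$, $(\lnot f_1, f_2)$, $(f_2, \lnot f_1)$, $(f_1, \lnot f_2)$, $(\lnot f_2, f_1)$, $(\lnot f_1, \lnot f_2)$, $(\lnot f_2, \lnot f_1)$ [...].

[...]: 1) [...]; 2) [...]; 3) [...].

2.4 [...]

For an antibody $Ab$, let $A = Ab.s_1$ and $B = Ab.s_2$ [...]. With $Ab.I = (p_1, p_2, p_3, p_{total})$,

$$P(A) = \frac{p_1}{p_{total}}, \qquad P(B) = \frac{p_2}{p_{total}}, \qquad P(A \land B) = \frac{p_3}{p_{total}}.$$

For the rule $A \rightarrow B$, the support is $s = P(A \land B)$ and the confidence is $c = P(B \mid A)$. From $P(A)$, $P(B)$, and $P(A \land B)$ one can obtain $P(\lnot A)$, $P(\lnot B)$, $P(\lnot A \land B)$, $P(A \land \lnot B)$, and $P(\lnot A \land \lnot B)$, and hence the support and confidence of 8 MDCARs. The 8 MDCARs are:
1) $A \rightarrow B$;
2) $A \rightarrow \lnot B$;
3) $B \rightarrow A$;
4) $B \rightarrow \lnot A$;
5) $\lnot A \rightarrow B$;
6) $\lnot A \rightarrow \lnot B$;
7) $\lnot B \rightarrow A$;
8) $\lnot B \rightarrow \lnot A$.

[...] 8 [...] 8 [...]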
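The derivation in Section 2.4 can be illustrated with a minimal Python sketch. It computes support and confidence for the eight candidate rules from one antibody's counts $(p_1, p_2, p_3, p_{total})$, then keeps a strong rule only if its contrapositive is also strong, which is how the abstract states the reduction criterion. The names `Rule`, `candidate_rules`, `strong_rules`, the `~` prefix for negation, and the threshold parameters are illustrative assumptions; they are not taken from the paper's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    antecedent: str   # "A", "~A", "B" or "~B" ("~" marks negation; assumed notation)
    consequent: str
    support: float
    confidence: float


def candidate_rules(p1, p2, p3, p_total):
    """Derive the 8 candidate rules from the counts of one antibody.

    p1, p2, p3 are the numbers of tuples satisfying s1, s2 and (s1 and s2);
    p_total is the total number of tuples.
    """
    if p_total <= 0 or not (0 <= p3 <= min(p1, p2)) or p1 + p2 - p3 > p_total:
        raise ValueError("inconsistent counts")
    pa, pb, pab = p1 / p_total, p2 / p_total, p3 / p_total
    # Joint probabilities of the four sign combinations of A and B.
    joint = {
        ("A", "B"): pab,
        ("A", "~B"): pa - pab,
        ("~A", "B"): pb - pab,
        ("~A", "~B"): 1.0 - pa - pb + pab,
    }
    marginal = {"A": pa, "~A": 1.0 - pa, "B": pb, "~B": 1.0 - pb}
    rules = []
    for (x, y), pxy in joint.items():
        # Each joint event yields two rules: x -> y and y -> x.
        for ante, cons in ((x, y), (y, x)):
            conf = pxy / marginal[ante] if marginal[ante] > 0 else 0.0
            rules.append(Rule(ante, cons, pxy, conf))
    return rules


def negate(literal):
    """Toggle the assumed '~' negation prefix."""
    return literal[1:] if literal.startswith("~") else "~" + literal


def strong_rules(rules, min_supp, min_conf):
    """Keep strong rules whose contrapositive is also strong."""
    strong = {
        (r.antecedent, r.consequent): r
        for r in rules
        if r.support >= min_supp and r.confidence >= min_conf
    }
    return [
        r for (ante, cons), r in strong.items()
        if (negate(cons), negate(ante)) in strong
    ]
```

Under these assumptions, `strong_rules(candidate_rules(40, 50, 30, 100), 0.25, 0.7)` evaluates all eight rules from a single pass over the counts, so no further scan of the training set is needed to test the negated forms.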
[...]

3 [...]

[...] 1, 2, 3 [...] [11].

[...] 1. Let Ts [...] n [...], m [...]. Then [...] O(n m).

[...] 2. Let Ts [...] n [...], m [...], $Attr = \{A_1, A_2, \ldots, A_m\}$, $|A_i|$ [...] $A_i$ [...], and for a BCell [...] $X = \{x_1, \ldots, x_k\}$, $X \subseteq Attr$. Let $ABScale$ be [...] BCell [...]. Then $\max(ABScale) = n$ and

$$ABScale \le \min\left(n, \prod_{i=1}^{k} |x_i|\right).$$

[...] 3. Let Ts [...] n [...], m [...], $Attr = \{A_1, A_2, \ldots, A_m\}$, $|A_i|$ [...] $A_i$ [...], k [...]. Then [...] Algorithm 1 [...] $O(n^2)$.

[...] 1. [...] AIGEP [...] $O(n^2)$.

[...]: [...] $O(c)$, [...] $O(n)$, [...] 3 [...] $O(n^2)$, so for AIGEP

$$C_1 \cdot O(c) + C_2 \cdot O(n) + O(n^2) = O(n^2).$$

[...] 1 [...] AIGEP [...] MDCAR [...]

4 [...]

[...] CPU: Intel C3 1.0 GHz; RAM: 384 M; HDD: 2 × 80 G; [...] MS Windows XP Professional SP1; JDK 1.5; [...] UCI [...] [6] PAGEP [...] AIGEP [...] Apriori [3] [...] Apriori [...] [1] [...] Apriori [3] [...]

4.1 [...]

[...] AIGEP [...] Apriori [...] (MDCAR) [...] AIGEP [...] Apriori [...] 1 [...] 9 [...] AIGEP [...] AIGEP [...] 1 [...] 9 [...]

Fig. 1 The end of mining where the dimension number is 9 and the objective is the traditional multi-dimensional association rule

[...] 2, 3 [...] cmc [...] 4 [...] maxgennum = 300, cellnum_every_gen = 20, maxzerocount = 20, hfc = 200 [...] AIGEP [...] MDCAR [...] Apriori [...] (TMAR) [...]

Fig. 2 Relationship between dimension number and unique immune cell number where the order of predicates in MDCAR is not considered

Tab. 3 Mining MDCAR (conf = 95.0%, supp = 1.0%)

No.   Attributes [...]              Connectives [...]                 AIGEP (MDCAR)   Apriori (TMAR)
1     {2, 3, 4, 5}                  {$\land$}                         41              41
2     {2, 3, 4, 5}                  {$\land$, $\lor$, $\lnot$}        5975            41
3     {2, 3, 4, 5}                  {$\land$, $\lor$, $\lnot$}        1406            41
4     {2, 3, 4, 6}                  {$\land$}                         10              10
5     {2, 3, 4, 6}                  {$\land$, $\lor$, $\lnot$}        5325            10
6     {2, 3, 4, 6}                  {$\land$, $\lor$, $\lnot$}        586             10
7     {4}; {2, 3, 5, 6, 7, 8, 9}    {$\land$, $\lor$, $\lnot$}        21072           [...]
Note: 7 [...]

[...] Table 3 [...] 2 [...] 3 [...] 5975 [...] 1406 [...] 1406 / 2 = 703 [...] 5 [...] 6 [...] 5325 [...] 586 / 2 = 293 [...]

[...] 4 [...] MDCAR [...]

Example 4: [...] 5 [...] 6 [...] 36 [...]
Rule 5: $D_3(2)$ [...] $D_6(1)$ [...] $D_2(2)$ [...] $\lnot D_4(0)$, conf = 97.81%, supp = 9.10%;
Rule 6: $\lnot(D_2(2)$ [...] $\lnot D_4(0))$ [...] $\lnot D_6(1)$, conf = 95.95%, supp = 4.82%,
where $D_i(x)$ [...] $i$ [...] $x$ ($D_3(2)$ [...]).

[...] Rules 5 and 6 [...] MDCAR [...] AIGEP [...] MDCAR [...] 3 [...] 7 [...]

4.2 [...]

[...] UCI [...] cmc [...]: {$\land$, $\lor$, $\lnot$}, maxzerocount = 20, hfc = 200 [...] Table 4.

Tab. 4 Effect of negative select strategy

No.   Attributes [...]         [...]   [...]   [...]   [...]     [...]    [...]
1     {2, 3, 4}                500     20      2       82653     27       212
2     {2, 3, 4, 5}             500     20      39      93415     742      7483
3     {2, 3, 4, 5, 6}          500     20      -       15557     10000    105422
4     {2, 3, 4, 5, 6}          2000    20      1207    771279    21847    259474
5     {2, 3, 4, 5, 6, 7}       500     20      -       3202      10000    18868
6     {2, 3, 4, 5, 6, 7, 8}    500     20      -       1484      10000    27608

[...] 1 [...] 2 [...] 4 [...] 1 [...] 3 [...]
[...] GEP [...] MDCAR [...]

4.3 [...]

[...]:
1) [...] AIGEP [...] Apriori [...] AIGEP [...] Apriori [...];
2) [...];
3) [...] 1 [...] 3 [...] AIGEP [...].

5 [...]

[...]

References:
[1] Han Jiawei, Kamber M. Data mining: concepts and techniques [M]. Beijing: Higher Education Press, 2001.
[2] Agrawal R, Imielinski T, Swami A. Database mining: a performance perspective [J]. IEEE Trans Knowledge and Data Engineering, 1993, 5: 914-925.
[3] Agrawal R, Srikant R. Fast algorithms for mining association rules [C] // Proc of 1994 International Conference on Very Large Data Bases (VLDB'94), Santiago, Chile, 1994: 487-499.
[4] Fu Y, Han J. Meta-rule-guided mining of association rules in relational databases [C] // Proc of First Int'l Workshop on Integration of Knowledge Discovery with Deductive and Object-Oriented Databases (KDOOD'95), Singapore, 1995: 39-46.
[5] Ferreira C. Gene expression programming: a new adaptive algorithm for solving problems [J]. Complex Systems, 2001, 13(2): 87-129.
[6] Zuo Jie, Tang Changjie, Zhang Tianqing. Mining predicate association rule by gene expression programming [C] // Proc of the 3rd International Conference on Web-Age Information Management (WAIM 2002), Beijing, 2002: 92-103.
[7] Zuo Jie. Research on the key technology of gene expression programming [D]. Chengdu: Sichuan University, 2004.
[8] De Castro L N, Von Zuben F J. Artificial immune systems: Part I Basic theory and applications [R]. 1999.
[9] De Castro L N, Von Zuben F J. Artificial immune systems: Part II A survey of applications [R]. RT DCA, 2000.
[10] [...] [M]. [...], 2004.
[11] Zeng Tao, Tang Changjie, Zhu Mingfang, et al. AIGEP: an approach for mining multi-dimension complex association rule [DB/OL]. http://www.paper.edu.cn, 2005052193.