A Fast Mining Algorithm for Frequent Essential Itemsets

Computer Engineering, Vol. 40, No. 6, June 2014
Article ID: 1000-3428(2014)06-0120-05    Document code: A    CLC number: TP18
DOI: 10.3969/j.issn.1000-3428.2014.06.026

TIAN Wei-dong, JI Yun
(School of Computer and Information, Hefei University of Technology, Hefei 230009, China)

Grant No. 60603068. Author information: (1970- ). Received: 2013-01-28. Revised: 2013-05-17. E-mail: jiyun1988@126.com

Abstract: Traditional algorithms for mining frequent essential itemsets generate candidate itemsets and scan the database many times, which makes them inefficient. To address this, a fast algorithm for mining frequent essential itemsets is proposed. The algorithm uses a Rymon enumeration tree to organize the search space and to divide and conquer it, and it prunes selected paths of the tree. It exploits properties specific to frequent essential itemsets to decide quickly whether a candidate itemset is a frequent essential itemset, without comparing its disjunctive support with that of all of its direct subsets. Experimental results show that the algorithm correctly obtains every element of the frequent essential itemset concise representation and substantially reduces running time: by a factor of about 2 on dense datasets and by at least 30% on sparse datasets.

Key words: data mining; frequent itemsets; concise representation; frequent essential itemsets; Rymon enumeration tree

1 Introduction

Frequent itemset mining is a basic data mining task [1-2], but the complete set of frequent itemsets can be very large, which is costly in CPU time and I/O. Concise representations have been proposed to reduce this cost, including generator-based representations [3], non-derivable itemsets [4], frequent closed itemsets [5-6], the condensed representation of [7], and disjunction-free generators [8-9].

Notation: an itemset $\{i_1, i_2, \ldots, i_n\}$ is abbreviated as $i_1 i_2 \cdots i_n$, and a set of transactions $\{tr_1, tr_2, \ldots, tr_n\}$ as $tr_1 tr_2 \cdots tr_n$.

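Frequent essential itemsets were introduced in [10] as a concise representation whose supports are related by inclusion-exclusion identities [11]; [10] also gives the level-wise mining algorithm MEP. This paper proposes FMEP, which searches a Rymon enumeration tree [12] instead.

2 Preliminaries

2.1 Basic Definitions

Definition 1 (three kinds of support). Let $IS = \{i_1, i_2, \ldots, i_m\}$ be a set of $m$ items. An itemset $I \subseteq IS$ with $k$ items is called a $k$-itemset. The database $D$ consists of objects (transactions) $o$, each identified by a TID. A formal context is a triple $K = (O, I, R)$, where $O$ is the set of objects, $I$ the set of items, and $R \subseteq O \times I$ a binary relation; $(o, i) \in R$ means that object $o$ contains item $i$. The conjunctive, disjunctive and negative supports of an itemset $I$ are

$$supp(\wedge I) = |\{o \in O \mid \forall i \in I,\ (o, i) \in R\}| \quad (1)$$

$$supp(\vee I) = |\{o \in O \mid \exists i \in I,\ (o, i) \in R\}| \quad (2)$$

$$supp(\neg I) = |\{o \in O \mid \forall i \in I,\ (o, i) \notin R\}| \quad (3)$$

For a 1-itemset, $supp(\wedge I) = supp(\vee I)$.

Definition 2 (frequent itemset). Given a minimum support threshold $minsup$, an itemset $X$ is frequent if $supp(\wedge X) \geq minsup$.

Definition 3 (relations among the 3 supports). For an itemset $I$:

$$supp(\wedge I) = \sum_{\emptyset \neq I_1 \subseteq I} (-1)^{|I_1| - 1}\, supp(\vee I_1) \quad (4)$$

$$supp(\vee I) = \sum_{\emptyset \neq I_1 \subseteq I} (-1)^{|I_1| - 1}\, supp(\wedge I_1) \quad (5)$$

$$supp(\neg I) = |O| - supp(\vee I) \quad (6)$$

Definition 4 (essential itemset). An itemset $I$ is essential if

$$supp(\vee I) \neq \max\{supp(\vee (I \setminus i)),\ i \in I\} \quad (7)$$

An essential itemset $I$ is a frequent essential itemset if $supp(\wedge I) \geq minsup$.

2.2 Properties of Essential Itemsets and the MEP Algorithm

By Equation (7), the disjunctive support of any frequent itemset $X$ is determined by its essential subsets. Let $E$ denote the set of frequent essential itemsets and $BD^+$ the positive border (the maximal frequent itemsets). Then

$$supp(\vee X) = \max(\{supp(\vee Y) \mid Y \subseteq X,\ Y \in E\}) \quad (8)$$

Substituting Equation (8) into Equation (4) gives the conjunctive support of every itemset $X$ that is contained in some element of $BD^+$:

$$supp(\wedge X) = \sum_{\emptyset \neq X' \subseteq X} (-1)^{|X'| - 1}\, supp(\vee Y), \quad Y \in \mathrm{Argmax}(\{supp(\vee X''),\ X'' \subseteq X',\ X'' \in E\}) \quad (9)$$

Hence $E$ together with $BD^+$ is a lossless representation of the frequent itemsets.

The MEP algorithm of [10] computes level $i+1$ from level $i$ as follows:

    C_{i+1} = Gen_Apriori(L_i);
    C_{i+1} = {X ∈ C_{i+1} | ∃ Y ∈ BD+(F): X ⊆ Y};
    Scan the database to count the disjunctive frequency of every X ∈ C_{i+1};
    L_{i+1} = {X ∈ C_{i+1} | ∄ x ∈ X: Freq(∨X) = Freq(∨(X\x))};

MEP therefore pays three costs at every level: (1) generating the candidate set $C_{i+1}$; (2) scanning the database to count the disjunctive support of every candidate in $C_{i+1}$; (3) comparing each candidate in $C_{i+1}$ with the disjunctive support of all of its direct subsets.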
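The three support measures of Definition 1 and the identities (4) and (6) can be checked on a small example. The sketch below is only an illustration: the five-transaction database `DB` and all function names are assumptions introduced here, not part of the paper.

```python
from itertools import combinations

# Toy transaction database (an assumption for illustration): TID -> items.
DB = {1: {"a", "b", "c", "d"}, 2: {"a"}, 3: {"b", "c"}, 4: {"c", "d"}, 5: {"a", "b", "c"}}

def supp_conj(itemset, db=DB):
    """Conjunctive support, Eq. (1): transactions containing every item."""
    return sum(1 for t in db.values() if set(itemset) <= t)

def supp_disj(itemset, db=DB):
    """Disjunctive support, Eq. (2): transactions containing at least one item."""
    return sum(1 for t in db.values() if set(itemset) & t)

def supp_neg(itemset, db=DB):
    """Negative support, Eq. (3): transactions containing none of the items."""
    return sum(1 for t in db.values() if not (set(itemset) & t))

def conj_from_disj(itemset, db=DB):
    """Eq. (4): conjunctive support by inclusion-exclusion over disjunctive supports."""
    items = sorted(set(itemset))
    total = 0
    for k in range(1, len(items) + 1):
        for sub in combinations(items, k):
            total += (-1) ** (k - 1) * supp_disj(sub, db)
    return total

print(supp_conj("ab"), supp_disj("ab"), supp_neg("ab"))  # 2 4 1
print(conj_from_disj("abc") == supp_conj("abc"))         # True
```

Here `supp_neg("ab")` equals `len(DB) - supp_disj("ab")`, which is Equation (6) on this data.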

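3 The FMEP Algorithm

FMEP uses a Rymon enumeration tree [12] to organize the search space and to divide and conquer it, and it prunes selected paths of the tree.

3.1 Properties

Definition 5 (function g). For an itemset $I$ over the database $D$,

$$g(I) = \{t \in D \mid \exists i \in I,\ i \in t\}$$

so that $supp(\vee I) = |g(I)|$ and $g(I \cup i) = g(I) \cup g(i)$.

Let $I$ be an itemset and $i \notin I$ an item. If there is an item $i_1 \in I$ with $g(i_1) \subseteq g(i)$ or $g(i) \subseteq g(i_1)$, then $I \cup i$ is not an essential itemset. For example, if $g(i) \subseteq g(i_1)$, then $g(I \cup i) = g(I)$, so $supp(\vee(I \cup i)) = supp(\vee I)$, which contradicts Equation (7). In addition, if $I \cup i$ is not contained in any element of $BD^+$, it is not frequent, so by Definition 4 it is not a frequent essential itemset.

3.2 Algorithm Description

FMEP (Fast Mining Essential Pattern) keeps, for every node of the enumeration tree, the value of g and the set POST of items that may still be appended to the node.

    Input: database D, minimum support minsup
    Output: EP ∪ BD+
    BD+(F) = Max_Set_Algorithm(D, minsup)
    EP = NULL
    procedure FMEP(EP, gen, POST)
        while POST ≠ NULL do
            i = min<(POST)
            POST = POST \ i
            newgen = gen ∪ i
            if not exist(i, gen) and ∃ Z ∈ BD+: newgen ⊆ Z
                EP = EP ∪ newgen
                g(newgen) = g(gen) ∪ g(i)
                NEWPOST = POST
                FMEP(EP, newgen, NEWPOST)
            endif
        endwhile
        return EP ∪ BD+
    function exist(i, gen)
        for all j ∈ gen do
            if g(j) ⊆ g(i) or g(i) ⊆ g(j)
                return true
            endif
        endfor
        return false
    end function

The database D is scanned once to obtain the frequent 1-itemsets F1, their g values, and the initial POST. The procedure starts with an empty EP and an empty gen. While POST is not empty, the smallest item i is removed from POST, following the order of the Rymon tree, and appended to gen to form newgen. The function exist tests whether i and some item j of gen satisfy $g(j) \subseteq g(i)$ or $g(i) \subseteq g(j)$; if so, by the property of Section 3.1, newgen is not essential and the branch is pruned. Otherwise, if newgen is covered by $BD^+$, it is added to EP, g(newgen) is obtained from g(gen) and g(i), and FMEP is called recursively with NEWPOST, a copy of the remaining POST.

3.3 Example

Let D contain five transactions with TIDs 1 to 5: abcd, a, bc, cd, abc. Then $BD^+ = \{abcd\}$, $g(a) = \{1, 2, 5\}$, $g(b) = \{1, 3, 5\}$, $g(c) = \{1, 3, 4, 5\}$, $g(d) = \{1, 4\}$. EP is initially empty and POST is {abcd}.

(1) Item a, with POST_a = {bcd}: a is essential. a joined with b gives ab, POST_ab = {cd}; ab is essential. ab joined with c gives abc, POST_abc = {d}; since $g(b) \subseteq g(c)$, abc is not essential and is pruned. ab joined with d gives abd, POST_abd = {}; abd is essential. Back at a, POST is {cd}: a joined with c gives ac, POST_ac = {d}; ac is essential. ac joined with d gives acd, POST_acd = {}; since $g(d) \subseteq g(c)$, acd is not essential. Back at a, POST is {d}: a joined with d gives ad, POST_ad = {}; ad is essential.

(2) Item b, with POST_b = {cd}: b is essential. b joined with c gives bc, POST_bc = {d}; since $g(b) \subseteq g(c)$, bc is not essential. b joined with d gives bd, POST_bd = {}; bd is essential.

(3) Item c, with POST_c = {d}: c is essential. c joined with d gives cd; since $g(d) \subseteq g(c)$, cd is not essential.

(4) Item d, with POST_d = {}: d is essential.

4 Experiments

FMEP is compared with the MEP algorithm of [10]. Both were implemented in C++ and run on a PC with Win7 and 4 GB of memory. The datasets are connect, pumsb, chess, pumsb_star, T10I4D100K and T40I10D100K, available from http://fimi.cs.helsinki.fi/data/. The results are shown in Figures 1 to 6.

Figure 1: connect. Figure 2: chess. Figure 3: pumsb. Figure 4: pumsb_star. Figure 5: T10I4D100K. Figure 6: T40I10D100K. Each figure plots FMEP against MEP.

Figures 1 to 3 compare FMEP and MEP on the dense datasets connect, chess and pumsb. Figure 4 compares them on pumsb_star, which Bayardo derived from pumsb by removing the items whose support is 80% or more. Figures 5 and 6 compare them on the sparse datasets T10I4D100K and T40I10D100K.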
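The depth-first enumeration of Section 3.2 can be sketched on the five-transaction example of Section 3.3. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the essential test applies Equation (7) directly instead of the pairwise exist test, and the $BD^+$ filter is omitted because every candidate here is a subset of abcd.

```python
# Example database of Section 3.3: TID -> items.
DB = {1: "abcd", 2: "a", 3: "bc", 4: "cd", 5: "abc"}

def cover(item):
    """g(i): TIDs of the transactions that contain item i."""
    return frozenset(tid for tid, t in DB.items() if item in t)

G = {i: cover(i) for i in "abcd"}

def g(itemset):
    """g(I): union of the covers of the items of I."""
    out = frozenset()
    for i in itemset:
        out |= G[i]
    return out

def is_essential(itemset):
    """Eq. (7) checked directly: dropping any single item must shrink g(I)."""
    full = g(itemset)
    return all(g([j for j in itemset if j != i]) != full for i in itemset)

def enumerate_essential(gen=(), post=("a", "b", "c", "d"), found=None):
    """Depth-first walk of the Rymon tree; a non-essential node is not expanded."""
    if found is None:
        found = []
    for k, i in enumerate(post):
        newgen = gen + (i,)
        if is_essential(newgen):
            found.append("".join(newgen))
            # POST of the child holds only the items that follow i.
            enumerate_essential(newgen, post[k + 1:], found)
    return found

print(enumerate_essential())
# ['a', 'ab', 'abd', 'ac', 'ad', 'b', 'bd', 'c', 'd']
```

The nine itemsets printed are the ones the walkthrough marks as essential; abc, acd, bc and cd are rejected and their subtrees are never visited.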

5 Conclusion

This paper proposes FMEP, a fast algorithm for mining frequent essential itemsets that searches a Rymon enumeration tree. Compared with MEP, it reduces running time by a factor of about 2 on dense datasets and by at least 30% on sparse datasets.

References

[1] Han Jiawei, Kamber M. [M]. 2004.
[2] [J]. 2012, 38(5): 44-46.
[3] Liu Guimei, Li J, Wong L. Positive Borders or Negative Borders: How to Make Lossless Generator Based Representations Concise[C]//Proc. of the 6th SIAM International Conference on Data Mining. [S. l.]: IEEE Press, 2006: 469-473.
[4] Calders T, Goethals B. Non-derivable Itemset Mining[J]. Data Mining and Knowledge Discovery, 2007, 14(1): 171-206.
[5] Pasquier N, Bastide Y, Taouil R. Discovering Frequent Closed Itemsets for Association Rules[C]//Proc. of ICDT'99. [S. l.]: IEEE Press, 1999: 398-416.
[6] [J]. 2008, 34(16): 50-52.
[7] Bykowski A, Rigotti C. A Condensed Representation to Find Frequent Patterns[C]//Proc. of PODS'01. [S. l.]: IEEE Press, 2001: 56-63.
[8] Kryszkiewicz M. Concise Representation of Frequent Patterns Based on Disjunction-free Generators[C]//Proc. of ICDM'01. [S. l.]: IEEE Press, 2001: 305-312.
[9] Kryszkiewicz M, Gajek M. Concise Representation of Frequent Patterns Based on Generalized Disjunction-free Generators[C]//Proc. of PAKDD'02. [S. l.]: IEEE Press, 2002: 159-171.
[10] Casali A, Cicchetti R, Lakhal L. Essential Patterns: A Perfect Cover of Frequent Patterns[C]//Proc. of the 7th International Conference on Data Warehousing and Knowledge Discovery. Copenhagen, Denmark: Springer-Verlag, 2005: 428-437.
[11] Galambos J, Simonelli I. Bonferroni-type Inequalities with Applications[M]. New York, USA: Springer, 2000.
[12] Rymon R. Search Through Systematic Set Enumeration[C]//Proc. of the 3rd International Conference on Principles of Knowledge Representation and Reasoning. [S. l.]: IEEE Press, 1992: 539-550.