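DEIM Forum 2016 G7-5

152-8565 2-12-1
152-8565 2-12-1
889-1601 5200
E-mail: uragaki.k.aa@m.titech.ac.jp

1. [...]

1.1 [...]

1.2 [...]
[...] [1] [...] sequential pattern mining [2] (hereafter SPM) [...] [3] [...] [4]. For two items A and B, [...] the order of A and B [...]. Chen et al.'s SPM [5] (hereafter TI-SPM) [...] PrefixSpan [6] [...] between two items [...] TI-SPM [...] time interval (hereafter TI) [...].

1.3 [...]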
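[...]

1.4 [...]
Section 2 [...]. Section 3 [...]. Section 4 [...] Section 3 [...]. Section 5 [...].

2. [...]

2.1 SPM
Agrawal et al. proposed SPM over sequence databases (hereafter SDB) [2]. [...] SDB [...] [4] [...].

Definition 1. Let $A = \langle a_1, a_2, \ldots, a_n \rangle$ (each $a_i$ an item, $i = 1, 2, \ldots, n$) and $B = \langle b_1, b_2, \ldots, b_m \rangle$ (each $b_i$ an item, $i = 1, 2, \ldots, m$). $A$ is a subsequence of $B$, written $A \sqsubseteq B$, when all of the following hold:
(1) $a_1 = b_{j_1}, a_2 = b_{j_2}, \ldots, a_n = b_{j_t}$
(2) $n \le t \le m$
(3) $1 \le j_1 < j_2 < \cdots < j_t \le m$

Definition 2. For a sequence $A = \langle a_1, a_2, \ldots, a_n \rangle$ (each $a_i$ an item, $i = 1, 2, \ldots, n$), when $A \sqsubseteq B$, $B$ contains $A$ [...].

Definition 3. Given MinSup ($0 \le MinSup \le 1$) and an SDB $D$, a sequence $A = \langle a_1, a_2, \ldots, a_n \rangle$ (each $a_i$ an item, $i = 1, 2, \ldots, n$) is frequent in $D$ with respect to MinSup when
$|\{Seq \mid A \sqsubseteq Seq,\ (sid, Seq) \in D,\ sid \text{ is the identifier of } Seq\}| \ge Size(D) \times MinSup$.
Here $Sup(a_i)$ is the support of $a_i$ in $D$, and $Size(D)$ is the number of sequences in $D$.

Definition 4. In an SDB $D$, a sequential pattern $A$ is closed when no sequential pattern $B \ne A$ satisfies both of the following:
(1) $A \sqsubseteq B$
(2) $Sup(A) = Sup(B)$
Here $Sup(A)$ is the support of $A$, $Sup(A) = |\{Seq \mid A \sqsubseteq Seq,\ Seq \in D\}|$.

2.2 TI-SPM
The SPM of Agrawal et al. [2] [...] two items [...]. For example, [...] January 11, 2015 [...] January 11, 2015 [...].

Chen et al. proposed TI-SPM, which [...] between two items [5]. Given an SDB $D$, MinSup ($0 \le MinSup \le 1$) and a TI set, TI-SPM extracts TI-sequential patterns [...]. TI and the related TI- notions are defined as follows.

Definition 5 (TI). Given $r - 1$ constants $T_1, T_2, \ldots, T_{r-1}$, the time intervals $I_k$ ($k = 0, 1, \ldots, r - 1, r$) are
$$
I_k =
\begin{cases}
\{0\} & (k = 0) \\
\{t \mid 0 < t \le T_1\} & (k = 1) \\
\{t \mid T_{k-1} < t \le T_k\} & (k = 2, 3, \ldots, r - 1) \\
\{t \mid T_{r-1} < t\} & (k = r)
\end{cases}
$$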
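The subsequence test, the frequency condition, and the TI partition defined above can be illustrated with a minimal Python sketch. The function names and the small database are hypothetical and chosen only for illustration; the subsequence test is the usual order-preserving one, and the boundary list stands for the constants $T_1, \ldots, T_{r-1}$ in ascending order.

```python
from bisect import bisect_left

def is_subsequence(a, b):
    """True if the items of a occur in b in the same order (gaps allowed)."""
    it = iter(b)
    # "item in it" consumes the iterator, so matches must advance left to right.
    return all(item in it for item in a)

def support(db, a):
    """Number of sequences in db (dict: sid -> list of items) that contain a."""
    return sum(1 for seq in db.values() if is_subsequence(a, seq))

def is_frequent(db, a, min_sup):
    """Frequency condition: support(a) >= Size(D) * MinSup."""
    return support(db, a) >= len(db) * min_sup

def interval_index(t, boundaries):
    """Return k such that t lies in I_k.

    boundaries = [T_1, ..., T_{r-1}] in ascending order, so that
    I_0 = {0}, I_k = (T_{k-1}, T_k] and I_r = (T_{r-1}, infinity).
    """
    if t < 0:
        raise ValueError("a time interval must be non-negative")
    if t == 0:
        return 0
    return bisect_left(boundaries, t) + 1

# Hypothetical three-sequence database, not taken from the paper.
example_db = {1: ["A", "B", "C"], 2: ["A", "C"], 3: ["B", "A"]}
```

With boundaries `[3, 7]` (so r = 3), an interval of 3 falls in $I_1$, an interval of 7 in $I_2$, and an interval of 8 in $I_3$.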
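Definition 6 (TI set). The set of the $r + 1$ TIs obtained from $r - 1$ constants $T_1, T_2, \ldots, T_{r-1}$ is called the TI set $V$. In TI-SPM, [...].

Definition 7 (TI-sequence). Let $I$ be the item set and $V$ the TI set. $B$ is a TI-sequence when
$$
B =
\begin{cases}
\langle b_1 \rangle & (k = 1) \\
\langle b_1, \&_1, b_2, \&_2, \ldots, b_{k-1}, \&_{k-1}, b_k \rangle & (k \ge 2)
\end{cases}
$$
where $b_i \in I$ for $i = 1, 2, \ldots, k$ and $\&_v \in V$ for $v = 1, 2, \ldots, k - 1$.

Definition 8 (TI-subsequence). Let $A = \langle (a_1, t_1), (a_2, t_2), \ldots, (a_n, t_n) \rangle$ be a sequence and $B = \langle b_1, \&_1, b_2, \&_2, \ldots, b_{m-1}, \&_{m-1}, b_m \rangle$ a TI-sequence. If there exist indices $\{j_m\}$ with $1 \le j_1 < j_2 < \cdots < j_m \le n$ that satisfy the following, $B$ is a TI-subsequence of $A$, written $B \sqsubseteq A$:
(1) $b_1 = a_{j_1}, b_2 = a_{j_2}, \ldots, b_m = a_{j_m}$
(2) $t_{j_i} - t_{j_{i-1}} \in \&_{i-1}$ ($i = 2, 3, \ldots, m$)

Definition 9 (TI-sequential pattern). Given an SDB $D$ and MinSup ($0 \le MinSup \le 1$), a TI-sequence $\alpha$ with
$|\{(sid, s) \mid (sid, s) \in D,\ \alpha \sqsubseteq s\}| \ge Size(D) \times MinSup$
is a TI-sequential pattern.

Chen et al. extended PrefixSpan [6] to handle TIs and proposed I-PrefixSpan [5]. In I-PrefixSpan, [...] I-PrefixSpan [...].

Definition 10 (TI-prefix). Let $A = \langle (a_1, t_1), (a_2, t_2), \ldots, (a_n, t_n) \rangle$ be a sequence and $B = \langle b_1, \&_1, b_2, \ldots, b_{m-1}, \&_{m-1}, b_m \rangle$ a TI-sequence. $B$ is a TI-prefix of $A$ when:
(1) $m \le n$
(2) $a_i = b_i$ ($1 \le i \le m$)
(3) $t_i - t_{i-1} \in \&_{i-1}$ ($1 < i \le m - 1$)

Definition 11 (projection). Let $A = \langle (a_1, t_1), (a_2, t_2), \ldots, (a_n, t_n) \rangle$, and let the TI-sequence $B = \langle b_1, \&_1, b_2, \ldots, b_{m-1}, \&_{m-1}, b_m \rangle$ be a TI-subsequence of $A$ with $m \le n$ and $a_{i_k} = b_k$ ($1 \le k \le m$). A sequence $A' = \langle (a'_1, t'_1), (a'_2, t'_2), \ldots, (a'_{n'}, t'_{n'}) \rangle$ is the projection of $A$ with respect to $B$ when:
(1) $n' = n + m - i_m$, where $i_m$ [...] ($0 \le i_m \le n$)
(2) $B$ is a TI-prefix of $A'$
(3) the last $n - i_m$ elements of $A'$ are the last $n - i_m$ elements of $A$

Definition 12 (postfix). Let $A = \langle (a_1, t_1), (a_2, t_2), \ldots, (a_n, t_n) \rangle$ be a sequence, $B = \langle b_1, \&_1, b_2, \ldots, b_{m-1}, \&_{m-1}, b_m \rangle$ a TI-sequence, and $A' = \langle (a'_1, t'_1), (a'_2, t'_2), \ldots, (a'_{n'}, t'_{n'}) \rangle$ the projection of $A$ with respect to $B$. The postfix of $A$ with respect to $B$ is $\langle (a'_{m+1}, t'_{m+1}), (a'_{m+2}, t'_{m+2}), \ldots, (a'_{n'}, t'_{n'}) \rangle$, where $B$ is a TI-prefix of $A'$.

Definition 13 (projected SDB). For an SDB $D$ and a TI-sequential pattern $\alpha$ in $D$, the $\alpha$-projected SDB $D|_\alpha$ [...] of the sequences in $D$ with respect to $\alpha$.

I-PrefixSpan takes an SDB $D$, MinSup and a TI set as input and [...] projected SDBs. In the $\alpha$-projected SDB, when an item $\beta$ and a TI $\&$ [...] at least $Size(D) \times MinSup$ [...], $\langle \alpha, \&, \beta \rangle$ is a TI-sequential pattern, and the $\langle \alpha, \&, \beta \rangle$-projected SDB [...] TI-sequential patterns.

3. [...]

3.1 [...]
[...] the 4-tuple (Type, Explain, Code, Name) [...]. Code and Name may be null [...]; for example, ([...], [...], 613, [...]) and ([...], [...], null, null). [...] 613 [...] Type [...] Code and Name are null. [...] Type, Code, Name [...] Explain [...] null. Explain [...] [3] [...].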
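The TI-subsequence condition of Definition 8 can be checked with a small backtracking search, sketched below in Python. This is an illustration under stated assumptions rather than the procedure of I-PrefixSpan: a TI is represented by its index k (so `1` stands for $I_1$), the names `interval_index` and `ti_contains` are hypothetical, and the boundary list holds $T_1, \ldots, T_{r-1}$ in ascending order. Backtracking is needed because the leftmost occurrence of an item may violate the interval constraint while a later occurrence satisfies it.

```python
from bisect import bisect_left

def interval_index(t, boundaries):
    """Index k of the TI that contains t (I_0 = {0}, I_k = (T_{k-1}, T_k])."""
    if t < 0:
        raise ValueError("a time interval must be non-negative")
    return 0 if t == 0 else bisect_left(boundaries, t) + 1

def ti_contains(seq, items, intervals, boundaries):
    """True if the TI-sequence <items[0], intervals[0], items[1], ...> is a
    TI-subsequence of seq.

    seq        -- list of (item, time) pairs in non-decreasing time order
    items      -- b_1, ..., b_m
    intervals  -- TI indices &_1, ..., &_{m-1}
    boundaries -- T_1, ..., T_{r-1} in ascending order
    """
    def search(pos, j, prev_time):
        if j == len(items):
            return True  # every pattern item has been matched
        for p in range(pos, len(seq)):
            item, t = seq[p]
            if item != items[j]:
                continue
            # From the second item on, the gap to the previous match must
            # fall into the TI required by the pattern.
            if j > 0 and interval_index(t - prev_time, boundaries) != intervals[j - 1]:
                continue
            if search(p + 1, j + 1, t):
                return True
        return False

    return search(0, 0, None)

# Hypothetical sequence used only for illustration.
example_seq = [("A", 1), ("B", 3), ("B", 9), ("C", 10)]
```

With boundaries `[3, 7]`, the pattern ⟨A, I_3, B, I_1, C⟩ is contained in `example_seq` through the second B (gaps 8 and 1), although the first B does not fit.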
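[...] Type [...] Type, Code, Name [...].

3.2 [...]
[...] the four attributes Type, Explain, Code and Name [...] Wright et al. [8]. [...] SDB. [...] Name [...] Type, Explain, Code [...].

3.3 [...]

3.3.1 TI-SPM [...]
[...] TI-SPM [...] I-PrefixSpan [5] [...] TI- [...] TI- [...] two [...].

3.3.2 T-PrefixSpan
[...] TI-SPM [...] Huang et al. [7] [...] T-PrefixSpan. Huang et al. [...] three [...] T-PrefixSpan [...] T-PrefixSpan [...] SDB.

Definition 14. For an item set $I$, an item $i \in I$ and a time $t$, the item $i$ at time $t$ is written $(i, t)$.

Definition 15 (sequence). A sequence $s$ [...] is written $s = \langle (i_1, t_1), (i_2, t_2), \ldots, (i_n, t_n) \rangle$. The length of $s$ is written $length(s)$, so $length(s) = n$, and $O_s = \langle i_1, i_2, \ldots, i_n \rangle$ is the original sequence of $s$.

Definition 16 (time interval). For a sequence $s = \langle (i_1, t_1), (i_2, t_2), \ldots, (i_n, t_n) \rangle$, the time interval $TI_k$ is $TI_k = t_{k+1} - t_k$ ($k = 1, 2, \ldots, n - 2, n - 1$).

Definition 17 (SDB). For a set of sequences $S$, an SDB $D$ is $D = \{(sid, s) \mid sid \text{ is an identifier},\ s \in S\}$, and $D$ never contains two entries with the same $sid$. [...] the original SDB [...] SDB [...].

Definition 18 (pattern). Given MinSup ($0 \le MinSup \le 1$) and an SDB $D$, a pattern is $P = \langle i_1, X_1, i_2, X_2, \ldots, i_{n-1}, X_{n-1}, i_n \rangle$, where every $i_j$ is an item and every $X_k$ is a 5-tuple $(min_k, mod_k, ave_k, med_k, max_k)$. Writing $O_P = \langle i_1, i_2, \ldots, i_{n-1}, i_n \rangle$, $O_P$ must be frequent with respect to MinSup in the original SDB of $D$, and $min_k, mod_k, ave_k, med_k, max_k$ are obtained as follows. For every sequence $S = \langle (i'_1, t_1), (i'_2, t_2), \ldots, (i'_m, t_m) \rangle$ in $D$ that contains $O_P$, with $i_k = i'_{j_k}$ and $i_{k+1} = i'_{j_{k+1}}$ for $k = 1, 2, \ldots, n - 1$ and $1 \le j_1 < j_2 < \cdots < j_{n-1} < j_n \le m$, the interval $TI_k = t_{j_{k+1}} - t_{j_k}$ is added to the set $Set_{TI_k}$. Then $X_k = (min_k, mod_k, ave_k, med_k, max_k)$, where $min_k = \min Set_{TI_k}$, $mod_k$ is the mode of $Set_{TI_k}$, $ave_k$ is the mean of $Set_{TI_k}$, $med_k$ is the median of $Set_{TI_k}$, and $max_k = \max Set_{TI_k}$. [...]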
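For $X_j = (min_j, mod_j, ave_j, med_j, max_j)$ ($1 \le j < n$), when $min_j = max_j$, $i_j$ and $i_{j+1}$ [...]; when $min_j = max_j = 0$ [...]. $O_P$ [...] $P$.

Definition 19 (closed pattern). In an SDB $D$, a pattern $A$ is closed when no pattern $B \ne A$ satisfies all of the following:
(1) [...] of $A$ and $B$ [...] $A \sqsubseteq B$.
(2) Under (1), let $A = \langle a_1, T_1, a_2, T_2, \ldots, a_{n-1}, T_{n-1}, a_n \rangle$ and $B = \langle b_1, T'_1, b_2, T'_2, \ldots, b_{m-1}, T'_{m-1}, b_m \rangle$ with $a_k = b_{j_k}$ and $a_{k+1} = b_{j_{k+1}}$ for $k = 1, 2, \ldots, n - 1$ and $1 \le j_1 < j_2 < \cdots < j_n \le m$. Writing $T_k = (min_k, mod_k, ave_k, med_k, max_k)$ and $T'_{j_k} = (min'_{j_k}, mod'_{j_k}, ave'_{j_k}, med'_{j_k}, max'_{j_k})$, $min_k \ge min'_{j_k}$ and $max_k \le max'_{j_k}$.
(3) $Sup(A) \le Sup(B)$, where $Sup(A)$ is the support of $A$, $Sup(A) = |\{S \mid A \sqsubseteq S,\ (sid, S) \in D,\ sid \text{ is the identifier of } S\}|$.

As an example, consider the SDB $D$ of Table 1 with MinSup = 0.4.

Table 1: SDB D
sid   sequence
10    ⟨(A, 1), (B, 3), (C, 7), (E, 10)⟩
20    ⟨(A, 1), (B, 4), (E, 7)⟩
30    ⟨(A, 2), (B, 6), (B, 9)⟩
40    ⟨(A, 2), (B, 5)⟩
50    ⟨(A, 2), (B, 7)⟩

The original SDB $O_D$ of this SDB is shown in Table 2. With MinSup = 0.4, the frequent sequences in $O_D$ are ⟨A⟩, ⟨B⟩, ⟨E⟩, ⟨A, B⟩, ⟨B, E⟩ and ⟨A, B, E⟩.

Table 2: original SDB O_D of D
sid   sequence
10    ⟨A, B, C, E⟩
20    ⟨A, B, E⟩
30    ⟨A, B, B⟩
40    ⟨A, B⟩
50    ⟨A, B⟩

[...] $O_D$ [...] $D$ [...] ⟨A⟩, ⟨B⟩, ⟨E⟩ [...] $D$ [...]. For ⟨A, B⟩, the time intervals between A and B in $D$ are {2, 3, 3, 4, 5}, so taking the minimum, mode, mean, median and maximum gives ⟨A, (2, 3, 3, 3, 5), B⟩. In the same way, ⟨B, E⟩ and ⟨A, B, E⟩ give ⟨B, (3, 5, 5, 5, 7), E⟩ and ⟨A, (2, 2, 2, 2, 3), B, (3, 5, 5, 5, 7), E⟩. The patterns in $D$ for MinSup = 0.4 are therefore ⟨A⟩, ⟨B⟩, ⟨E⟩, ⟨A, (2, 3, 3, 3, 5), B⟩, ⟨B, (3, 5, 5, 5, 7), E⟩ and ⟨A, (2, 2, 2, 2, 3), B, (3, 5, 5, 5, 7), E⟩. Of these, [...] ⟨A, (2, 3, 3, 3, 5), B⟩ and ⟨A, (2, 2, 2, 2, 3), B, (3, 5, 5, 5, 7), E⟩ are closed patterns.

PrefixSpan [6] is extended to SDBs [...], giving T-PrefixSpan. T-PrefixSpan is shown in Algorithm 1. There, the original SDB of an SDB $D$ is written Original($D$), the original sequence of a sequence $S$ is written Original($S$), and the concatenation of $A$ and $B$ is written $AB$. The $n$-th element of $X$ is written $X_n$, the $n$-th element of $S$ is written $S_n$, and the time of $A_n$ is written $T(A_n)$. [...] the Smirnov-Grubbs test [9] with $\alpha = 0.05$.

[...] the 4-tuple (Type, Explain, Code, Name) [...] time $t$. [...] an SDB of Type, Explain, Code and Name [...] T-PrefixSpan [...].

4. [...]

4.1 [...]
[...] November 19, 1991 to October 4, 2015, [...]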
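The 5-tuples in the worked example on Table 1 can be reproduced with the Python sketch below. Several choices are assumptions made for illustration and are not stated in the text: each sequence contributes the leftmost occurrence of the pattern, the mean and the median use integer (floor) division, tied modes are averaged, and outlier removal by the Smirnov-Grubbs test is left out. The function names are hypothetical.

```python
from collections import Counter

# Table 1: sid -> list of (item, time).
TABLE_1 = {
    10: [("A", 1), ("B", 3), ("C", 7), ("E", 10)],
    20: [("A", 1), ("B", 4), ("E", 7)],
    30: [("A", 2), ("B", 6), ("B", 9)],
    40: [("A", 2), ("B", 5)],
    50: [("A", 2), ("B", 7)],
}

def leftmost_match(seq, pattern):
    """Times of the leftmost occurrence of pattern in seq, or None."""
    times, pos = [], 0
    for item in pattern:
        while pos < len(seq) and seq[pos][0] != item:
            pos += 1
        if pos == len(seq):
            return None
        times.append(seq[pos][1])
        pos += 1
    return times

def interval_sets(db, pattern):
    """One list of observed intervals per adjacent item pair of pattern."""
    sets = [[] for _ in range(len(pattern) - 1)]
    for seq in db.values():
        times = leftmost_match(seq, pattern)
        if times is None:
            continue
        for k in range(len(times) - 1):
            sets[k].append(times[k + 1] - times[k])
    return sets

def summarize(values):
    """(min, mode, mean, median, max) with the integer rounding assumed above."""
    values = sorted(values)
    n = len(values)
    counts = Counter(values)
    top = max(counts.values())
    modes = [v for v, c in counts.items() if c == top]
    mode = sum(modes) // len(modes)          # assumption: average tied modes
    mean = sum(values) // n                  # assumption: floor division
    if n % 2:
        median = values[n // 2]
    else:
        median = (values[n // 2 - 1] + values[n // 2]) // 2
    return (values[0], mode, mean, median, values[-1])

def annotate(db, pattern):
    """Interleave the items of pattern with their interval 5-tuples."""
    result = [pattern[0]]
    for item, values in zip(pattern[1:], interval_sets(db, pattern)):
        result += [summarize(values), item]
    return result
```

For instance, `annotate(TABLE_1, ["A", "B"])` collects the intervals 2, 3, 4, 3 and 5 and summarizes them as (2, 3, 3, 3, 5).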
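Algorithm 1 T-PrefixSpan

Input: SDB $D$, MinSup
Output: set of patterns $P$
Call: T-PrefixSpan($\langle\rangle$, $D$)

Procedure: T-PrefixSpan($\alpha$, $D_\alpha$)
    $D'_\alpha$ = Original($D_\alpha$)
    if $\alpha$ != null then
        $P \leftarrow P \cup$ GetProperTime($\alpha$, $D_\alpha$, $D'_\alpha$)
    end if
    $B \leftarrow \{\beta \mid (\exists s \in D'_\alpha,\ \beta \in s) \wedge (Sup(\beta) \ge Size(D) \times MinSup)\}$
    for $\beta \in B$ do
        $D_{\alpha\beta} \leftarrow \{\langle sid, s \rangle \in D_\alpha \mid \alpha\beta \sqsubseteq$ Original($s$)$\}$
        Call T-PrefixSpan($\alpha\beta$, $D_{\alpha\beta}$)
    end for

Subroutine: GetProperTime($\alpha$, $D_\alpha$, $D'_\alpha$)
    if length($\alpha$) == 1 then
        return $\alpha$
    end if
    $K \leftarrow \{k \mid \langle sid, s \rangle \in D_\alpha$, Original($s$) $\in D'_\alpha$, $k \sqsubseteq s$, Original($k$) == $\alpha\}$
    $T = \{\{\}, \{\}, \ldots, \{\}\}$ ($|T|$ = length($\alpha$) - 1)
    for $k \in K$ do
        for $i$ = 0, ..., length($k$) - 2 do
            $T_i \leftarrow T_i \cup \{T(k_{i+1}) - T(k_i)\}$
        end for
    end for
    $W = \langle \alpha_0, \alpha_1, \ldots, \alpha_{length(\alpha)-1} \rangle$
    for $i$ = 0, ..., length($\alpha$) - 2 do
        [...] $T_i$ [...]
        $min_i = \min T_i$
        $mod_i$ = mode($T_i$)
        $ave_i$ = mean($T_i$)
        $med_i$ = median($T_i$)
        $max_i = \max T_i$
        $X_i = (min_i, mod_i, ave_i, med_i, max_i)$
        $W = \langle \alpha_0, \ldots, \alpha_i, X_i, \alpha_{i+1}, \ldots, \alpha_{length(\alpha)-1} \rangle$
    end for
    return $W$

[...] two [...]

4.2 [...]
[...] T-PrefixSpan [...]. The experimental environment was as follows.

OS: Windows 7 Professional 64bit
CPU: Intel(R) Xeon(R) CPU E3-1241 v3 @ 3.65GHz (8 CPUs)
Memory: 16GB
Java 1.8.0_45

Two data sets were used, (1) [...] and (2) TUR-Bt [...]; [...] Table 3.

Table 3: [...]
[...]      (1) [...]    (2) TUR-Bt
[...]      265          488
[...]      19.64        19.16
[...]      53.21        49.89
[...]      10           9
[...]      11           11
[...]      460          465
[...]      655          485

[...] WATATUMI [10] [...] ID [...]. [...] HP [11] [...] (1) [...] and (2) TUR-Bt [...] Figures 2 and 3. [...] (1) [...] and (2) TUR-Bt [...].

4.3 [...]
[...] Figures 1, 2, 3, 4, 5 and 6. [...] Figure 1 [...].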
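The recursive part of Algorithm 1, which grows a prefix by one frequent item at a time, follows the pattern-growth idea of PrefixSpan [6]. The Python sketch below shows that idea on order-only sequences, using the original SDB of Table 2 as input. It is an assumption-laden illustration, not the authors' Java implementation: the projected database is kept as (sid, start position) pointers into the suffixes, which is the usual PrefixSpan pseudo-projection rather than the $D_{\alpha\beta}$ of Algorithm 1, and the name `frequent_sequences` is hypothetical. The interval annotation performed by GetProperTime is not included.

```python
# Table 2: sid -> order-only sequence.
TABLE_2 = {
    10: ["A", "B", "C", "E"],
    20: ["A", "B", "E"],
    30: ["A", "B", "B"],
    40: ["A", "B"],
    50: ["A", "B"],
}

def frequent_sequences(db, min_sup):
    """Return {pattern (tuple of items): support count} for all sequences
    whose support is at least Size(D) * MinSup."""
    threshold = min_sup * len(db)
    result = {}

    def grow(prefix, projected):
        # projected holds (sid, start): the suffix of db[sid] that follows
        # the leftmost occurrence of prefix.
        counts = {}
        for sid, start in projected:
            for item in set(db[sid][start:]):   # count each sid once per item
                counts[item] = counts.get(item, 0) + 1
        for item, count in sorted(counts.items()):
            if count < threshold:
                continue
            pattern = prefix + (item,)
            result[pattern] = count
            # Project every suffix past the first occurrence of item.
            next_projected = []
            for sid, start in projected:
                seq = db[sid]
                if item in seq[start:]:
                    next_projected.append((sid, seq.index(item, start) + 1))
            grow(pattern, next_projected)

    grow((), [(sid, 0) for sid in db])
    return result

patterns = frequent_sequences(TABLE_2, 0.4)
```

On Table 2 with MinSup = 0.4 the threshold is two sequences. The result contains ⟨A⟩, ⟨B⟩, ⟨E⟩, ⟨A, B⟩, ⟨B, E⟩ and ⟨A, B, E⟩; under this plain support count ⟨A, E⟩, which occurs in sequences 10 and 20, also reaches the threshold.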
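Figure 2: [...] TUR-Bt [...]
Figure 3: [...]
Figure 4: [...] TUR-Bt [...]
Figure 5: [...] TUR-Bt [...]
Figure 6: [...] TUR-Bt [...]

[...] T-PrefixSpan [...]. [...] from 0.02 to 1.0 [...] Figures 7 and 8. [...] Figures 7 and 8 [...]. [...] MinSup = 0.02 [...] Figure 8 [...] Figure 7 [...]. [...] Figure 7 [...].

5. [...]

5.1 [...]
[...] SDB [...]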
[...] TUR-Bt [...].

Figure 7: [...]

[...] (A) (#25240014) [...] HP [11] [...].

Figure 8: [...]

[...]

5.2 [...]
[...] PrefixSpan [6] [...] T-PrefixSpan [...] CloSpan [4], Clasp [12], CSpan [13] [...] two [...] T-PrefixSpan [...] TI-SPM [...].

References
[1] [...]. DEIM Forum 2014, F6-2, 2014.
[2] Rakesh Agrawal and Ramakrishnan Srikant. Fast algorithms for mining association rules in large databases. Proceedings of the 20th International Conference on Very Large Data Bases, pp. 487-499, 1994.
[3] [...]. DEIM Forum 2015, G5-1, 2015.
[4] X. Yan, J. Han and R. Afshar. CloSpan: Mining closed sequential patterns in large databases. Proc. SIAM Int'l Conf. Data Mining (SDM '03), pp. 166-177, May 2003.
[5] Yen-Liang Chen, Mei-Ching Chiang and Ming-Tat Ko. Discovering time-interval sequential patterns in sequence databases. Expert Systems with Applications 25, pp. 343-354, 2003.
[6] Jian Pei, Jiawei Han, Behzad Mortazavi-Asl, Helen Pinto, Qiming Chen, Umeshwar Dayal and Mei-Chun Hsu. PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth. Proceedings of the 2001 International Conference on Data Engineering, pp. 215-224, 2001.
[7] Zhengxing Huang, Xudong Lu and Huilong Duan. On mining clinical pathway patterns from medical behaviors. Artificial Intelligence in Medicine 56, pp. 35-65, 2012.
[8] Aileen P. Wright, Adam T. Wright, Allison B. McCoy and Dean F. Sittig. The use of sequential pattern mining to predict next prescribed medications. Journal of Biomedical Informatics 53, pp. 73-80, 2015.
[9] http://aoki2.si.gunma-u.ac.jp/lecture/grubbs/grubbs.html
[10] WATATUMI. http://www.corecreate.com/02 01 izanami.html
[11] http://www.med.miyazaki-u.ac.jp/home/jyoho/
[12] Antonio Gomariz, Manuel Campos, Roque Marin and Bart Goethals. ClaSP: An efficient algorithm for mining frequent closed sequences. PAKDD 2013, LNAI 7818, Part I, pp. 50-61, 2013.
[13] V. Purushothama Raju and G. P. Saradhi Varma. Mining closed sequential patterns in large sequence databases. International Journal of Database Management Systems (IJDMS), Vol. 7, No. 1, February 2015.