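[...] Re-Pair [...]

1 Hokkaido University, Graduate School of Information Science and Technology, {tmasaki, kida}@ist.hokudai.ac.jp

© 2015 Information Processing Society of Japan

1. [...]

Re-Pair, proposed by Larsson and Moffat [1], is an offline grammar-based compression algorithm. [...] highly repetitive text [2] [...] Re-Pair [7] [...]. For an input text of length n, Re-Pair runs in O(n) time but also requires O(n) working space.

Wan and Moffat [6] proposed a block-based variant of Re-Pair with a merging step (Re-Merge) [...]. Sekine et al. [4, 5, 8] proposed Blocked-Repair-VF [...]. Maruyama et al. [2] proposed FOLCA, a fully online grammar compression algorithm [...] the LCA algorithm of Sakamoto et al. [3] [...]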
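[...] [9] [...] Re-Pair [...]. [...] LT-RePair and SemiOnlineReplace [10] [...] Re-Pair [1] [7] [...]

2. [...]

This section reviews LT-RePair and SemiOnlineReplace, both introduced in [10].

2.1 LT-RePair

LT-RePair is a variant of Re-Pair [...]. A pair XY [...] satisfies h(X) ≥ h(Y) (LeftTall), where h(X) is the height of X. LT-RePair outputs a CFG G of the following form:

$\sigma \to \alpha_{i_0} \alpha_{i_1} \cdots \alpha_{i_{m-1}}$ $(i_k \in \{0, \ldots, |\Sigma| + |V| - 2\})$,
$\alpha_i \to a_i$ $(0 \le i < |\Sigma|)$,
$\alpha_i \to \alpha_j \alpha_k$ $(0 \le j, k < i,\ h(\alpha_j) \ge h(\alpha_k))$ $(i \ge |\Sigma|)$.

[...] G [...] X [...]. As with Re-Pair [1], for an input of length n [...] O(n) [...] LT-RePair [...] O(n) [...] O(1) [...]

2.2 SemiOnlineReplace

SemiOnlineReplace [...] dictionary D [...] D [...] LT-RePair [...]. Algorithm 1 shows SemiOnlineReplace, using the following notation.

T: the input text, of length n.
B: [...]
p: an element of B. h(p) is the height of p; p.prev and p.next are the neighbours of p, or NIL [...].
C(p): [...] of the pair (p, p.next) in D [...].
RMQ(p): [...] B [...] p [...] D [...], or NIL [...].
UpdateST: [...] RMQ [...].
Replace(p): [...] p and p.next [...] D [...].
Output(p): [...] B [...] p [...].

For [...] n, [...] g, and [...] ĝ, SemiOnlineReplace runs in O(n log ĝ) time and O(g) space [10].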
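To make the LeftTall condition of Section 2.1 concrete, the following Python sketch computes the height h of each variable of a small grammar and checks that every rule has a left child at least as tall as its right child. The encoding is an assumption made only for illustration: terminals are strings, variables are integers, and terminals are given height 0 (the paper's own height convention did not survive in the text). The names `height`, `is_left_tall`, and `expand` are hypothetical helpers, not part of [10].

```python
from typing import Dict, Tuple, Union

Symbol = Union[str, int]                 # str: terminal, int: variable id (assumed encoding)
Rules = Dict[int, Tuple[Symbol, Symbol]]  # variable id -> (left child, right child)


def height(sym: Symbol, rules: Rules, memo: Dict[int, int]) -> int:
    """Height of the derivation tree rooted at sym (terminals: 0, by assumption)."""
    if isinstance(sym, str):
        return 0
    if sym not in memo:
        left, right = rules[sym]
        memo[sym] = 1 + max(height(left, rules, memo), height(right, rules, memo))
    return memo[sym]


def is_left_tall(rules: Rules) -> bool:
    """True if every rule X -> YZ satisfies h(Y) >= h(Z)."""
    memo: Dict[int, int] = {}
    return all(
        height(left, rules, memo) >= height(right, rules, memo)
        for left, right in rules.values()
    )


def expand(sym: Symbol, rules: Rules) -> str:
    """String derived from sym."""
    if isinstance(sym, str):
        return sym
    left, right = rules[sym]
    return expand(left, rules) + expand(right, rules)


# Variable 2 derives "abaab"; each rule's left child is at least as tall as its right child.
rules = {0: ("a", "b"), 1: (0, "a"), 2: (1, 0)}
print(is_left_tall(rules), expand(2, rules))

# Here rule 1 puts the taller symbol on the right, so the grammar is not LeftTall.
print(is_left_tall({0: ("a", "b"), 1: ("a", 0)}))
```

The recursion is memoised per variable, so the check is linear in the number of rules; deep grammars would need an iterative version to stay under Python's recursion limit.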
Algorithm 1 SemiOnlineReplace
procedure Main
    T := T[1, n]
    T[n + 1] ← dummy
    B := ∅
    for i := 1, n + 1 do
        B.append(T[i])
        last_pos := B.tail
        RecursiveReplace(B, last_pos.prev)
    end for
end procedure

procedure RecursiveReplace(B, p)
    if p = NIL OR p.next = NIL then
        return
    end if
    if h(p) > h(p.next) then
        if h(p.prev) = h(p) AND C(p.prev) > C(p) then
            UpdateST(p.prev)
        else
            UpdateST(p)
        end if
        return
    else
        if p.prev = NIL OR h(p.prev) = h(p) then
            m ← p.prev
        else
            m ← RMQ(p.prev)
        end if
        while C(m) [...] AND C(m) [...] C(p) do
            m ← m.next
            Replace(m)
            RecursiveReplace(B, m.prev)
            RecursiveReplace(B, m)
            if m = p then
                return
            end if
            m ← RMQ(p.prev)
        end while
        if C(m) = [...] AND C(p) = [...] then
            Output(p)
        end if
    end if
end procedure

[...] LT-RePair [...]

3. [...] LT-RePair [...] (AdaptiveBlockExpand)

[...] LT-RePair [...] SemiOnlineReplace [...]. The procedure is as follows.

( 1 ) Read m symbols of T into the buffer F.
( 2 ) Select the most frequent LeftTall pair in F and add it to the dictionary D.
( 3 ) Replace the pair, and use SemiOnlineReplace to refill F [...].
( 4 ) If F [...], output F and D; otherwise return to (2).
( 5 ) While unread text remains in T, return to (1) [...] m [...] (4) [...].

[...] $|\Sigma| + |D| = 2^l$ [...] l [...] LT-RePair [...]

3.1 [...]

Algorithm 2 shows [...], using the following notation.

T: the input text, of length n.
F: the buffer, of size m.
D: the dictionary.
GetMaxPair(F): returns the most frequent LeftTall pair in F, or NIL if [...].
AddRule(D, p): adds p to D.
Output(F, D): outputs F and D.
Replace(F, p): replaces p in F [...].
SemiOnlineReplace(F[m − cb, m], T[f, n]): [...] F[m − cb, m] [...] T[f, n] [...] SemiOnlineReplace [...] F [...] T [...].

Lines 2-3 [...] f [...]. The while loop starting at line 4 [...]. Lines 5-8 [...] LT-RePair [...] SemiOnlineReplace [...]: m symbols of T starting at position f are read into F [...]
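The per-block part of the procedure above (select the most frequent LeftTall pair, add a rule, replace) can be sketched in Python as follows. This is a deliberately naive illustration under stated assumptions, not the authors' implementation: it recounts pairs on every round instead of using the linear-time data structures of Re-Pair, it omits SemiOnlineReplace and the refilling of F entirely, it stops when no LeftTall pair occurs at least twice (the paper's stopping rule is not recoverable from the text), and it assigns terminals height 0. The names `get_max_pair`, `replace_pair`, `lt_repair_block`, and `expand` are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple, Union

Symbol = Union[str, int]  # str: terminal, int: variable id (assumed encoding)
Pair = Tuple[Symbol, Symbol]


def get_max_pair(seq: List[Symbol], h: Callable[[Symbol], int]) -> Optional[Pair]:
    """Most frequent adjacent LeftTall pair occurring at least twice, else None."""
    counts: Counter = Counter()
    last_counted: Dict[Pair, int] = {}
    for i in range(len(seq) - 1):
        pair = (seq[i], seq[i + 1])
        if h(pair[0]) < h(pair[1]):
            continue  # left symbol is shorter: not LeftTall
        if last_counted.get(pair) == i - 1:
            continue  # overlaps the occurrence just counted (e.g. "aaa")
        counts[pair] += 1
        last_counted[pair] = i
    if not counts:
        return None
    pair, freq = max(counts.items(), key=lambda kv: kv[1])
    return pair if freq >= 2 else None


def replace_pair(seq: List[Symbol], pair: Pair, new_sym: Symbol) -> List[Symbol]:
    """Replace non-overlapping occurrences of pair, scanning left to right."""
    out: List[Symbol] = []
    i = 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_sym)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def lt_repair_block(block: str) -> Tuple[List[Symbol], Dict[int, Pair]]:
    """Naive LeftTall pair replacement on a single block."""
    seq: List[Symbol] = list(block)
    rules: Dict[int, Pair] = {}
    heights: Dict[Symbol, int] = {}  # variables only; terminals default to 0

    def h(sym: Symbol) -> int:
        return heights.get(sym, 0)

    while (pair := get_max_pair(seq, h)) is not None:
        var = len(rules)
        rules[var] = pair
        heights[var] = 1 + max(h(pair[0]), h(pair[1]))
        seq = replace_pair(seq, pair, var)
    return seq, rules


def expand(sym: Symbol, rules: Dict[int, Pair]) -> str:
    """String derived from sym."""
    if isinstance(sym, str):
        return sym
    left, right = rules[sym]
    return expand(left, rules) + expand(right, rules)


seq, rules = lt_repair_block("abababab")
print(seq)    # [1, 1]
print(rules)  # {0: ('a', 'b'), 1: (0, 0)}
```

Because a new variable is always taller than both of its children, the LeftTall filter mainly rules out pairs such as (a, X) where X is an existing variable; in the full method these are the pairs that SemiOnlineReplace never has to look for.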
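Algorithm 2 AdaptiveBlockExtend
 1: procedure Main
 2:     T := T[1, n]
 3:     f := 1
 4:     while f ≤ n do
 5:         D := ∅
 6:         F[1, m] = T[f, f + m − 1]
 7:         f := f + m
 8:         p := GetMaxPair(F)
 9:         while p ≠ NIL do
10:             AddRule(D, p)
11:             cb := Replace(F, p)
12:             ct := SemiOnlineReplace(F[m − cb, m], T[f, n], p)
13:             f := f + ct
14:             p := GetMaxPair(F)
15:         end while
16:         Output(F, D)
17:     end while
18: end procedure

[...] f is set to f + m. The while loop on lines 9-15 alternates LT-RePair and SemiOnlineReplace until p = NIL. [...] LT-RePair [...] cb [...] F [...]; SemiOnlineReplace [...] ct [...], and ct is added to f. After the while loop, F and D are output and control returns to line 4.

3.2 [...]

[...] b [...] g [...] ĝ [...]

Consider the running time of Algorithm 2. [...] AddRule [...] GetMaxPair [...] Replace [...] n [...]; Replace takes O(n) time in total. SemiOnlineReplace [...] n [...] ĝ [...] takes O(n log ĝ) time in total, and Output takes O(n) time. Lines 5-8 [...] the while loop at line 4 [...] n [...] O(n). In the while loop on lines 9-15 [...] n [...], the cost is determined by Replace and SemiOnlineReplace, which take O(n) and O(n log ĝ) time respectively.

The space usage is that of LT-RePair and SemiOnlineReplace. [...] LT-RePair [...] O(m) [...] LT-RePair [...] O(m) [...] O(b) [...], so LT-RePair uses O(m + b) space. SemiOnlineReplace uses O(g) space, giving O(m + g + b) in total. [...] g [...] b [...], this is O(m + b).

4. [...]

4.1 [...]

[...] Re-Pair [...]. The test data was 100MB of DNA text from the Pizza&Chili Corpus (http://pizzachili.dcc.uchile.cl/index.html). [...] Re-Pair [...] with block sizes of 1MB and 5MB. The experiments ran on an Intel(R) Xeon(R) CPU E3-1225 V2 @ 3.20GHz with 15.6GiB of memory [...]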
[...] Ubuntu 12.04 LTS (64bit). The programs were written in C++ and compiled with GCC (version 4.6.3).

Table 1 [...]

                   Compression ratio (%)   Memory (MB)   Time (s)
Re-Pair                    31.7               1403         22.8
Proposed (5MB)             33.0                424         78.0
Proposed (1MB)             33.8                142         82.9

4.2 [...]

Table 1 shows the results. [...] Re-Pair [...] 5MB [...] 3 [...] 1MB [...] 14 [...] SemiOnlineReplace [...] Re-Pair [...] O(m + b) [...] b [...] LT-RePair [...]

5. [...]

[...] Re-Pair of Larsson and Moffat [1] [...] LT-RePair and SemiOnlineReplace [10] [...]. For [...] m, g, b, and ĝ, the method runs in O(n log ĝ) time and O(m + b) space. [...] Re-Pair [...] LT-RePair [...]

This work was supported by JSPS grants 15K00002 and 24240021.

References

[1] Larsson, N. J. and Moffat, A.: Offline Dictionary-Based Compression, Proceedings of the Data Compression Conference 1999 (DCC '99), IEEE Computer Society, pp. 296-305 (1999).
[2] Maruyama, S., Tabei, Y., Sakamoto, H. and Sadakane, K.: Fully-online grammar compression, Proceedings of the 20th international conference on String processing and information retrieval (SPIRE 2013), pp. 218-229 (2013).
[3] Sakamoto, H., Kida, T. and Shimozono, S.: A Space-Saving Linear-Time Algorithm for Grammar-Based Compression, String Processing and Information Retrieval, Lecture Notes in Computer Science, Vol. 3246, Springer Berlin / Heidelberg, pp. 218-229 (2004).
[4] Sekine, K., Sasakawa, H., Yoshida, S. and Kida, T.: Variable-to-Fixed-Length Encoding for Large Texts Using Re-Pair Algorithm with Shared Dictionaries, Proceedings of the Data Compression Conference 2013 (DCC 2013), p. 518 (2013).
[5] Sekine, K., Sasakawa, H., Yoshida, S. and Kida, T.: Adaptive Dictionary Sharing Method for Re-Pair Algorithm, Data Compression Conference (DCC), 2014, p. 425 (online), DOI: 10.1109/DCC.2014.73 (2014).
[6] Wan, R. and Moffat, A.: Block merging for off-line compression, Journal of American Society for Information Science and Technology, Vol. 58, No. 1, pp. 3-14 (online), DOI: 10.1002/asi.v58:1 (2007).
[7] Yoshida, S. and Kida, T.: A Variable-length-to-fixed-length Coding Method Using a Re-Pair Algorithm, IPSJ Transactions on Databases, Vol. 6, No. 4, pp. 17-23 (2013).
[8] [...], [...] DE, Vol. 112, No. 346, pp. 47-52, http://ci.nii.ac.jp/naid/110009667169/ (2012).
[9] [...] VF [...], [...] AL, Vol. 2014, No. 8, pp. 1-5, http://ci.nii.ac.jp/naid/110009785568/ (2014).
[10] [...] Re-Pair [...], [...] Vol. 115, No. 84, pp. 37-43 (2015).