

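1 Hokkaido University, Graduate School of Information Science and Technology, {tmasaki, kida}@ist.hokudai.ac.jp
© 2015 Information Processing Society of Japan

1.

Re-Pair is the grammar-based compression algorithm of Larsson and Moffat [1]. It is well suited to highly repetitive text [2] and has also been used for variable-to-fixed-length coding [7]. For a text of length n, Re-Pair runs in O(n) time and needs O(n) working space.

Related work includes the following. Wan and Moffat [6] apply Re-Pair block by block and then merge the per-block results (Re-Merge). Sekine et al. [4, 5, 8] proposed Blocked-Repair-VF. Maruyama et al. [2] proposed FOLCA, which builds on the LCA algorithm of Sakamoto et al. [3].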
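The core loop of Re-Pair is short to state even though the linear-time implementation in [1] is intricate. The Python sketch below is a naive, quadratic-time illustration and not the O(n) algorithm of [1]: it repeatedly replaces the most frequent adjacent pair with a fresh nonterminal until no pair occurs twice. The names `count_pairs`, `repair` and `expand`, and the choice of integers as nonterminals, are assumptions made for this example only.

```python
from collections import Counter


def count_pairs(seq):
    """Count non-overlapping occurrences of each adjacent pair."""
    counts = Counter()
    overlap = False  # True right after counting a pair of equal symbols
    for i in range(len(seq) - 1):
        pair = (seq[i], seq[i + 1])
        if overlap and pair == (seq[i - 1], seq[i]):
            overlap = False  # overlaps the occurrence counted at i - 1
            continue
        counts[pair] += 1
        overlap = pair[0] == pair[1]
    return counts


def repair(text):
    """Naive Re-Pair: returns the final sequence and the rule dictionary."""
    seq = list(text)   # terminals are one-character strings
    rules = {}         # nonterminal (int) -> (left symbol, right symbol)
    while True:
        counts = count_pairs(seq)
        if not counts:
            break
        pair, freq = counts.most_common(1)[0]
        if freq < 2:   # no pair repeats any more
            break
        var = len(rules)
        rules[var] = pair
        out, i = [], 0
        while i < len(seq):  # left-to-right replacement of the chosen pair
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(var)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def expand(symbol, rules):
    """Recursively expand a symbol back into the text it derives."""
    if isinstance(symbol, str):
        return symbol
    left, right = rules[symbol]
    return expand(left, rules) + expand(right, rules)


text = "abracadabra abracadabra"
seq, rules = repair(text)
restored = "".join(expand(s, rules) for s in seq)
```

Here `restored` equals `text`, and `seq` is shorter than `text` because the repeated word is folded into nonterminals. Recounting pairs from scratch in every round is what makes this sketch quadratic; [1] avoids it with a priority structure over pair frequencies.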

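A further Re-Pair-based method appears in [9]. LT-RePair and SemiOnlineReplace were proposed in [10]. This report builds on both, and on Re-Pair [1], [7].

2.

This section reviews LT-RePair and SemiOnlineReplace from [10].

2.1 LT-RePair

LT-RePair is a variant of Re-Pair that replaces a pair XY only if $h(X) \ge h(Y)$ (LeftTall), where $h(X)$ is the height of X. The CFG G produced by LT-RePair has the form

$$\sigma \to \alpha_{i_0} \alpha_{i_1} \cdots \alpha_{i_{m-1}} \quad (i_k \in \{0, \ldots, |\Sigma| + |V| - 2\}),$$
$$\alpha_i \to a_i \quad (0 \le i < |\Sigma|),$$
$$\alpha_i \to \alpha_j \alpha_k \quad (0 \le j, k < i,\ h(\alpha_j) \ge h(\alpha_k)) \quad (i \ge |\Sigma|).$$

Re-Pair processes a text of length n in O(n) time [1]. LT-RePair also runs in O(n) time, with O(1) per height comparison.

2.2 SemiOnlineReplace

SemiOnlineReplace takes a dictionary D and replaces pairs in the text using D, where D is built by LT-RePair. Algorithm 1 shows SemiOnlineReplace. It uses the following notation.

T: the input text, of length n.
B: the working sequence, to which symbols of T are appended.
h(p): the height of the element p of B.
p.prev, p.next: the neighbours of p in B, or NIL if there is none.
C(p): the variable of D that corresponds to the pair p p.next.
RMQ(p): a query on B relative to p that returns an element whose pair is in D, or NIL.
UpdateST: updates the structure used by RMQ.
Replace(p): replaces the pair p p.next by its variable in D.
Output(p): outputs B up to p.

For text length n and the grammar parameters g and ĝ, SemiOnlineReplace runs in O(n log ĝ) time and O(g) space [10].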
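The height function h and the LeftTall restriction of Section 2.1 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the implementation of [10]: terminals are given height 0, a rule X → Y Z has height 1 + max(h(Y), h(Z)), and LeftTall is taken to mean h(Y) ≥ h(Z). The function names `symbol_height` and `left_tall_violations` and the example rule set are hypothetical.

```python
def symbol_height(sym, rules, cache):
    """Height of a symbol; terminals (symbols without a rule) have height 0."""
    if sym not in rules:
        return 0
    if sym not in cache:
        left, right = rules[sym]
        cache[sym] = 1 + max(symbol_height(left, rules, cache),
                             symbol_height(right, rules, cache))
    return cache[sym]


def left_tall_violations(rules):
    """Return the rules X -> Y Z with h(Y) < h(Z), i.e. not LeftTall
    under the assumption that LeftTall means h(Y) >= h(Z)."""
    cache = {}
    bad = []
    for sym, (left, right) in rules.items():
        if symbol_height(left, rules, cache) < symbol_height(right, rules, cache):
            bad.append(sym)
    return bad


# Hypothetical rule set: X1 -> a b, X2 -> X1 c, X3 -> d X2
rules = {"X1": ("a", "b"), "X2": ("X1", "c"), "X3": ("d", "X2")}
violations = left_tall_violations(rules)
```

In this example X1 and X2 satisfy the assumed condition, while X3 does not, because its left child d has height 0 and its right child X2 has height 2.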

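Algorithm 1 SemiOnlineReplace
 1: procedure Main
 2:   T := T[1, n]
 3:   T[n + 1] ← dummy
 4:   B := ∅
 5:   for i := 1, n + 1 do
 6:     B.append(T[i])
 7:     last_pos := B.tail
 8:     RecursiveReplace(B, last_pos.prev)
 9:   end for
10: end procedure
11: procedure RecursiveReplace(B, p)
12:   if p = NIL OR p.next = NIL then
13:     return
14:   end if
15:   if h(p) > h(p.next) then
16:     if h(p.prev) = h(p) AND C(p.prev) > C(p) then
17:       UpdateST(p.prev)
18:     else
19:       UpdateST(p)
20:     end if
21:     return
22:   else
23:     if p.prev = NIL OR h(p.prev) = h(p) then
24:       m ← p.prev
25:     else
26:       m ← RMQ(p.prev)
27:     end if
28:     while C(m) AND C(m) C(p) do
29:       m ← m.next
30:       Replace(m)
31:       RecursiveReplace(B, m.prev)
32:       RecursiveReplace(B, m)
33:       if m = p then
34:         return
35:       end if
36:       m ← RMQ(p.prev)
37:     end while
38:     if C(m) = AND C(p) = then
39:       Output(p)
40:     end if
41:   end if
42: end procedure

3. AdaptiveBlockExpand

The proposed method, AdaptiveBlockExpand, combines LT-RePair and SemiOnlineReplace as follows.

( 1 ) Read the next m symbols of T into a block F.
( 2 ) Select the most frequent LeftTall pair in F and add it to the dictionary D.
( 3 ) Replace the pair in F and run SemiOnlineReplace to refill F.
( 4 ) If F still contains a pair to replace, return to (2); otherwise output F and D.
( 5 ) If unread text remains in T, return to (1).

A limit of the form $|\Sigma| + |D| = 2^{l}$, for a given $l$, can also be placed on the dictionary built by LT-RePair.

3.1

Algorithm 2 shows the method. T is the input text of length n, F is a block of length m, and D is the dictionary. GetMaxPair(F) returns the most frequent LeftTall pair in F, or NIL if there is none. AddRule(D, p) adds the pair p to D. Output(F, D) outputs F and D. Replace(F, p) replaces the pair p in F. SemiOnlineReplace(F[m − cb, m], T[f, n]) runs SemiOnlineReplace on F[m − cb, m] and T[f, n], moving text from T into F.

Lines 2-3 initialize T and f. Each pass of the while loop at line 4 processes one block. Lines 5-8 prepare for LT-RePair and SemiOnlineReplace by reading m symbols of T, starting at position f, into F.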

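Algorithm 2 AdaptiveBlockExtend
 1: procedure Main
 2:   T := T[1, n]
 3:   f := 1
 4:   while f ≤ n do
 5:     D := ∅
 6:     F[1, m] = T[f, f + m − 1]
 7:     f := f + m
 8:     p := GetMaxPair(F)
 9:     while p ≠ NIL do
10:       AddRule(D, p)
11:       cb := Replace(F, p)
12:       ct := SemiOnlineReplace(F[m − cb, m], T[f, n], p)
13:       f := f + ct
14:       p := GetMaxPair(F)
15:     end while
16:     Output(F, D)
17:   end while
18: end procedure

Line 7 advances f to f + m. The while loop at lines 9-15 alternates LT-RePair and SemiOnlineReplace until p = NIL. The LT-RePair step returns cb for F, and SemiOnlineReplace returns ct; ct is then added to f. When the inner while loop ends, F and D are output and control returns to line 4.

3.2

The analysis uses the parameters b, g and ĝ. In Algorithm 2, AddRule and GetMaxPair take constant time per call. Replace performs at most n replacements in total, each in constant time, so Replace costs O(n) overall. SemiOnlineReplace handles at most n symbols, so with ĝ it costs O(n log ĝ) overall. Output costs O(n). Lines 5-8, inside the while loop at line 4, handle n symbols in total and cost O(n). In the while loop at lines 9-15, Replace and SemiOnlineReplace contribute O(n) and O(n log ĝ) respectively. The total running time is therefore O(n log ĝ).

For space, consider LT-RePair and SemiOnlineReplace separately. LT-RePair uses O(m) for the block and O(b) for the dictionary, that is, O(m + b). SemiOnlineReplace uses O(g). The total is O(m + g + b), which the relation between g and b reduces to O(m + b).

4.

4.1

The proposed method was compared with Re-Pair. The input was 100MB of DNA text from the Pizza&Chili Corpus (http://pizzachili.dcc.uchile.cl/index.html). Block sizes of 1MB and 5MB were used. The experiments ran on an Intel Xeon CPU E3-1225 V2 @ 3.20GHz with 15.6GiB of memory.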

The operating system was Ubuntu 12.04 LTS (64bit). The programs were written in C++ and compiled with GCC (version 4.6.3).

Table 1
                         Compression ratio (%)   Memory (MB)   Time (s)
Re-Pair                  31.7                    1403          22.8
Proposed method (5MB)    33.0                    424           78.0
Proposed method (1MB)    33.8                    142           82.9

4.2

Table 1 shows the results. Compared with Re-Pair, the proposed method uses about 30% of the memory with 5MB blocks and about 10% with 1MB blocks, at a compression ratio within about 2 percentage points. It takes roughly 3.4 to 3.6 times as long, the additional time being spent in SemiOnlineReplace. Unlike Re-Pair, its space is O(m + b), where b is determined by LT-RePair.

5.

This report presented a method based on Re-Pair by Larsson and Moffat [1] that combines LT-RePair and SemiOnlineReplace [10]. With m, g, b and ĝ as above, it runs in O(n log ĝ) time and O(m + b) space.

Supported by JSPS grants 15K00002 and 24240021.

[1] Larsson, N. J. and Moffat, A.: Offline Dictionary-Based Compression, Proceedings of the Data Compression Conference 1999 (DCC 99), IEEE Computer Society, pp. 296-305 (1999).
[2] Maruyama, S., Tabei, Y., Sakamoto, H. and Sadakane, K.: Fully-online grammar compression, Proceedings of the 20th international conference on String processing and information retrieval (SPIRE 2013), pp. 218-229 (2013).
[3] Sakamoto, H., Kida, T. and Shimozono, S.: A Space-Saving Linear-Time Algorithm for Grammar-Based Compression, String Processing and Information Retrieval, Lecture Notes in Computer Science, Vol. 3246, Springer Berlin / Heidelberg, pp. 218-229 (2004).
[4] Sekine, K., Sasakawa, H., Yoshida, S. and Kida, T.: Variable-to-Fixed-Length Encoding for Large Texts Using Re-Pair Algorithm with Shared Dictionaries, Proceedings of the Data Compression Conference 2013 (DCC 2013), p. 518 (2013).
[5] Sekine, K., Sasakawa, H., Yoshida, S. and Kida, T.: Adaptive Dictionary Sharing Method for Re-Pair Algorithm, Data Compression Conference (DCC), 2014, pp. 425-425 (online), DOI: 10.1109/DCC.2014.73 (2014).
[6] Wan, R. and Moffat, A.: Block merging for off-line compression, Journal of American Society for Information Science and Technology, Vol. 58, No. 1, pp. 3-14 (online), DOI: 10.1002/asi.v58:1 (2007).
[7] Yoshida, S. and Kida, T.: A Variable-length-to-fixed-length Coding Method Using a Re-Pair Algorithm, IPSJ Transactions on Databases, Vol. 6, No. 4, pp. 17-23 (2013).
[8] DE, Vol. 112, No. 346, pp. 47-52, http://ci.nii.ac.jp/naid/110009667169/ (2012).
[9] VF. AL, Vol. 2014, No. 8, pp. 1-5, http://ci.nii.ac.jp/naid/110009785568/ (2014).
[10] Re-Pair Vol. 115, No. 84, pp. 37-43 (2015).