Practical Implementation of Compressed Suffix Array on Modern Processors


DEIM Forum 2012 F11-2

Takeshi YAMAMURO, Makoto ONIZUKA, Toshio HITAKA, and Masashi YAMAMURO
NTT Cyber Space Laboratories, 1-1 Hikarino-o-ka, Yokosuka Kanagawa, 239-0847 Japan
E-mail: {yamamuro.takeshi,onizuka.makoto,hitaka.toshio,yamamuro.masashi}@lab.ntt.co.jp

Abstract: [...] a text T of length N and a pattern P of length M, where $N \gg M$ [...] CPU [...]

1. Introduction

[...] Web [...] q-gram [...] I/O [...] q-gram [...] Compressed Suffix Array (CSA) [...]

Let the text be $S[0 \ldots N-1] = s_0 s_1 s_2 \ldots s_{N-1}$ and the pattern be $P[0 \ldots M-1] = p_0 p_1 p_2 \ldots p_{M-1}$, with $S[i] = \{s_i \mid s_i \in \Sigma\}$, $P[i] = \{p_i \mid p_i \in \Sigma\}$, and $N \gg M$. Each symbol of $\Sigma$ is a 1-byte ASCII character.

[...] Suffix Array (SA) [1] [2] [...] Searching an SA for P by binary search takes $\Theta(M \log N)$ time. [...] [3] [4] [5] [...] SA [...] $\Theta(M \log N)$ (word RAM [...]

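Fig. 1: [...] in-memory [...] Xeon 5670 [...] (axis: Stride Size; 4Byte [...] CPU [...] x228)

Fig. 2: SA (footnote 2) [...] in-memory CSA (footnote 3) [...]. Text: TREC Terabyte Track .gov2 (a 2004 crawl of .gov). Patterns: 40,000 queries from the TREC 2009 Million Query Track (footnote 4). [...] P [...] Best [...] Worst [...]

[...] CSA [...] SA [...] in-memory [...] (Fig. 2) [...] CSA [...] SA [...] P [...] $\Theta(\log N)$ [...] SA [...] P [...] M [...] CSA [...] CPU [...]

(1) [...]
(2) [...] CSA [...] P [...] CPU [...]

Section 2 [...] SA [...]. Section 3 [...] CSA [...] CPU [...]. Section 4 [...] CPU [...] SA [...]. Section 5 [...]. Sections 6, 7, and 8 [...].

Footnotes:
1 [...] $N \log N$ bits [...]
2 http://code.google.com/p/libdivsufsort/
3 http://code.google.com/p/csalib/
4 http://trec.nist.gov/data/million.query09.html

2. [...]

2.1 Suffix array

Let $T[0 \ldots N-1] = t_0 t_1 t_2 \ldots t_{N-1}$ be the text, with $T[N] = \$$, where $ is a terminator smaller than every character of T. The suffixes $S[i]$ ($i = 0, 1, 2, \ldots, N$) are defined as $S[i] = T[i \ldots N]$. The SA is the array that satisfies

$$S[SA[i]] < S[SA[j]] \iff i < j \qquad (1)$$

[...] CSA [...] SA [...]

Given a pattern $P[0 \ldots M-1] = p_0 p_1 p_2 \ldots p_{M-1}$, the occurrences of P are found by binary search over the suffixes S of T. Comparing P with one suffix S takes $\Theta(M)$ time and the binary search needs $\Theta(\log N)$ comparisons, so a search on the SA takes $\Theta(M \log N)$ time.

2.2 rank/select

Let $B[0 \ldots N-1] = b_0 b_1 b_2 \ldots b_{N-1}$, $B[i] = \{b_i \mid b_i \in \{0, 1\}\}$, be a bit array. $rank_1(B, i)$ is the number of 1s in $B[0 \ldots i]$, and $select_1(B, i)$ is the position of the $(i+1)$-th 1 in B. $rank_0$ and $select_0$ are defined in the same way for 0.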
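The definitions of Sections 2.1 and 2.2 can be made concrete with a small sketch. It is a minimal illustration under stated assumptions, not the implementation evaluated in the paper: the function names (`build_suffix_array`, `sa_search`, `rank1`, `select1`) are hypothetical, the suffix array is built by naive sorting rather than with libdivsufsort, rank/select scan the bit array linearly instead of using a succinct dictionary such as [7], and the text is assumed to contain only characters that sort above `$`.

```python
def build_suffix_array(text):
    """Naive SA construction: sort the start positions of all suffixes of text + '$'.

    Assumption: every character of text sorts above '$' (true for ASCII
    letters and digits), so '$' acts as the smallest terminator.
    """
    t = text + "$"
    return sorted(range(len(t)), key=lambda i: t[i:])


def sa_search(text, sa, pattern):
    """Return the sorted start positions of pattern, using two binary searches.

    Each comparison looks at no more than M = len(pattern) characters and
    there are O(log N) comparisons, matching the Theta(M log N) bound.
    """
    t = text + "$"
    m = len(pattern)

    # Lower bound: first SA row whose M-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if t[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first SA row whose M-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if t[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[first:lo])


def rank1(bits, i):
    """Number of 1s in bits[0..i] (inclusive), by linear scan."""
    return sum(bits[: i + 1])


def select1(bits, i):
    """Position of the (i+1)-th 1 in bits, by linear scan."""
    seen = -1
    for pos, b in enumerate(bits):
        if b:
            seen += 1
            if seen == i:
                return pos
    raise ValueError("bits contains fewer than i + 1 ones")


sa = build_suffix_array("abracadabra")
matches = sa_search("abracadabra", sa, "abra")
```

For the text `abracadabra`, `matches` holds the two start positions of `abra`. The linear scans in `rank1` and `select1` only fix the semantics, including the inclusive range and the 0-based argument of select.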

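rank/select on a bit array of length N [...] $o(N)$ [...] $O(\log N)$ [...] $O(N)$ [...] $O(1)$ [7].

rank/select extends from a bit array B to a string $T[0 \ldots N-1] = t_0 t_1 t_2 \ldots t_{N-1}$, $T[i] = \{t_i \mid t_i \in \Sigma\}$: $rank_x(T, i)$ is the number of occurrences of x in $T[0 \ldots i]$ for $x \in \Sigma$, and $select_x(T, i)$ is the position of the $(i+1)$-th x. [...] Wavelet tree [4] [...] rank/select on a bit array B [...] LOUDS [8] / BP [9] / DFUDS [10] [...] [11].

2.3 Compressed suffix array

[...] rank/select [...] SA [...]. With 4-byte entries, an SA occupies 4N bytes (footnote 5). [...] CSA [...] SA [...] $\psi[i]$:

$$\psi[i] = SA^{-1}[SA[i] + 1] \qquad (2)$$

where $SA^{-1}$ is the inverse of SA: $SA^{-1}[i] = j$ when $SA[j] = i$. [...] $\psi$ [...] SA [...] the BWT [12] of T:

$$BWT[i] = T[SA[i] - 1] \qquad (3)$$

[...] BWT [...] SA [...] $\psi$:

$$SA^{-1}[SA[i] + 1] = select_x(BWT, i - cum[x]) \qquad (4)$$

where $cum[x]$ is the number of characters in T that are smaller than x. [...] SA [...] CSA [...] P [3] [4] [5].

Footnote 5: [...] 0.20N bytes [...] N bytes [6].

3. [...]

[...] CSA [...] rank/select [...] [3] [4] [5] [...] CPU [...]. Fig. 3: Valgrind [...] SA/CSA [...] CPU [...] Ins.-refs, Data-refs [...] L1-misses, L2-misses [...] L1/L2 [...] CPU [...] L1/L2 [...] SA [...] CSA [...] CPU [...]

4. [...]

4.1

[...] CSA [...] CPU [...] Fig. 4 [...] an SA of length N [...] $\psi$ [...] K blocks $BL_i$ ($i = 1 \ldots K$) [...] $BL_1$ [...] $BL_2$ [...] F [...] $\psi$ [...] BL [...] rank/select [...]

1. [...] CSA [...] SA [...] S [...] P [...] rank/select [...] CPU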
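As an aside on Eqs. (2) to (4) of Section 2.3, the following sketch checks on a small string that $\psi$ computed directly from the inverse suffix array agrees with $\psi$ computed through select on the BWT. Everything here is illustrative: the helper names are hypothetical, select is a linear scan, the input string already ends with `$`, and indices wrap around at the terminator (the source does not spell out that boundary case, so the wrap-around is an assumption).

```python
def suffix_array(t):
    """Naive SA of t, where t already ends with the terminator '$'."""
    return sorted(range(len(t)), key=lambda i: t[i:])


def psi_direct(sa):
    """Eq. (2): psi[i] = SA^-1[SA[i] + 1], wrapping around at the last suffix."""
    n = len(sa)
    inv = [0] * n
    for j, pos in enumerate(sa):
        inv[pos] = j          # SA^-1[i] = j  when  SA[j] = i
    return [inv[(sa[i] + 1) % n] for i in range(n)]


def bwt(t, sa):
    """Eq. (3): BWT[i] = T[SA[i] - 1]; index -1 wraps to the terminator."""
    return "".join(t[pos - 1] for pos in sa)


def select(seq, x, k):
    """Position of the (k+1)-th occurrence of x in seq, by linear scan."""
    seen = -1
    for pos, c in enumerate(seq):
        if c == x:
            seen += 1
            if seen == k:
                return pos
    raise ValueError("fewer than k + 1 occurrences of x")


def psi_from_bwt(t, sa):
    """Eq. (4): psi[i] = select_x(BWT, i - cum[x]), x = first char of suffix SA[i]."""
    b = bwt(t, sa)
    # cum[x]: number of characters in t that are smaller than x.
    cum = {x: sum(c < x for c in t) for x in set(t)}
    return [select(b, t[pos], i - cum[t[pos]]) for i, pos in enumerate(sa)]


t = "abracadabra$"
sa = suffix_array(t)
same = psi_direct(sa) == psi_from_bwt(t, sa)
```

Here `same` is true for this input: both routes give the same $\psi$, which is why a CSA can store a compressed form of $\psi$ (or the BWT plus rank/select structures) instead of the 4N-byte suffix array.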

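[...] SA [...] S [...] L [...] CPU [...]
2. [...] BL [...] (Fig. 4) [...] Section 4.2 [...] SA [...] S [...] Section 4.3.

4.2

[...] (Fig. 4) [...] Let $p_i = \Pr(\psi[i])$ for $\psi$ and $P_j = \Pr(F[j])$ for F, with $i = 0 \ldots N-1$ and $j = 1 \ldots K$, where $\Pr(A)$ is the reference probability of A. [...] $\psi$ [...] K blocks BL [...]:

$$|BL_i| > |BL_j| \iff P_j < P_i \qquad (5)$$

[...] $BL_i$ [...] $BL_j$ [...] $\log(|BL_i|)$ [...] $|BL_i| = N \cdot P_i$ [...] $\psi$ [...] between $\log(|F|) + \log(\min(|BL_i|))$ and $\log(|F|) + \log(\max(|BL_i|))$ [...] $\log(N)$ [...] F [...] Section 4.4.

4.3

Fig. 5 [...] L [...] Section 4.1 [...] CSA [...] SA [...] S [...] P [...] F [...] S [...] (Fig. 5) [...] Section 4.2 [...] F [...] S [...] L [...] P [...] L [...] F [...] P [...] S [...] L [...] CSA [...] S [...] L [...] Section 4.5.

4.4

Algorithm 1 Pseudo code to generate F
   1: /*
   2:   suffix: Ordered suffixes
   3:   nref: Array of counters
   4:   chunksz: Number of ψ sharing a counter
   5:   pr: Array of reference probabilities in CHUNK
   6:   ratio: Array of sizes allocated in F
   7:   B: Bit array to map F with suffix
   8: */
   9: pr = calc_probabilities(nref);
  10: ratio = allocate_F(pr, sizeof(F));
  11: for i ← 1 to sizeof(ratio) do
  12:   for j ← 1 to ratio[i] do
  13:     set_bit(B, i * refsz + j * (refsz / ratio[i]));
  14:   end for
  15: end for
  16: for i ← 1 to sizeof(F) do
  17:   push_back(F, TRANSLATE(suffix[select_1(B, i)]));
  18: end for

Algorithm 1 generates F. References to $\psi$ are counted in nref, where chunksz consecutive entries of $\psi$ share one counter; such a group of $\psi$ entries is a CHUNK. From nref, the allocation of F to each CHUNK is decided (lines 9-10): the reference probability of each CHUNK is computed (line 9), and the size of F allocated to each CHUNK is derived from it (line 10). [...] CHUNK [...] F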

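[...] (Fig. 4) [...] the bit array B marks the suffix positions (line 13) [...] BL [...] B [...] F (line 17) [...] Section 4.2 [...] SA [...] S [...] L [...] (line 17) TRANSLATE [...] Section 5.2

4.5

[...] L [...] Algorithm 2 shows the translation for L = 4: four 1-byte characters are packed into one 4-byte value val, with each character occupying one byte of val (lines 3-6). [...] F

Algorithm 2 Pseudo code to translate characters (L=4)
  1: /* c_0 ... c_3: input characters from the 0-th to the 3rd */
  2: val = 0;
  3: val |= c_0 << 24;
  4: val |= c_1 << 16;
  5: val |= c_2 << 8;
  6: val |= c_3;
  7: return val;

Algorithm 3 searches for P in two stages. The pattern p is first translated with Algorithm 2 into an L-byte integer (line 8), and F is binary-searched with pint (lines 11-16). [...] Section 4.3 [...] When pint equals F[cpos] (line 17), [...] L [...] p is compared with the suffix S itself (line 18). [...] F [...] the binary search of lines 11-16 [...] CPU [...] rank/select [...] CPU [...]

Algorithm 3 Pseudo code to traverse F
   1: /*
   2:   p: Input pattern
   3:   pint: Translated pattern
   4:   cpos: Position of current searches
   5:   suffix: Ordered suffixes
   6:   B: Bit array generated in Algorithm 1
   7: */
   8: pint = TRANSLATE(p);
   9: len = sizeof(F) / 2;
  10: while len ≠ 0 do
  11:   len = len / 2;
  12:   if pint < F[cpos] then
  13:     cpos -= len;
  14:   else if pint > F[cpos] then
  15:     cpos += len;
  16:   end if
  17:   if pint == F[cpos] then
  18:     if p < suffix[select_1(B, cpos)] then
  19:       cpos -= len;
  20:     else
  21:       cpos += len;
  22:     end if
  23:   end if
  24: end while
  25: return cpos;

5. [...]

[...] SA/CSA [...] (Section 1, Fig. 2) [...] the proposed CSA (Practical CSA, pcsa) [...] a CSA that compresses $\psi$ [...] T [...] $\psi$ [...] $\gamma$ [...] dag_vector (footnote 6) [...] T [...] the LZ77-based LZ-End [17] [...] rank/select (Section 2.2) [...] Section 4.2 [...] L = 4 [...] 4Byte [...] F [...]

[...] SA/CSA [...] CPU [...] (Section 1, Fig. 2) [...] TREC Terabyte Track .gov2 [...] 2GiB [...] TREC 2009 Million Query Track [...]

[...] Xeon 5670 [...] 16GiB [...] Intel Hyper-Threading [...] 31.8GiB/s [...] CPU [...] oprofile v0.9.6 [...] Xeon 5670 [...] Xeon 5260 [...] 16GiB [...] 21.2GiB/s [...] C/C++ [...] GNU Compiler Collection v4.1.2 [...] -O2 [...]

Footnote 6: https://github.com/pfi/dag_vector

5.1

[...] $\gamma$ [...] LZ-End [...] pcsa [...] $\gamma$ [...] LZ-End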
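Returning to Section 4.5, the packing step of Algorithm 2 and the idea of searching a small integer array F before touching the suffixes can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's Algorithm 3: all names are hypothetical, F samples every `step`-th row of an uncompressed suffix array uniformly (the paper allocates F entries according to reference probabilities and maps them through the bit array B), and short inputs are padded with a fixed byte.

```python
from bisect import bisect_left, bisect_right

L = 4  # bytes packed per key, the L = 4 case of Algorithm 2


def translate(s, pad=0x00):
    """Pack the first L bytes of s into one integer, most significant byte first."""
    val = 0
    for k in range(L):
        byte = s[k] if k < len(s) else pad   # padding rule is an assumption
        val = (val << 8) | byte
    return val


def build_f(text, sa, step):
    """Sample every step-th SA row and store its packed L-byte prefix.

    Uniform sampling is a simplification made for this sketch.
    """
    rows = list(range(0, len(sa), step))
    keys = [translate(text[sa[r]:]) for r in rows]
    return rows, keys


def search(text, sa, rows, keys, pattern):
    """Return the sorted start positions of pattern in text."""
    m = len(pattern)
    # Stage 1: integer comparisons on F narrow the candidate SA range.
    a = bisect_left(keys, translate(pattern, 0x00))
    b = bisect_right(keys, translate(pattern, 0xFF))
    lo = rows[a - 1] if a > 0 else 0
    hi = rows[b] if b < len(rows) else len(sa)

    # Stage 2: ordinary binary search on the suffixes inside [lo, hi).
    def bound(lo, hi, strict):
        while lo < hi:
            mid = (lo + hi) // 2
            prefix = text[sa[mid]:sa[mid] + m]
            if prefix < pattern or (strict and prefix == pattern):
                lo = mid + 1
            else:
                hi = mid
        return lo

    first = bound(lo, hi, False)
    last = bound(first, hi, True)
    return sorted(sa[first:last])


text = b"abracadabra"
sa = sorted(range(len(text)), key=lambda i: text[i:])
rows, keys = build_f(text, sa, step=3)
hits = search(text, sa, rows, keys, b"bra")
```

The point of the design is that stage 1 compares machine integers stored contiguously in a small array, which is friendlier to caches and branch prediction than string comparisons scattered over the text; only the final, narrowed range needs suffix comparisons.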

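[...] rank/select [...] $\psi$ [...] dag_vector [...] 254.4µs [...] LZ-End [...] Table 1 [...] Deterioration Rate [...] LZ-End [...] P [...] M [...] $O(M)$ [17] [...] F [...] P [...] SA [...] CSA [...] pcsa [...] F [...] CSA

Table 1: [...] LZ-End [...] (µs)

  Pattern Length        1       4       8       12      16
  LZ-End                1.20    3.15    6.15    8.34    9.60
  Uncompressed          0.455   0.461   0.654   0.695   0.702
  Deterioration Ratio   x2.64   x6.84   x9.40   x11.99  x13.75

[...] $\log(|F|)$ [...] T [...] F [...] P [...] $\log(|F|)$ [...]

5.2

Fig. 6 [...] Intel x86 CPU [...] 64bit [...] rdtsc [...] |F| = 2MiB/8MiB/32MiB [...] pcsa [...] 1.17GiB [...] 3% [...] P [...] F [...] |F| = 32MiB [...] 13.07µs [...] P [...] F [...] rank/select [...] CPU [...] 3.37µs [...] SA [...]

5.3

[...] CPU [...] oprofile [...] Fig. 7: branch penalties / stall time / complete instructions [...] # of instructions [...] oprofile [...] CPU [...]

Fig. 8: [...] CSA [...] |F| = 32MiB [...] SA [...] pcsa [...] 1/5 [...] CPU [...] [22] [20] [21] [...] CPU [...] (axes: Memory Consumed, Throughput)

6. [...]

[...] CPU [...]
(1) [...] DB [...] Web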

[...] CPU [...] PForDelta [13] [...] PForDelta [...] OPTPForDelta [14] [...] Simple9/16 [15] [14] [...] VSEncoding [16] [...]
(2) [...] CSA [...] CPU [...]

7. [...]

[...] F [...] CPU [...] $\psi$ [...] CHUNK [...] F [...]
(1) [...] F [...]
(2) [...] DB [...] in-memory [...] L1/L2 [20] [19] [...] Intel [...] FAST [20] [...] 100 [...] 10 6 [...] 8 [...]

8. Conclusion

[...] a text T of length N and a pattern P of length M, where $N \gg M$ [...] CSA [...] rank/select [...] CPU [...] SA [...] 3% [...] SA/CSA [...] Throughput [...]

References

[1] Daisuke Okanohara and Jun-ichi Tsujii, Text Categorization with All Substring Features, Proc. of SIAM '09, pp. 838-846, 2009.
[2] Choon Hui Teo and S. V. N. Vishwanathan, Fast and space efficient string kernels using suffix arrays, International Conference on Machine Learning, pp. 929-936, 2006.
[3] Kunihiko Sadakane, New text indexing functionalities of the compressed suffix arrays, Journal of Algorithms, Vol. 48, Issue 2, pp. 294-313, 2003.
[4] Roberto Grossi and Ankur Gupta, High-order entropy-compressed text indexes, Proc. of SODA '03, pp. 841-850, 2003.
[5] Roberto Grossi and Jeffrey Scott Vitter, Compressed Suffix Arrays and Suffix Trees with Applications to Text Indexing and String Matching, SIAM Journal on Computing, Vol. 35, Issue 2, pp. 378-407, 2005.
[6] R. Baeza-Yates and B. Ribeiro, Modern Information Retrieval, Addison-Wesley, 1999.
[7] Daisuke Okanohara and Kunihiko Sadakane, Practical Entropy-Compressed Rank/Select Dictionary, Proc. of ALENEX '07, pp. 60-70, 2007.
[8] O'Neil Delpratt, Naila Rahman, and Rajeev Raman, Engineering the LOUDS Succinct Tree Representation, Proc. of WEA '06, pp. 134-145, 2006.
[9] Richard F. Geary et al., A simple optimal representation for balanced parentheses, Journal of Theoretical Computer Science, Vol. 368, Issue 3, pp. 231-246, 2006.
[10] David Benoit et al., Representing Trees of Higher Degree, Journal of Algorithmica, Vol. 43, Issue 4, pp. 275-292, 2005.
[11] Arash Farzan and Johannes Fischer, Compact Representation of Posets, Proc. of ISAAC '11, pp. 302-311, 2011.
[12] Michael Burrows and David Wheeler, A block-sorting lossless data compression algorithm, Technical Report 124, 1994.
[13] Marcin Zukowski et al., Super-Scalar RAM-CPU Cache Compression, Proc. of ICDE '06, pp. 59-71, 2006.
[14] Hao Yan, Shuai Ding, and Torsten Suel, Inverted index compression and query processing with optimized document ordering, Proc. of WWW '09, pp. 401-410, 2009.
[15] Vo Ngoc Anh and Alistair Moffat, Inverted Index Compression Using Word-Aligned Binary Codes, Journal of Information Retrieval, Vol. 8, Issue 1, pp. 151-166, 2005.
[16] Fabrizio Silvestri and Rossano Venturini, VSEncoding: efficient coding and fast decoding of integer lists via dynamic programming, Proc. of CIKM '10, pp. 1219-1228, 2010.
[17] Sebastian Kreft and Gonzalo Navarro, LZ77-Like Compression with Fast Random Access, Proc. of DCC '10, pp. 239-248, 2010.
[18] Pawel Gawrychowski, Pattern matching in Lempel-Ziv compressed strings: fast, simple, and deterministic, Proc. of ESA '11, pp. 421-432, 2011.
[19] Jason Sewall et al., PALM: Parallel Architecture-Friendly Latch-Free Modification to B+Trees on Many-Core Processors, Proc. of VLDB '11, 2011.
[20] Changkyu Kim et al., Designing Fast Architecture Sensitive Tree Search on Modern Multi-Core/Many-Core Processors, ACM Transactions on Database Systems, 9(4), 2011.
[21] Nadathur Satish et al., Fast sort on CPUs and GPUs: a case for bandwidth oblivious SIMD sort, Proc. of SIGMOD '10, 2010.
[22] Reilly Matthew, When Multicore Isn't Enough: Trends and the Future for Multi-Multicore Systems, In HPEC, 2008.