Indexing Methods for Encrypted Vector Databases




Computer Security Symposium 2013, 21-23 October 2013

Junpei Kawamoto
Faculty of Engineering, Information and Systems, University of Tsukuba
1-1-1 Tennodai, Tsukuba, Ibaraki 305-0006, JAPAN
junpei.kawamoto@acm.org

Abstract

We introduce a filtering method based on locality sensitive hashing (LSH) and the whitening transformation. It reduces the number of candidate tuples for which an encrypted vector database (EVDB) must compute similarity to the query during query processing. LSH is a hashing technique that efficiently estimates the similarity between two vectors. It partitions a vector space using randomly chosen vectors. By recording which partition each vector falls into, we can filter out vectors that are less similar to the query vector. However, if the vectors in an EVDB are concentrated in a small region, most of them fall into the same partition and the filter does not work. The whitening transformation handles such cases by spreading the vectors broadly, so the proposed filter works effectively on any vector space. We also show that the filter reduces the server's query processing cost.

1

A tuple consisting of a key $k$ and a value $v$ is written $(k, v)$.

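The filter is built on locality sensitive hashing (LSH) [2]. LSH has also been combined with Bloom filters [3, 4].

2

A vector database $VDB(Key, Value)$ is a relation with two attributes, $Key$ and $Value$. Multi-dimensional indexes such as the R-tree [1] can be built over $Key$. Given a query vector $q$ and a threshold $\alpha$, the database returns the tuples whose key $k$ satisfies $\mathrm{sim}(k, q) \ge \alpha$.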

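An encrypted vector database $EVDB(Key_e, Value_e)$ stores, for each key $k \in Key$, the encrypted key $k_e = \mathrm{Enc}_k(k) \in Key_e$, and for each value $v \in Value$, the encrypted value $v_e = \mathrm{Enc}_v(v) \in Value_e$. A query with vector $q$ and threshold $\alpha$ is issued as $q_e = \mathrm{Enc}_q(q)$ together with $\alpha$. The server returns the encrypted tuples $(k_e, v_e)$ that satisfy $\mathrm{sim}(k_e, q_e) \ge \alpha$. It works only with $k_e$, $q_e$, and $v_e$, not with $k$, $q$, and $v$. The client decrypts each returned tuple $(k_e, v_e)$ with $k = \mathrm{Dec}_k(k_e)$ and $v = \mathrm{Dec}_v(v_e)$. An EVDB scheme is therefore given by the five functions $\mathrm{Enc}_k$, $\mathrm{Enc}_q$, $\mathrm{Enc}_v$, $\mathrm{Dec}_k$, and $\mathrm{Dec}_v$, of which the querying client uses $\mathrm{Enc}_q$, $\mathrm{Dec}_k$, and $\mathrm{Dec}_v$.

3

3.1 LSH

Several LSH schemes have been proposed [5, 2, 6]. The scheme of Charikar [2] draws $m$ random base vectors $b_i$ and defines one hash function per base vector:

$$h_i(v) = \begin{cases} 1 & \text{if } v \cdot b_i \ge 0 \\ 0 & \text{otherwise} \end{cases}$$

The LSH value $\mathrm{lsh}(v)$ of a vector $v$ is the sequence of the $m$ hash values:

$$\mathrm{lsh}(v) = (h_1(v), h_2(v), \ldots, h_m(v)). \tag{1}$$

For two vectors $u$ and $v$, the LSH values $\mathrm{lsh}(u)$ and $\mathrm{lsh}(v)$ satisfy

$$\Pr[\mathrm{lsh}(u) = \mathrm{lsh}(v)] \approx 1 - \theta(u, v)/\pi,$$

where $\Pr[\mathrm{lsh}(u) = \mathrm{lsh}(v)]$ is the fraction of indices $i$ for which $h_i(u) = h_i(v)$, and $\theta(u, v)$ is the angle between $u$ and $v$.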
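The hashing scheme of Eq. (1) and the relation between matching hash bits and the angle $\theta(u, v)$ can be illustrated with a minimal Python sketch. It assumes NumPy and base vectors drawn from a standard normal distribution (the text only says they are chosen randomly). The function names `make_base_vectors`, `lsh`, `match_fraction`, and `estimated_cosine` are hypothetical. Solving the relation for the angle gives $\theta \approx \pi(1 - \text{fraction of matching bits})$, and the sketch takes the cosine of that estimate.

```python
import numpy as np

def make_base_vectors(m, dim, rng):
    """Draw m random base vectors b_1..b_m as rows (standard normal: an assumption)."""
    return rng.standard_normal((m, dim))

def lsh(v, bases):
    """lsh(v) = (h_1(v), ..., h_m(v)) with h_i(v) = 1 if v . b_i >= 0, else 0."""
    return (bases @ v >= 0).astype(np.uint8)

def match_fraction(hu, hv):
    """Fraction of indices i with h_i(u) = h_i(v)."""
    return float(np.mean(hu == hv))

def estimated_cosine(hu, hv):
    """Cosine of the estimated angle pi * (1 - match_fraction)."""
    return float(np.cos(np.pi * (1.0 - match_fraction(hu, hv))))

rng = np.random.default_rng(0)
bases = make_base_vectors(m=4096, dim=16, rng=rng)
u = rng.standard_normal(16)
v = u + 0.5 * rng.standard_normal(16)   # a vector close to u

true_cos = float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
est_cos = estimated_cosine(lsh(u, bases), lsh(v, bases))
print(f"true cosine {true_cos:.3f}, LSH estimate {est_cos:.3f}")
```

A large `m` is used here only to make the estimate visibly close to the true cosine. Smaller values trade accuracy for shorter hash values.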

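Using LSH values, the cosine similarity $\cos(u, v)$ can therefore be estimated as

$$\cos(u, v) \approx \cos\bigl(\pi(1 - \Pr[\mathrm{lsh}(u) = \mathrm{lsh}(v)])\bigr). \tag{2}$$

With $m$ base vectors, LSH divides the vector space into up to $2^m$ regions.

3.2

If the vectors are concentrated in a small region, most of them receive the same LSH value. The whitening transformation spreads them out before hashing. Let $\mu$ be the mean of the vectors $v$ and $\Sigma$ their covariance matrix:

$$\Sigma = E\bigl((v - \mu)(v - \mu)^T\bigr). \tag{3}$$

Decompose $\Sigma = \Phi \Lambda \Phi^{-1}$, where the $i$-th column of $\Phi$ is the $i$-th eigenvector of $\Sigma$ and $\Lambda$ is the diagonal matrix of the corresponding eigenvalues. The whitening matrix $W_k$ is

$$W_k = \Phi \Lambda^{-1/2}. \tag{4}$$

A vector $v$ is transformed to $v_w = W_k^T (v - \mu)$. The transformed vectors have identity covariance:

$$E(v_w v_w^T) = E\bigl(W_k^T (v - \mu)(v - \mu)^T W_k\bigr) = \Lambda^{-1/2} \Phi^T \Sigma \Phi \Lambda^{-1/2} = I.$$

3.3

To add the LSH filter to the EVDB scheme of Section 2, which consists of $\mathrm{Enc}_k$, $\mathrm{Enc}_q$, $\mathrm{Enc}_v$, $\mathrm{Dec}_k$, and $\mathrm{Dec}_v$, the functions $\mathrm{Enc}_k$, $\mathrm{Enc}_q$, and $\mathrm{Dec}_k$ are replaced by $\mathrm{Enc}'_k$, $\mathrm{Enc}'_q$, and $\mathrm{Dec}'_k$.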
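The whitening step of Eqs. (3) and (4) can be sketched as follows. This is a minimal illustration assuming NumPy; the helper names `whitening_matrix`, `whiten`, and `distinct_codes`, as well as the synthetic clustered data, are assumptions and not part of the paper. The sketch checks that the transformed vectors have identity covariance and counts how many distinct LSH values a tight cluster produces before and after the transformation.

```python
import numpy as np

def whitening_matrix(X):
    """Return (mu, W) with W = Phi Lambda^{-1/2}, computed from the rows of X."""
    mu = X.mean(axis=0)
    centered = X - mu
    sigma = centered.T @ centered / X.shape[0]   # Sigma = E((v - mu)(v - mu)^T)
    eigvals, phi = np.linalg.eigh(sigma)         # Sigma is symmetric: Phi^{-1} = Phi^T
    W = phi @ np.diag(eigvals ** -0.5)
    return mu, W

def whiten(X, mu, W):
    """Apply v_w = W^T (v - mu) to every row of X."""
    return (X - mu) @ W

def distinct_codes(Y, bases):
    """Number of distinct LSH values among the rows of Y."""
    bits = (Y @ bases.T >= 0).astype(np.uint8)
    return len({tuple(row) for row in bits.tolist()})

rng = np.random.default_rng(1)
mix = np.array([[1.0, 0.5, 0.0],
                [0.0, 1.0, 0.3],
                [0.0, 0.0, 2.0]])
# A tight, correlated cluster far from the origin (synthetic data).
X = np.array([5.0, 5.0, 5.0]) + 0.1 * rng.standard_normal((2000, 3)) @ mix

mu, W = whitening_matrix(X)
Xw = whiten(X, mu, W)
cov_w = Xw.T @ Xw / Xw.shape[0]                  # should be close to I

bases = rng.standard_normal((8, 3))              # m = 8 random base vectors
print("distinct LSH values before:", distinct_codes(X, bases))
print("distinct LSH values after: ", distinct_codes(Xw, bases))
```

`np.linalg.eigh` is used because the covariance matrix is symmetric, so its eigenvector matrix is orthogonal and $\Phi^{-1} = \Phi^T$.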

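Let $\mu$ be the mean of the encrypted keys $\mathrm{Enc}_k(k)$, $k \in Key$. Following (3), compute

$$\Sigma = E\bigl((\mathrm{Enc}_k(k) - \mu)(\mathrm{Enc}_k(k) - \mu)^T\bigr),$$

decompose $\Sigma = \Phi \Lambda \Phi^{-1}$, and obtain $W_k$ from (4). The new key encryption is

$$\mathrm{Enc}'_k(k) = W_k^T (\mathrm{Enc}_k(k) - \mu),$$

and the query encryption and key decryption become

$$\mathrm{Enc}'_q(q) = W_k^{-1}\, \mathrm{Enc}_q(q), \qquad \mathrm{Dec}'_k(k'_e) = \mathrm{Dec}_k\bigl((W_k^T)^{-1} k'_e + \mu\bigr).$$

Clients that query the database are given $\mathrm{Enc}'_q$, $\mathrm{Dec}'_k$, $\mathrm{Dec}_v$, and $\mu$.

Consider a query with vector $q$ and threshold $\alpha$ that asks for the keys $k$ with $\mathrm{sim}(k, q) \ge \alpha$. Write $k'_e = \mathrm{Enc}'_k(k)$ and $q'_e = \mathrm{Enc}'_q(q)$. Taking the similarity to be the inner product, the transformed vectors satisfy

$$k'_e \cdot q'_e = (\mathrm{Enc}_k(k) - \mu)^T W_k W_k^{-1}\, \mathrm{Enc}_q(q) = k_e \cdot q_e - \mu \cdot \mathrm{Enc}_q(q).$$

The client therefore sends the shifted threshold $\alpha' = \alpha - \mu \cdot \mathrm{Enc}_q(q)$, and the condition $k_e \cdot q_e \ge \alpha$ is equivalent to $k'_e \cdot q'_e \ge \alpha - \mu \cdot \mathrm{Enc}_q(q) = \alpha'$.

For the LSH filter, the server chooses $m$ base vectors and computes the LSH value $\mathrm{lsh}(k'_e)$ of every encrypted key by (1). The database is stored as $EVDB(LSH, Key_e, Value_e)$, and $S_{lsh}$ denotes the set of LSH values that occur in it.

Given $q'_e$ and $\alpha' = \alpha - \mu \cdot \mathrm{Enc}_q(q)$, the server computes $h_q = \mathrm{lsh}(q'_e)$ and selects the candidate set $S_{cand} \subseteq S_{lsh}$ of LSH values $h$ that satisfy

$$\cos\bigl(\pi(1 - \Pr[h = h_q])\bigr) \ge \alpha'. \tag{5}$$

By (2), the left side of (5) estimates the cosine between $q'_e$ and the keys whose LSH value is $h$. The server then evaluates the similarity between $k'_e$ and $q'_e$ against $\alpha'$ only for the tuples whose LSH value is in $S_{cand}$.

4

The evaluation uses IPP [7] as the EVDB scheme.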
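The transformed functions and the bucket filter can be sketched end to end as follows. This is a minimal illustration assuming NumPy. The "encrypted" vectors are plain random stand-ins rather than real ciphertexts, and the similarity is taken to be the inner product. The names `candidate_rows` and `cos_threshold` are hypothetical. The left side of (5) is a cosine estimate while $\alpha'$ lives on the inner-product scale, so the sketch passes the cut-off on the cosine scale as a separate parameter instead of guessing how the paper relates the two.

```python
import numpy as np

rng = np.random.default_rng(2)
dim, n, m = 4, 500, 32

# Stand-ins for Enc_k(k) and Enc_q(q): plain random vectors, not real ciphertexts.
K_e = np.array([3.0, -1.0, 2.0, 0.5]) + rng.standard_normal((n, dim))
q_e = rng.standard_normal(dim)
alpha = 1.0

# Whitening parameters from the encrypted keys, Eqs. (3) and (4).
mu = K_e.mean(axis=0)
centered = K_e - mu
sigma = centered.T @ centered / n
eigvals, phi = np.linalg.eigh(sigma)
W = phi @ np.diag(eigvals ** -0.5)

K_w = centered @ W              # rows are Enc'_k(k) = W^T (Enc_k(k) - mu)
q_w = np.linalg.solve(W, q_e)   # Enc'_q(q) = W^{-1} Enc_q(q)
alpha_w = alpha - mu @ q_e      # alpha' = alpha - mu . Enc_q(q)

# Undo the key transform: (W^T)^{-1} k'_e + mu gives back Enc_k(k).
K_back = np.linalg.solve(W.T, K_w.T).T + mu

# Server side: one LSH value per stored key, grouped into buckets (S_lsh).
bases = rng.standard_normal((m, dim))
codes = (K_w @ bases.T >= 0).astype(np.uint8)
h_q = (bases @ q_w >= 0).astype(np.uint8)
buckets = {}
for row, code in enumerate(codes.tolist()):
    buckets.setdefault(tuple(code), []).append(row)

def candidate_rows(buckets, h_q, cos_threshold):
    """Rows whose LSH value h has an estimated cosine to h_q of at least cos_threshold."""
    rows = []
    for h, members in buckets.items():
        match = np.mean(np.array(h, dtype=np.uint8) == h_q)
        if np.cos(np.pi * (1.0 - match)) >= cos_threshold:
            rows.extend(members)
    return sorted(rows)

candidates = candidate_rows(buckets, h_q, cos_threshold=0.0)
# The exact condition is evaluated only for the candidates.
hits = [r for r in candidates if K_w[r] @ q_w >= alpha_w]
print(len(buckets), "buckets,", len(candidates), "candidates,", len(hits), "hits")
```

Because the LSH estimate is approximate, a cut-off that is too strict can discard true matches. The sketch makes no claim about recall.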

[Figure 1: LSH. Panels: (a) n = 10000, (b) n = 100000. Horizontal axis: the number of base vectors m (16 to 1024). Plotted series: size, min., max.]

[Figure 2: LSH. Panels: (a) n = 10000, (b) n = 100000. Horizontal axis: the number of base vectors m (16 to 1024). Plotted series: size, min., max.]

The experiments were implemented in Python 2.7 and run on an Intel Core i7-860 processor (8M cache, 2.80 GHz) with 8 GB RAM under Ubuntu 12.04 LTS. The databases contain n = 10000 and n = 100000 tuples.

[Figure 3: recall (n = 1000). Panels: (a) recall versus the number of base vectors m; (b) recall versus requesting width (1 to 50), with m = 256.]

[Figure 4: time (sec) versus requesting width (10 to 50), logarithmic vertical axis. Panels: (a) n = 10000, with curves for m = 16, 32, 64, 128, 256 and without the LSH filter; (b) n = 100000, with curves for m = 32, 64, 128, 256, 512 and without the LSH filter.]

References

[1] Wang, J., Wu, S., Gao, H., Li, J., Ooi, B.C.: Indexing Multi-dimensional Data in a Cloud System. In: Proc. of the 30th ACM SIGMOD International Conference on Management of Data, pp. 591-602. ACM Press, Indianapolis, IN, USA (2010)

[2] Charikar, M.S.: Similarity Estimation Techniques from Rounding Algorithms. In: Proc. of the 34th Annual ACM Symposium on Theory of Computing, pp. 380-388. ACM Press, Montreal, Quebec, Canada (2002)

[3] Kirsch, A., Mitzenmacher, M.: Distance-Sensitive Bloom Filters. In: The 18th Workshop on Algorithm Engineering and Experiments. Miami, FL, USA (2006)

[4] Hua, Y., Xiao, B., Veeravalli, B., Feng, D.: Locality-Sensitive Bloom Filter for Approximate Membership Query. IEEE Transactions on Computers 61(6), 817-830 (2011)

[5] Gionis, A., Indyk, P., Motwani, R.: Similarity Search in High Dimensions via Hashing. In: Proc. of the 25th International Conference on Very Large Data Bases, pp. 518-529. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (1999)

[6] Kulis, B., Grauman, K.: Kernelized Locality-Sensitive Hashing for Scalable Image Search. In: Proc. of the 12th IEEE International Conference on Computer Vision, pp. 2130-2137. IEEE Computer Society, Kyoto, Japan (2009)

[7] Kawamoto, J., Yoshikawa, M.: Private Range Query by Perturbation and Matrix Based Encryption. In: Proc. of the Sixth IEEE International Conference on Digital Information Management, pp. 211-216. IEEE Computer Society, Melbourne, Australia (2011)