Max/Min Online Aggregation in the Cloud


Journal of Chinese Computer Systems, Vol. 36, No. 10, October 2015
Article ID: 1000-1220(2015)10-2177-06    CLC number: TP311    Document code: A

WANG Feng-ming, CI Xiang, MENG Xiao-feng
School of Information, Renmin University of China, Beijing 100872, China
E-mail: wangfengmingqq@163.com

Abstract: As an important part of data analysis, data exploration must be able to efficiently access key indicators of data sets, such as max/min and average. In a relational database these indicators are obtained with SQL aggregate functions. To obtain them on massive datasets, researchers have proposed the concept of online aggregation, and in the era of big data online aggregation in the cloud has attracted attention. Most of the research focuses on aggregate functions such as Count and Sum, while there is little work on Max/Min online aggregation. In this paper we use a quantile to measure the accuracy of Max/Min online aggregation; the quantile is derived from Chebyshev's inequality and the central limit theorem. The experimental results demonstrate the efficiency of the method and show that it adapts well to online aggregation for big data.

Key words: online aggregation; cloud computing; Chebyshev's inequality; central limit theorem

Received 2014-07-21; revised 2014-09-09. Grant numbers: 61379050, 91224008, 2013AA013204, 20130004130001, 11XNL010.

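[Related work: the Chinese body text is not preserved in this transcript. Surviving terms and citations: the Central Limit Theorem; the Sum and Count aggregates; ripple join [3]; HOP (Hadoop Online Prototype) [7], a modified Hadoop MapReduce; COLA [8, 9]; data skew; Count, Sum, Average and Max/Min.]

3.1
The queries considered have the form

    SELECT op(exp(t_ij)), col
    FROM R
    WHERE predicate
    GROUP BY col

where R is the relation being queried, op is Max or Min, exp is an expression over the attributes of R, predicate is a condition on R, and col is a grouping column of R.

3.2
The accuracy measure is based on Chebyshev's inequality and on the quantile of the current Max/Min value. Only a fragment of the quantile definition survives: for a level between 0 and 1 and a random variable X, it introduces a threshold $Z_\delta$ through the probability $P(X > Z_\delta) = \delta$.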
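To make the query form concrete, here is a minimal Python sketch of how `SELECT op(exp(t_ij)), col FROM R WHERE predicate GROUP BY col` can be evaluated incrementally, with a running Max and Min per group that is refined as rows arrive. It is an illustration only, not the paper's MapReduce implementation; the function name `running_max_min` and the sample rows are hypothetical.

```python
def running_max_min(rows, predicate, expr, group_col):
    """Yield a snapshot {group: (max, min)} after each row that passes the predicate."""
    state = {}
    for row in rows:
        if not predicate(row):
            continue  # WHERE predicate
        key, value = row[group_col], expr(row)  # GROUP BY col, exp(t)
        hi, lo = state.get(key, (value, value))
        state[key] = (max(hi, value), min(lo, value))
        yield dict(state)


# Hypothetical sample rows, loosely modelled on a visit_log(language, pageviews) table.
rows = [
    {"language": "english", "pageviews": 165},
    {"language": "french", "pageviews": 40},
    {"language": "english", "pageviews": 552},
    {"language": "english", "pageviews": 12},
]
snapshots = list(
    running_max_min(rows, lambda r: True, lambda r: r["pageviews"], "language")
)
final = snapshots[-1]
print(final)
```

Each snapshot is an early answer; the quantile estimate developed in the paper is what tells the user how much to trust such an early Max or Min.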

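Chebyshev's inequality: let X be a random variable with mean $\mu$ and variance $\sigma^2$. Then for any $\varepsilon > 0$,

$$P(|X - \mu| \ge \varepsilon) \le \frac{\sigma^2}{\varepsilon^2} \tag{1}$$

The one-sided form is

$$P(X - \mu \ge t) \le \frac{\sigma^2}{\sigma^2 + t^2} \quad (t \ge 0), \qquad P(X - \mu \le t) \le \frac{\sigma^2}{\sigma^2 + t^2} \quad (t < 0) \tag{2}$$

For the current maximum M, with $M - \mu > 0$, (2) gives

$$P(X - \mu \ge M - \mu) \le \frac{\sigma^2}{\sigma^2 + (M - \mu)^2} \tag{3}$$

$$P(X \ge M) \le \frac{\sigma^2}{\sigma^2 + (M - \mu)^2} \tag{4}$$

so the quantile of M is at least $1 - \dfrac{\sigma^2}{\sigma^2 + (M - \mu)^2}$.

For the current minimum N, with $N - \mu < 0$, (2) gives

$$P(X - \mu \le N - \mu) \le \frac{\sigma^2}{\sigma^2 + (N - \mu)^2} \tag{5}$$

$$P(X \le N) \le \frac{\sigma^2}{\sigma^2 + (N - \mu)^2} \tag{6}$$

$$P(X \ge N) \ge 1 - \frac{\sigma^2}{\sigma^2 + (N - \mu)^2} \tag{7}$$

so the quantile of N is at most $\dfrac{\sigma^2}{\sigma^2 + (N - \mu)^2}$.

3.3
In (4) and (7), $\mu$ and $\sigma^2$ are unknown and have to be estimated from the tuples processed so far. Given a confidence $\delta$, the error $\varepsilon_\mu$ of the mean estimate $\bar{X}$ satisfies

$$P(|\bar{X} - \mu| \le \varepsilon_\mu) = \delta \tag{8}$$

$$P\left(\frac{|\bar{X} - \mu|}{\sigma/\sqrt{n}} \le \frac{\varepsilon_\mu}{\sigma/\sqrt{n}}\right) = \delta \tag{9}$$

$$P\left\{\frac{\sqrt{n}\,|\bar{X} - \mu|}{\sigma} \le \frac{\sqrt{n}\,\varepsilon_\mu}{\sigma}\right\} = \delta \tag{10}$$

Replacing $\sigma^2$ by the sample variance $T_{n,2} = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$:

$$P\left\{\frac{\sqrt{n}\,|\bar{X} - \mu|}{T_{n,2}^{1/2}} \le \frac{\sqrt{n}\,\varepsilon_\mu}{T_{n,2}^{1/2}}\right\} = \delta \tag{11}$$

By the central limit theorem,

$$P\left\{\frac{\sqrt{n}\,|\bar{X} - \mu|}{T_{n,2}^{1/2}} \le \frac{\sqrt{n}\,\varepsilon_\mu}{T_{n,2}^{1/2}}\right\} \approx 2\Phi\left(\frac{\sqrt{n}\,\varepsilon_\mu}{T_{n,2}^{1/2}}\right) - 1 \tag{12}$$

Let $Z_\delta$ be the $(\delta + 1)/2$ quantile of the standard normal distribution. Then

$$Z_\delta = \frac{\sqrt{n}\,\varepsilon_\mu}{T_{n,2}^{1/2}}, \qquad \varepsilon_\mu = \left(\frac{Z_\delta^2\, T_{n,2}}{n}\right)^{1/2} \tag{13}$$

In the same way, the error of the variance estimate is

$$\varepsilon_\sigma = \left(\frac{Z_\delta^2\,(T_{n,4} - T_{n,2}^2)}{n}\right)^{1/2}, \qquad T_{n,4} = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^4 \tag{14}$$

With confidence $\delta$, the quantile $\varphi_M$ of the maximum M and the quantile $\varphi_N$ of the minimum N are then estimated as

$$\varphi_M = 1 - \frac{\sigma^2 + \varepsilon_\sigma}{\sigma^2 + \varepsilon_\sigma + (M - \mu - \varepsilon_\mu)^2} \tag{15}$$

$$\varphi_N = \frac{\sigma^2 + \varepsilon_\sigma}{\sigma^2 + \varepsilon_\sigma + (N - \mu - \varepsilon_\mu)^2} \tag{16}$$

where $\mu$ and $\sigma^2$ are replaced by their estimates $\bar{X}$ and $T_{n,2}$.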
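The estimation steps above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' code: it computes the sample moments T_{n,2} and T_{n,4}, the error terms for the mean and the variance, and then the two quantile estimates as 1 - v / (v + (M - mean - eps_mu)^2) for the maximum and v / (v + (N - mean - eps_mu)^2) for the minimum, with v = T_{n,2} + eps_sigma. The function name `quantile_estimates`, the default `z=1.96` (confidence 0.95), and the clamp to zero under the square root are assumptions of this sketch. The Reduce pseudocode as printed in the paper squares both terms of v, so the exact exponents are uncertain.

```python
import math


def quantile_estimates(sample, current_max, current_min, z=1.96):
    """Estimate the quantiles of the current max and min from a sample (needs n >= 2)."""
    n = len(sample)
    mean = sum(sample) / n
    # Sample central moments T_{n,2} and T_{n,4}.
    t2 = sum((x - mean) ** 2 for x in sample) / (n - 1)
    t4 = sum((x - mean) ** 4 for x in sample) / (n - 1)
    # Error of the mean estimate at the chosen confidence.
    eps_mu = math.sqrt(z * z * t2 / n)
    # Error of the variance estimate; the clamp is an assumption of this sketch,
    # added because t4 - t2**2 can be slightly negative for tiny samples.
    eps_sigma = math.sqrt(z * z * max(t4 - t2 * t2, 0.0) / n)
    v = t2 + eps_sigma  # pessimistic variance
    phi_max = 1 - v / (v + (current_max - mean - eps_mu) ** 2)
    phi_min = v / (v + (current_min - mean - eps_mu) ** 2)
    return {"eps_mu": eps_mu, "eps_sigma": eps_sigma,
            "phi_max": phi_max, "phi_min": phi_min}


# Hypothetical sample: the integers 1..100, with current max 100 and current min 1.
result = quantile_estimates(list(range(1, 101)), 100, 1)
larger = quantile_estimates(list(range(1, 101)), 200, 1)
print(result)
```

Because the bound comes from Chebyshev's inequality, it is distribution-free and therefore conservative: for this uniform-like sample the true maximum receives a quantile estimate well below 1, and the estimate rises as the candidate maximum moves further from the mean.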

The method is implemented with a MapReduce Map function (Algorithm 1) and Reduce function (Algorithm 2).

Algorithm 1. Map
Input: Object t
Output: Text key, Text value
    if t satisfies the predicate then
        key.set(t.tuple.lang)
        value.set(t.tuple.size)
    end if
    output.collect(key, value)

Algorithm 2. Reduce
Input: Text key, Iterator<Text> values
Output: Max, Min, fi_max, fi_min
    // size_n: number of tuples processed by the reducer
    // sum_i: sum of the variables in the last iteration
    // Max: estimated max
    // Min: estimated min
    // fi_max: quantile of max
    // fi_min: quantile of min
    while values.hasNext() do
        Text it = values.getNext()
        val += sum
        t_n2 += (list.get(i) - avg)^2
        t_n4 += t_n2^2
        sega2 = t_n2 / (size_n - 1)
        avg_err = (1.96^2 * sega2 / size_n)^(1/2)
        sega4 = t_n4 / (size_n - 1)
        dev_err = (1.96^2 * (sega4 - sega2^2) / size_n)^(1/2)
        fi_max = 1 - (sega2^2 + dev_err^2) / (sega2^2 + dev_err^2 + (Max - avg - avg_err)^2)
        fi_min = (sega2^2 + dev_err^2) / (sega2^2 + dev_err^2 + (Min - avg - avg_err)^2)
    end while
    output.collect(key, new Text(res))

5.1
The experiments run on a cluster of 11 nodes connected by a 1 Gbit network: one master and 10 slaves, with 2.33 GHz CPUs, 7 GB of memory and 1.8 TB of disk. The HDFS block size is 64 MB. The Max/Min method is implemented on COLA. The data are the Wikipedia page traffic statistics [14], stored in HDFS as a table visit_log with a pageviews column; the surviving text mentions data sizes of 320 GB, 1 TB and 100 GB. The two test queries are:

    Q1 = SELECT Max(pageviews), language FROM visit_log GROUP BY language
    Q2 = SELECT Min(pageviews), language FROM visit_log GROUP BY language

The confidence is 0.95, so by (13) Z_δ is the 0.975 quantile of the standard normal distribution, Z_δ = 1.96.

5.2
The accuracy metric is the relative error:

    relative_error = |estimatevalue - actualvalue| / actualvalue        (17)

Two timing metrics, avgtime_max and avgtime_min, are also used; the surviving text associates avgtime_max with the values 0.95 and 0.99, and avgtime_min with 0.95 and 0.01.

Table 1  Quantile of Max

    Online aggregation result    Actual value    Quantile
    165                          3574            0.7288
    552                          3574            0.8691
    552                          3574            0.8870
    552                          3574            0.9170
    911                          3574            0.9704
    1229                         3574            0.9769
    1441                         3574            0.9818
    2922                         3574            0.9804
    3574                         3574            0.9935
    3574                         3574            0.9954

Table 2  Quantile of Min

    Online aggregation result    Actual value    Quantile
    325                          12              0.7241
    194                          12              0.6571
    85                           12              0.4593
    59                           12              0.2069
    59                           12              0.2128
    33                           12              0.1002
    33                           12              0.0994
    24                           12              0.0226
    12                           12              0.0124
    12                           12              0.0135
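Two numeric ingredients of the evaluation are easy to check with the Python standard library: the normal quantile that corresponds to a confidence of 0.95, and the relative error metric. The sketch below is illustrative only; the function names `z_for_confidence` and `relative_error` are hypothetical, and the example values 552 and 3574 are taken from Table 1.

```python
from statistics import NormalDist


def z_for_confidence(delta):
    """Return the (delta + 1) / 2 quantile of the standard normal distribution."""
    return NormalDist().inv_cdf((1 + delta) / 2)


def relative_error(estimate, actual):
    """Relative error of an online estimate against the exact answer."""
    return abs(estimate - actual) / abs(actual)


z = z_for_confidence(0.95)          # about 1.96, the constant used in Algorithm 2
early = relative_error(552, 3574)   # an early Max estimate from Table 1
final = relative_error(3574, 3574)  # the estimate once the true Max has been seen
print(round(z, 2), round(early, 4), final)
```

The first value confirms the constant 1.96 that is hard-coded in the Reduce pseudocode; a different confidence would only require a different argument to `z_for_confidence`.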

[Figures and their discussion: the plots and the Chinese text are not preserved in this transcript. Surviving captions and settings:]

Fig. 1  Query error of Q1
Fig. 2  Query error of Q2
Fig. 3  Query time of Q1 (english)
Fig. 4  Query time of Q1 (french)
Fig. 5  Query time of Q2 (english)
Fig. 6  Query time of Q2 (french)
Fig. 7  Scalability of data (data sizes 20 GB, 40 GB, 60 GB, 80 GB, 100 GB)
Fig. 8  Scalability of cluster (2, 4, 6, 8, 10 slaves on 100 GB of data)

References

[1] Joseph M Hellerstein, Peter J Haas, Helen J Wang. Online aggregation [C]. Proceedings of ACM Conference on Management of Data, New York: ACM, 1997: 171-182.
[2] Peter J Haas. Large-sample and deterministic confidence intervals for online aggregation [C]. Proceedings of International Conference on Scientific and Statistical DB Management, Piscataway, NJ: IEEE, 1997: 51-63.
[3] Peter J Haas, Joseph M Hellerstein. Ripple joins for online aggregation [C]. Proc of SIGMOD 1999, New York: ACM, 1999: 287-298.
[4] Gang Luo, Curt J Ellmann, Peter J Haas, et al. A scalable hash ripple join algorithm [C]. Proceedings of ACM Conference on Management of Data, New York: ACM, 2005: 252-262.
[5] Chris Jermaine, Alin Dobra, Subramanian Arumugam, et al. A disk-based join with probabilistic guarantees [C]. Proceedings of ACM Conference on Management of Data, New York: ACM, 2005: 563-574.
[6] Wu Sai, Jiang Shou-xu, Beng Chin Ooi, et al. Distributed online aggregation [J]. The Proceedings of the VLDB Endowment, 2009, 2(1): 443-454.
[7] Tyson Condie, Neil Conway, Peter Alvaro, et al. Online aggregation and continuous query support in MapReduce [C]. Proceedings of ACM Conference on Management of Data, New York: ACM, 2010: 1115-1118.
[8] Shi Ying-jie, Meng Xiao-feng, Wang Fu-sheng, et al. You can stop early with COLA: online processing of aggregate queries in the cloud [C]. Proceedings of ACM International Conference on Information and Knowledge Management, New York: ACM, 2012: 1223-1232.
[9] COLA [EB/OL]. http://idke.ruc.edu.cn/COLA/, 2014.
[10] Niketan Pansare, Vinayak R Borkar, Chris Jermaine, et al. Online aggregation for large MapReduce jobs [J]. The Proceedings of the VLDB Endowment, 2011, 4(11): 1135-1145.
[11] Vasiliki Kalavri, Vaidas Brundza, Vladimir Vlassov. Block sampling: efficient accurate online aggregation in MapReduce [C]. Proc of CloudCom'13, Piscataway, NJ: IEEE, 2013: 250-257.
[12] Wang Yu-xiang, Luo Jun-zhou, Song Ai-bo, et al. OATS: online aggregation with two-level sharing strategy in cloud [J]. Distributed and Parallel Databases, 2014, 32(1): 1-39.
[13] Wu Ming-xi, Chris Jermaine. Guessing the extreme values in a data set: a Bayesian method and its applications [J]. The VLDB Journal, 2009, 18(2): 571-597.
[14] Wikipedia page traffic statistics [EB/OL]. http://aws.amazon.com/datasets/2596, 2014.