A Parallel Skeleton Library in C++ with Optimization Mechanism


Skeletal parallel programming enables programmers to build a parallel program from ready-made components called skeletons (parallel primitives) for which efficient implementations are known to exist. This makes both program development and parallelization easier. Parallel programs written in terms of skeletons are, however, not always efficient, because intermediate data structures that do not appear in the final result may be produced and passed between skeletons. To overcome this problem and make skeletal parallel programming more practical, this paper proposes a new parallel skeleton library in C++. The library has an optimization mechanism that uses fusion transformation to turn successive calls of parallel skeletons into a single function call. This paper describes the implementation of the skeleton library and reports the effects of the optimization.

A Parallel Skeleton Library in C++ with Optimization Mechanism.
Yoshiki Akashi, Graduate School of Electro-Communications, The University of Electro-Communications.
Kiminori Matsuzaki, Kazuhiko Kakehi, Graduate School of Information Science and Technology, The University of Tokyo.
Hideya Iwasaki, Department of Computer Science, The University of Electro-Communications.
Zhenjiang Hu, Graduate School of Information Science and Technology, The University of Tokyo; PRESTO 21, Japan Science and Technology Agency.
Vol.22, No.4 (2005), pp.78-83.
[ ] 2005 2 18.

1

... [6] ...

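2 BMF

The skeletons are described in BMF [3].

2.1

Function composition is written $f \circ g$:

\[ (f \circ g)\ x = f\ (g\ x) \]

A binary operator $\oplus$ can be sectioned:

\[ a \oplus b = (a\,\oplus)\ b = (\oplus\,b)\ a = (\oplus)\ a\ b \]

$[\,]$ denotes the empty list, $[a]$ the singleton list with element $a$, and $x \mathbin{+\!\!+} y$ the concatenation of lists $x$ and $y$. For example, $[1] \mathbin{+\!\!+} [2] \mathbin{+\!\!+} [3] = [1, 2, 3]$, and $[a] \mathbin{+\!\!+} x$ is also written $a : x$.

2.2

There are four basic skeletons: map, reduce, scan and zip.

map applies a function $f$ to every element of a list:

\[ \mathit{map}\ f\ [x_1, x_2, \ldots, x_n] = [f\ x_1, f\ x_2, \ldots, f\ x_n] \]

reduce collapses a list with a binary operator $\oplus$:

\[ \mathit{reduce}\ (\oplus)\ [x_1, x_2, \ldots, x_n] = x_1 \oplus x_2 \oplus \cdots \oplus x_n \]

scan keeps the intermediate results of reduce, where $e$ is the unit of $\oplus$:

\[ \mathit{scan}\ (\oplus)\ [x_1, x_2, \ldots, x_n] = [e,\ e \oplus x_1,\ \ldots,\ e \oplus x_1 \oplus \cdots \oplus x_n] \]

zip combines two lists of the same length into one list of pairs:

\[ \mathit{zip}\ [x_1, x_2, \ldots, x_n]\ [y_1, y_2, \ldots, y_n] = [(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)] \]

Besides these four skeletons, there is the accumulate skeleton of Hu et al. [7][9].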
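The four definitions above can be read as ordinary sequential functions. The following is a minimal sketch of that reading on `std::vector`, not the library's interface: the names `map_seq`, `reduce_seq`, `scan_seq` and `zip_seq` are hypothetical, and the unit `e` is passed explicitly as an argument, which is an assumption made here for convenience.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// map f [x1, ..., xn] = [f x1, ..., f xn]
template <typename A, typename F>
auto map_seq(F f, const std::vector<A>& xs) {
    using B = std::decay_t<decltype(f(xs.front()))>;  // element type of the result
    std::vector<B> ys;
    ys.reserve(xs.size());
    for (const A& x : xs) ys.push_back(f(x));
    return ys;
}

// reduce (+) [x1, ..., xn] = x1 + ... + xn, starting from the unit e.
template <typename A, typename Op>
A reduce_seq(Op op, A e, const std::vector<A>& xs) {
    A acc = e;
    for (const A& x : xs) acc = op(acc, x);
    return acc;
}

// scan (+) [x1, ..., xn] = [e, e + x1, ..., e + x1 + ... + xn]
// The result has n + 1 elements.
template <typename A, typename Op>
std::vector<A> scan_seq(Op op, A e, const std::vector<A>& xs) {
    std::vector<A> ys;
    ys.reserve(xs.size() + 1);
    A acc = e;
    ys.push_back(acc);
    for (const A& x : xs) {
        acc = op(acc, x);
        ys.push_back(acc);
    }
    return ys;
}

// zip [x1, ..., xn] [y1, ..., yn] = [(x1, y1), ..., (xn, yn)]
// If the lengths differ, the extra elements of the longer list are ignored.
template <typename A, typename B>
std::vector<std::pair<A, B>> zip_seq(const std::vector<A>& xs,
                                     const std::vector<B>& ys) {
    const std::size_t n = xs.size() < ys.size() ? xs.size() : ys.size();
    std::vector<std::pair<A, B>> zs;
    zs.reserve(n);
    for (std::size_t i = 0; i < n; ++i) zs.emplace_back(xs[i], ys[i]);
    return zs;
}
```

With addition and unit 0, `scan_seq` on `[1, 2, 3]` yields `[0, 1, 3, 6]`, whose last element equals the result of `reduce_seq` on the same list.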

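For functions $g$, $p$, $q$ and binary operators $\oplus$, $\otimes$, accumulate is defined by

\[ \mathit{accumulate}\ [\,]\ e = g\ e \]
\[ \mathit{accumulate}\ (a : x)\ e = p\ (a, e) \oplus \mathit{accumulate}\ x\ (e \otimes q\ a) \]

and is written $[\![ g, (p, \oplus), (q, \otimes) ]\!]$.

3

The library is implemented in C++ on top of MPICH.

3.1

A distributed array is represented by the dist_array class. It is created from an ordinary array:

    dist_array<int> *as = new dist_array<int>(array, size);

3.2

The skeletons of Section 2.2 are provided as member functions of dist_array.

    template<typename B> dist_array<B>* map(B (*f)(const A&)) const;
    template<typename B> void map(void (*f)(B*, const A*), dist_array<B> *bs) const;
    void map_ow(A (*f)(const A&));

Figure 1: map

map has three variants:

1. The first returns its result as a new dist_array<B>.
2. The second stores its result in the dist_array<B> passed as bs.
3. The third, map_ow, takes a function from A to A and overwrites the array itself.

For example, map_ow is applied to as with a function f as follows:

    as->map_ow(f);

Let n be the number of elements and p the number of processors, and assume that f costs O(1). Each processor holds n/p elements, so map costs O(n/p).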
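The block distribution behind these costs can be sketched without MPI by simulating p processors inside one process. The class below is a hypothetical stand-in, not the library's dist_array: the name `blocked_array`, the explicit `unit` argument of `reduce`, and the signatures of `add` and `square` are assumptions made for illustration. `map_ow` touches only local blocks, while `reduce` folds each block locally and then combines the p partial results pairwise in about log p rounds.

```cpp
#include <cstddef>
#include <vector>

// Example element functions (signatures are assumptions).
int add(const int& a, const int& b) { return a + b; }
int square(const int& x) { return x * x; }

// Hypothetical single-process model of a block-distributed array:
// blocks[r] holds the elements that rank r would own.
struct blocked_array {
    std::vector<std::vector<int>> blocks;

    blocked_array(const std::vector<int>& data, std::size_t p) : blocks(p) {
        const std::size_t n = data.size();
        for (std::size_t r = 0; r < p; ++r) {
            // Rank r owns the index range [r*n/p, (r+1)*n/p).
            const auto first = static_cast<std::ptrdiff_t>(r * n / p);
            const auto last = static_cast<std::ptrdiff_t>((r + 1) * n / p);
            blocks[r].assign(data.begin() + first, data.begin() + last);
        }
    }

    // Overwrite every element with f(element); each rank works on its own
    // block only, so the per-rank cost is O(n/p).
    void map_ow(int (*f)(const int&)) {
        for (auto& block : blocks)
            for (int& x : block) x = f(x);
    }

    // Local fold per rank (O(n/p)), then a pairwise tree combination of the
    // partial results (about log p rounds). Assumes op is associative and
    // unit is its unit.
    int reduce(int (*op)(const int&, const int&), int unit) const {
        std::vector<int> partial;
        for (const auto& block : blocks) {
            int acc = unit;
            for (const int& x : block) acc = op(acc, x);
            partial.push_back(acc);
        }
        for (std::size_t stride = 1; stride < partial.size(); stride *= 2)
            for (std::size_t i = 0; i + stride < partial.size(); i += 2 * stride)
                partial[i] = op(partial[i], partial[i + stride]);
        return partial.empty() ? unit : partial[0];
    }
};
```

The tree combination keeps the left-to-right order of the partial results, so the sketch needs only associativity of `op`, not commutativity.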

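The costs of the other skeletons are as follows.

reduce: a local step of O(n/p) followed by a combination step of O(log p) among the processors, O(n/p + log p) in total.

scan: a local step of O(n/p), a step of O(log p) among the processors, and another local step of O(n/p), O(n/p + log p) in total.

zip: the elements of two arrays are paired using the C++ pair type. Like map, zip costs O(n/p).

3.3

As an example, consider the variance var of as = [a_1, a_2, ..., a_n], where ave is the mean:

\[ \mathit{var} = \sum_{i=1}^{n} (a_i - \mathit{ave})^2 / n, \qquad \mathit{ave} = \sum_{i=1}^{n} a_i / n \]

Figure 2(a) shows the BMF description and Figure 2(b) the corresponding program using dist_array.

    var as = sqsum / n
      where sum   = reduce (+) as
            ave   = sum / n
            sqsum = reduce (+) (map square (map (− ave) as))

(a) BMF

    sum = as->reduce(add);
    ave = sum / n;
    as->map_ow(sub_ave);
    as->map_ow(square);
    sq_sum = as->reduce(add);
    var = sq_sum / n;

(b) C++

Figure 2

    ...
    for(int i = 0; i < number; i++){
        ave_a[i] = a[i].reduce(add) / size;
        ave += ave_a[i];
    }
    ...
    for(int i = 0; i < number; i++){
        a[i].map_ow(sub_ave);
        a[i].map_ow(square);
    }
    for(int i = 0; i < number; i++)
        st += a[i].reduce(add);
    ...

Figure 3

In Figure 3, number is the number of arrays and size is the number of elements in each array.

4

For example, map f (map g x) calls map twice, whereas the equivalent map (f ∘ g) x calls map only once.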
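The call sequence of Figure 2(b) and the map fusion just described can be tried out sequentially. The sketch below is not the library: `seq_map_ow` and `seq_reduce` are hypothetical stand-ins for `as->map_ow(f)` and `as->reduce(op)` on a `std::vector`, and keeping `ave` in a global variable is an assumption about how `sub_ave` obtains the mean, since the source does not show its definition.

```cpp
#include <cstddef>
#include <vector>

// Read by sub_ave; a plain function pointer cannot capture the mean
// (assumption, see the lead-in).
static double ave = 0.0;

double add(const double& a, const double& b) { return a + b; }
double sub_ave(const double& a) { return a - ave; }
double square(const double& a) { return a * a; }

// square composed after sub_ave, written by hand.
double sub_ave_then_square(const double& a) { return square(sub_ave(a)); }

// Hypothetical sequential stand-in for as->map_ow(f).
void seq_map_ow(std::vector<double>& as, double (*f)(const double&)) {
    for (double& a : as) a = f(a);
}

// Hypothetical sequential stand-in for as->reduce(op).
// Assumes a non-empty array.
double seq_reduce(const std::vector<double>& as,
                  double (*op)(const double&, const double&)) {
    double acc = as[0];
    for (std::size_t i = 1; i < as.size(); ++i) acc = op(acc, as[i]);
    return acc;
}

// Same order of calls as Figure 2(b): two map_ow traversals.
// The array is taken by value because map_ow overwrites it.
double variance(std::vector<double> as) {
    const double n = static_cast<double>(as.size());
    ave = seq_reduce(as, add) / n;
    seq_map_ow(as, sub_ave);
    seq_map_ow(as, square);
    return seq_reduce(as, add) / n;
}

// map f (map g x) = map (f . g) x: one map_ow traversal instead of two.
double variance_fused_map(std::vector<double> as) {
    const double n = static_cast<double>(as.size());
    ave = seq_reduce(as, add) / n;
    seq_map_ow(as, sub_ave_then_square);
    return seq_reduce(as, add) / n;
}
```

Both functions return the same value; the second one traverses the array three times instead of four.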

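4.1

Following Hu et al. [8], the skeletons are expressed in terms of accumulate, cataj and buildj.

Definition (cataj). For an operator $\oplus$, a function $p$ and a value $e$, cataj is defined by

\[ \mathit{cataj}\ [\,] = e \]
\[ \mathit{cataj}\ (a : x) = p\ a \oplus \mathit{cataj}\ x \]

and is written $([\oplus, p, e])$.

Definition (buildj).

\[ \mathit{buildj}\ \mathit{gen} = \mathit{gen}\ (\mathbin{+\!\!+})\ [\cdot]\ [\,] \]

cataj replaces each append $\mathbin{+\!\!+}$ in a list by $\oplus$, the empty list $[\,]$ by $e$, and each singleton constructor $[\cdot]$ by $p$. buildj supplies these three constructors to gen.

The skeletons are expressed as follows, where id is the identity function:

\[ \mathit{map}\ f = \mathit{buildj}\ (\lambda c\ s\ e.\ ([c,\ s \circ f,\ e])) \]
\[ \mathit{reduce}\ (\oplus) = ([\oplus,\ \mathit{id},\ e]) \]
\[ \mathit{scan}\ (\oplus)\ x = \mathit{buildj}\ (\lambda c\ s\ e.\ [\![ s,\ (\lambda (a, e).\ s\ e,\ c),\ (\mathit{id},\ \oplus) ]\!])\ x\ e \]

Rule CataJ-BuildJ:

\[ ([c, s, e]) \circ \mathit{buildj}\ \mathit{gen} = \mathit{gen}\ c\ s\ e \]

For example, a map followed by a reduce is fused into a single cataj:

\[ \mathit{reduce}\ (\oplus) \circ \mathit{map}\ f = ([\oplus, \mathit{id}, e]) \circ \mathit{buildj}\ (\lambda c\ s\ e.\ ([c,\ s \circ f,\ e])) \]
\[ = (\lambda c\ s\ e.\ ([c,\ s \circ f,\ e]))\ (\oplus)\ \mathit{id}\ e \]
\[ = ([\oplus, f, e]) \]

Rule BuildJ(CataJ-BuildJ):

\[ \mathit{buildj}\ (\lambda c\ s\ e.\ ([\varphi_1, \varphi_2, \varphi_3])) \circ \mathit{buildj}\ \mathit{gen} = \mathit{buildj}\ (\lambda c\ s\ e.\ \mathit{gen}\ \varphi_1\ \varphi_2\ \varphi_3) \]

For example, map f followed by... more precisely, the composition map f ∘ map g is fused as

\[ \mathit{map}\ f \circ \mathit{map}\ g = \mathit{buildj}\ (\lambda c\ s\ e.\ ([c,\ s \circ f \circ g,\ e])) \]

Rule BuildJ(Acc-BuildJ), where fst returns the first component of a tuple:

\[ \mathit{buildj}\ (\lambda c\ s\ e.\ [\![ g, (p, \oplus), (q, \otimes) ]\!])\ (\mathit{buildj}\ \mathit{gen}\ x)\ e = \mathit{fst}\ (\mathit{buildj}\ (\lambda c\ s\ e.\ \mathit{gen}\ (\odot)\ f\ d)\ x\ e) \]

where

\[ (u \odot v)\ e = \mathbf{let}\ (r_1, s_1, t_1) = u\ e;\ (r_2, s_2, t_2) = v\ (e \otimes t_1)\ \mathbf{in}\ (s_1 \oplus r_2,\ s_1 \oplus s_2,\ t_1 \otimes t_2) \]
\[ f\ a\ e = (p\ (a, e) \oplus g\ (e \otimes q\ a),\ p\ (a, e),\ q\ a) \]
\[ d\ e = (g\ e,\ \_,\ \_) \]

4.2

The optimization is implemented with OpenC++ [5]. Skeleton calls are converted to cataj and buildj forms, to which the rules are applied; for instance, two successive calls map f and map g become map (f ∘ g). The program of Section 3.3, which consists of map_ow and reduce calls, is first converted to the following form:

    [[ 1 as -> sum cataj [[add]] nil nil] ;]
    [[ave = [sum / size]] ;]
    [[ 3 as -> as buildj cataj nil [[sub_ave]] nil] ;]
    [[ 3 as -> as buildj cataj nil [[square]] nil] ;]
    [[ 1 as -> sq_sum cataj [[add]] nil nil] ;]
    [[var = [sq_sum / size]] ;]

Applying the BuildJ(CataJ-BuildJ) rule and then the CataJ-BuildJ rule gives:

    [[ 1 as -> sum cataj [[add]] nil nil] ;]
    [[ave = [sum / size]] ;]
    [[ 1 as -> sq_sum cataj [[add]] [[sub_ave] [square]] nil] ; ]
    [[var = [sq_sum / size]] ;]

From this form the following program is generated:

    sum = as->reduce(add);
    ave = sum / size;
    sq_sum = as->cataj(_sym11086_2, add);
    var = sq_sum / size;

Here _sym11086_2 is the function composed of sub_ave and square. The two map calls and the reduce call are replaced by one cataj call. ... 2(n/p) ...

Table 1

    CPU       Pentium4 2.4GHz
    Memory    512MB
    Network   1Gbps
    OS        Linux 2.4.20
    Compiler  g++ 2.96
    MPICH     mpich 1.2.6

5

The programs of Section 3.3 and Figure 3 were measured with and without the optimization and compared with programs written directly in C++ and MPI, using 1 to 10 PCs. ... 1000 ... 10 ... 100 ... 100 ...

Figure 4: ... 29.7% ..., of which 16.0% comes from the BuildJ(CataJ-BuildJ) rule and 13.7% from the CataJ-BuildJ rule. ... 10 ... 8.20 ...

Figure 5: ... 7.8% ...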
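The cataj and buildj forms of Section 4.1 can be modelled with C++14 generic lambdas to check the CataJ-BuildJ rule on small inputs. This is a minimal sequential sketch, not the library's implementation or the OpenC++ transformation: `cataj`, `buildj`, `map_gen` and `fuse_maps` are hypothetical helpers, and lists are plain `std::vector` values concatenated naively.

```cpp
#include <cstddef>
#include <vector>

// ([op, p, e]): cataj [] = e; cataj (a : x) = p a `op` cataj x.
// Returns a callable that consumes a list.
template <typename Op, typename P, typename E>
auto cataj(Op op, P p, E e) {
    return [=](const auto& xs) {
        E acc = e;
        // Fold from the right, following the definition literally.
        for (std::size_t i = xs.size(); i-- > 0;) acc = op(p(xs[i]), acc);
        return acc;
    };
}

// buildj gen = gen (++) [.] []: hand the list constructors to gen.
// T is the element type of the list being built.
template <typename T, typename Gen>
auto buildj(Gen gen) {
    auto concat = [](std::vector<T> a, const std::vector<T>& b) {
        a.insert(a.end(), b.begin(), b.end());
        return a;
    };
    auto single = [](const T& a) { return std::vector<T>{a}; };
    return gen(concat, single, std::vector<T>{});
}

// map f = buildj (\c s e. ([c, s . f, e])); this is the argument of buildj.
template <typename F>
auto map_gen(F f) {
    return [=](auto c, auto s, auto e) {
        return cataj(c, [=](const auto& a) { return s(f(a)); }, e);
    };
}

// BuildJ(CataJ-BuildJ) applied to map f . map g:
// the fused producer is \c s e. ([c, s . f . g, e]).
template <typename F, typename G>
auto fuse_maps(F f, G g) {
    return [=](auto c, auto s, auto e) {
        return map_gen(g)(c, [=](const auto& a) { return s(f(a)); }, e);
    };
}
```

With `plus`, `id` and `square` as small lambdas on `int`, `cataj(plus, id, 0)(buildj<int>(map_gen(square))(xs))` builds the intermediate list of squares, while `map_gen(square)(plus, id, 0)(xs)` is the right-hand side of the CataJ-BuildJ rule and computes the same sum in one traversal. The generated `_sym11086_2` plays the role that the fused function `p` plays here.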

[Figure 4: Execution Time (sec) against Number of Processors (1 to 10) for skeleton, optimized skeleton, and C++ + MPI]

[Figure 5: Execution Time (sec) against Number of Processors (1 to 10) for skeleton, optimized skeleton, and C++ + MPI]

... C++ + MPI ... 15% ...

6

P3L [2] provides skeletons such as map, reduce, scan and pipe. ... C ... Skil [4] ... C ... HPC++ [10] ... map, reduce, scan ... 5% ... 7% ...

7

... BMF ... C++ ... map, zip, reduce, scan ... [12] ... Tree Contraction [1][11] ... zip ...

[1] Abrahamson, K., Dadoun, N., Kirkpatrick, D., and Przytycka, T.: A Simple Parallel Tree Contraction Algorithm, Journal of Algorithms, Vol. 10, No. 2 (1989), pp. 287-302.
[2] Bacci, B., Danelutto, M., Orlando, S., Pelagatti, S., and Vanneschi, M.: P3L: A Structured High Level Programming Language and its Structured Support, Concurrency: Practice and Experience, Vol. 7, No. 3 (1995), pp. 225-255.
[3] Bird, R.: An Introduction to the Theory of Lists, Proc. NATO Advanced Study Institute on Logic of Programming and Calculi of Discrete Design, Springer-Verlag, 1987, pp. 5-42.
[4] Botorog, G. and Kuchen, H.: Skil: An Imperative Language with Algorithmic Skeletons for Efficient Distributed Programming, Proc. 5th International Symposium on High Performance Distributed Computing (HPDC-5), IEEE Computer Society Press, 1996, pp. 243-252.
[5] Chiba, S.: OpenC++. http://opencxx.sourceforge.net/.
[6] Cole, M.: Algorithmic Skeletons: A Structured Approach to the Management of Parallel Computation, Research Monographs in Parallel and Distributed Computing, Pitman, 1989.
[7] Hu, Z., Iwasaki, H., and Takeichi, M.: Diffusion: Calculating Efficient Parallel Programs, Proc. 1999 ACM SIGPLAN International Workshop on Partial Evaluation and Semantics-Based Program Manipulation (PEPM 99), 1999, pp. 85-94.
[8] Hu, Z., Iwasaki, H., and Takeichi, M.: An Accumulative Parallel Skeleton for All, Proc. 2002 European Symposium on Programming (ESOP 2002), Lecture Notes in Computer Science 2305, Springer-Verlag, 2002, pp. 83-97.
[9] Iwasaki, H. and Hu, Z.: A New Parallel Skeleton for General Accumulative Computations, International Journal of Parallel Programming, Vol. 32, No. 5 (2004), pp. 389-414.
[10] Johnson, E. and Gannon, D.: HPC++: Experiments with the Parallel Standard Template Library, Proc. 11th International Conference on Supercomputing, ACM Press, 1997, pp. 124-131.
[11] Miller, G. and Reif, J.: Parallel Tree Contraction and its Application, Proc. 26th Annual Symposium on Foundations of Computer Science, IEEE Computer Society Press, 1985, pp. 478-489.
[12] Skillicorn, D.: Parallel Implementation of Tree Skeletons, Journal of Parallel and Distributed Computing, Vol. 39, No. 2 (1996), pp. 115-125.