Fast and Efficient Tsunami Propagation Simulation with FPGA and GPGPU

Transcript:

Fast and Efficient Tsunami Propagation Simulation with FPGA and GPGPU

Hideo Tanida,†1 Akira Fukui,†1 Hiroaki Yoshida†2,†3,*1 and Masahiro Fujita†2,†3

Custom accelerators implemented on FPGAs and GPUs are both considered ways to achieve high performance and efficiency at relatively low cost. This paper discusses the acceleration of a tsunami-propagation simulation based on the finite difference method, using an FPGA and a GPU. Experimental results show that optimizations that take the memory hierarchy into consideration are effective for both the FPGA and the GPU implementations. Execution assisted by either the FPGA or the GPU shows higher energy efficiency than execution on a general-purpose processor alone.

1.
[The prose of this section is not preserved in the transcript. Surviving terms: FPGA (Field Programmable Gate Array); C/C++; RTL; GPGPU (General Purpose Computing on Graphics Processing Unit); GPU; reference 2); "2011 11 10".]

†1 Dept. of Electrical Engineering and Information Systems, The University of Tokyo
†2 VLSI Design and Education Center, The University of Tokyo
†3 CREST, Japan Science and Technology Agency
*1 Presently with Fujitsu Laboratories of America, Inc.

© 2012 Information Processing Society of Japan

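[Outline of Sections 2 to 6; the prose is not preserved in the transcript.]

2. TUNAMI-N1

TUNAMI-N1 (Tohoku University's Numerical Analysis Model for Investigation of Near-field tsunamis, No. 1) 1)

2.1 Long Wave Theory

$$\frac{\partial \eta}{\partial t} + \frac{\partial M}{\partial x} + \frac{\partial N}{\partial y} = 0 \qquad (1)$$

$$\frac{\partial M}{\partial t} + gD\,\frac{\partial \eta}{\partial x} = 0 \qquad (2)$$

$$\frac{\partial N}{\partial t} + gD\,\frac{\partial \eta}{\partial y} = 0 \qquad (3)$$

2.2 TUNAMI-N1

TUNAMI-N1 works on a two-dimensional array H[IF][JF], where IF and JF are the grid dimensions, and on three arrays Z[IF][JF], M[IF][JF] and N[IF][JF]. Each iteration advances Z, M and N from time step t to t+1, and the iteration is repeated for T steps.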

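2.3 FPGA GPU

At time step t, the cell with indices i and j is updated as follows:

$$Z_{t+1}[j][i] = Z_t[j][i] - R\,\bigl(M_t[j][i] - M_t[j][i-1] + N_t[j][i] - N_t[j-1][i]\bigr) \qquad (4)$$

On the four edges of the grid (j = 0, 0 < i; j = JF - 1, 0 < i; i = 0, 0 < j; i = IF - 1, 0 < j), Z_{t+1}[j][i] is given by the boundary equations (5) to (8) instead. Their operators are not recoverable from the transcript. The surviving terms of (5) and (7) are Z_t[j][i], N_t[j][i], the pair M_t[j][i] and M_t[j][i-1] divided by 500, a factor 1 over G H[j][i], and a final division by 2. Equations (6) and (8) have the same shape with M_t[j][i] as the leading flux and the pair N_t[j][i] and N_t[j-1][i] divided by 500.

$$M_{t+1}[j][i] = M_t[j][i] - G\,R\,\bigl(H[j][i] + H[j][i+1]\bigr)\,\bigl(Z_{t+1}[j][i+1] - Z_{t+1}[j][i]\bigr)/2 \qquad (9)$$

$$N_{t+1}[j][i] = N_t[j][i] - G\,R\,\bigl(H[j+1][i] + H[j][i]\bigr)\,\bigl(Z_{t+1}[j+1][i] - Z_{t+1}[j][i]\bigr)/2 \qquad (10)$$

Equations (9) and (10) read Z_{t+1}, so within one step Z_{t+1} is computed from Z_t, M_t and N_t first, and M_{t+1} and N_{t+1} are computed afterwards.

2.4 TUNAMI-N1

[The prose of this subsection and Figure 1 are not preserved in the transcript.]

3. FPGA

TUNAMI-N1 is implemented on a Virtex6 SX475T FPGA. The FPGA board carries 24 GB of SDRAM and is connected over PCI Express to the host CPU (Intel Xeon X5650 @ 2.67 GHz). The design is developed with MaxCompiler from Maxeler Technologies: a description written in Java is translated by MaxCompiler into VHDL, and the VHDL is synthesized for the FPGA with the Xilinx tools. A C program on the CPU (host) works together with the FPGA (accelerator), as in Figure 2.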
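The update order of equations (4), (9) and (10) can be sketched in plain Python. This is a minimal illustration, not the authors' C, MaxCompiler or CUDA code. The function name `step` and the `[j][i]` list-of-rows layout are assumptions made for the example, and the edge cells of Z are simply left unchanged because the boundary equations (5) to (8) are not reproduced here.

```python
def step(Z, M, N, H, R, G):
    """Advance Z, M, N by one time step using equations (4), (9), (10).

    Arrays are lists of rows indexed [j][i]. Edge cells of Z are kept
    as they are (assumption: boundary equations (5)-(8) are omitted).
    """
    JF, IF = len(Z), len(Z[0])

    # Equation (4): new water level from the old fluxes, interior cells only.
    Z1 = [row[:] for row in Z]
    for j in range(1, JF - 1):
        for i in range(1, IF - 1):
            Z1[j][i] = Z[j][i] - R * (M[j][i] - M[j][i - 1]
                                      + N[j][i] - N[j - 1][i])

    # Equation (9): new flux M from the NEW water level Z1.
    M1 = [row[:] for row in M]
    for j in range(JF):
        for i in range(IF - 1):
            M1[j][i] = M[j][i] - G * R * (H[j][i] + H[j][i + 1]) \
                       * (Z1[j][i + 1] - Z1[j][i]) / 2

    # Equation (10): new flux N from the NEW water level Z1.
    N1 = [row[:] for row in N]
    for j in range(JF - 1):
        for i in range(IF):
            N1[j][i] = N[j][i] - G * R * (H[j + 1][i] + H[j][i]) \
                       * (Z1[j + 1][i] - Z1[j][i]) / 2

    return Z1, M1, N1
```

Because every cell of Z1 depends only on values from step t, and every cell of M1 and N1 only on Z1 and values from step t, the loops within each phase are independent, which is the property a pipelined or multi-threaded implementation can exploit.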

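[Figure 2: FPGA, with the arrays H, Z, M and N]

3.1
[The prose of this subsection is not preserved in the transcript. Surviving terms: the arrays Z, M, N and H; CPU; MaxCompiler; FPGA; 2000; a data size of 1040*668*4*4 Byte; FPGA BRAM (SRAM); on-board memory (SDRAM) of 24 GByte.]

3.2 FPGA
[The prose of this subsection is not preserved in the transcript. Surviving terms: FPGA; Figures 3 and 4; three stages numbered 1, 2 and 3 for time steps t, t+1 and t+2.]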

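[Figures 3 and 4: captions not preserved, apart from the terms FPGA and BRAM]

4. GPU

A GPU implementation of TUNAMI-N1 is reported in 4). The GPU used here is an NVIDIA Tesla C2075 and the host CPU is an Intel Xeon X5650 @ 2.67 GHz. As shown in Figure 5, the Tesla C2075 has 14 Streaming Multiprocessors (SMs), and each SM has 32 Streaming Processors (SPs). The memory hierarchy consists of an L1 memory in each SM, an L2 shared among the SMs, and 6 GByte of global memory.

[Figure 5: GPU (Tesla C2075). A further detail, "1 SP, 1024 byte", survives without its context.]

4.1 TUNAMI-N1 GPU

As in 4), a program on the CPU (host) works together with the GPU (accelerator) (Figure 6).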

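[Figure 6: GPGPU]

The arrays Z, M and N are held on the GPU. The two-dimensional grid of 1040*668 cells is divided into blocks of 16 × 16 cells, each handled by an SM, which gives 65 × 42 blocks (Figure 7).

[Figure 7: assignment of blocks to SMs]

4.2

One GPU run covers 7200 time steps. With 17 floating-point operations per cell and step, the total is 1040 × 668 × 17 × 7200 = 85,033,728,000 operations. At the GPU's peak rate of 1,050 GFLOPS, this would take 85,033,728,000 / 1,050,000,000,000 = 0.081 s.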
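The block count and the peak-rate estimate can be reproduced with a few lines of Python. This is only a check of the arithmetic under the reading that 17 is the number of floating-point operations per cell and step; the constant names are made up for the example.

```python
import math

NX, NY = 1040, 668        # grid size from the text
BLOCK = 16                # block edge length (16 x 16 cells per block)
FLOPS_PER_CELL = 17       # assumption: 17 operations per cell and step
STEPS = 7200              # number of time steps
PEAK_FLOPS = 1050e9       # 1,050 GFLOPS, the peak rate quoted in the text

# Number of blocks per dimension; the last block row is only partly filled.
blocks_x = math.ceil(NX / BLOCK)
blocks_y = math.ceil(NY / BLOCK)

# Total work and the time it would take at peak throughput.
total_flops = NX * NY * FLOPS_PER_CELL * STEPS
ideal_seconds = total_flops / PEAK_FLOPS

print(blocks_x, blocks_y, total_flops, round(ideal_seconds, 3))
```

Since 668 is not a multiple of 16, the 42nd block row covers only 12 rows of cells, so a kernel written this way needs a bounds check for threads that fall outside the grid.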

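The measured time is 2.8 s, 34.6 times the 0.081 s estimated from the peak rate.

The gap is explained by CGMA (Compute to Global Memory Access), the ratio of floating-point operations to global memory accesses. With 17 operations and 11 global memory accesses, CGMA = 17/11 = 1.55. The GPU's memory bandwidth is 144 GB/s, which corresponds to 144/4 = 36 G 4-byte values per second, so the attainable rate is (144/4) × 1.55 = 55.8 GFLOPS. For 7200 steps this gives 85,033,728,000 / 55,800,000,000 = 1.52 s, against the measured 2.8 s.

[The description of the optimization is not preserved in the transcript. Surviving terms: each SM handles a 16 × 16 block together with an 18 × 18 region; Figures 8 and 9; 7200 steps; syncthreads().]

5. FPGA GPU

Three versions are compared: a CPU version, an FPGA version and a GPU version. The CPU version is the Fortran TUNAMI-N1 code rewritten in C. The FPGA version is the one of Section 3.2 and the GPU version is the one of Section 4.2.

5.1

Table 1 lists the execution times on the CPU, FPGA and GPU for 7200 steps (2 hours) and for 86400 steps (24 hours).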

Table 1: Execution time (sec.), with the speedup over the CPU in parentheses. The qualifiers of the three columns are not preserved in the transcript.

        7200 steps      7200 steps      86400 steps
CPU     78.7 (x1)       80.1 (x1)       943 (x1)
FPGA    1.85 (x42.5)    6.22 (x12.9)    26.57 (x35.5)
GPU     2.05 (x38.4)    4.71 (x17.0)    31.1 (x30.32)

Table 2: Power (W) and energy (J)

        Power (W)    Energy (J)
CPU     24           1888.8
FPGA    42           77.7
GPU     129          264.45

[A remark on the GPU that refers to Section 4.1 and to a figure of 34% is not preserved in the transcript.]

5.2

[The prose of this subsection is not preserved in the transcript. Surviving terms: CPU, FPGA and GPU at 86400 steps; Table 2; the FPGA at about 1/3 of the GPU; reference 3).]

6.

[The prose of this section is not preserved in the transcript. Surviving terms: FPGA, GPU, FPGA BRAM.]

References

1) Fumihiko Imamura, Ahmet Cevdet Yalciner, and Gulizar Ozyurt, TSUNAMI MODELLING MANUAL, available from <http://www.tsunami.civil.tohoku.ac.jp/hokusai3/j/projects/manual-ver-3.1.pdf>, accessed 2012-02-13.
2) Fumihiko Ino, Jun Gomita, Yasuhiro Kawasaki, and Kenichi Hagihara, A GPGPU approach for accelerating 2-D/3-D rigid registration of medical images, in Proc. Parallel and Distributed Processing and Applications (ISPA), vol. 4330, pp. 939-950, 2006.
3) Dong-U Lee, Altaf Abdul Gaffar, Ray C. C. Cheung, Oskar Mencer, Wayne Luk, and George A. Constantinides, Accuracy-guaranteed bit-width optimization, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 25, no. 10, pp. 1990-2000, Oct. 2006.
4) Harsh Gidra, Israrul Haque, Nitin P. Kumar, Sargurunathan M., M. S. Gaur, Vijay Laxmi, M. Zwolinski, and Virendra Singh, Parallelizing TUNAMI-N1 using GPGPU, in Proc. IEEE International Conference on High Performance Computing and Communications (HPCC), pp. 845-850, Sep. 2011.
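As a cross-check of Tables 1 and 2, the speedup factors and the energy values can be recomputed from the listed times and power figures. The sketch below assumes that each energy value is the power multiplied by the first-column time of Table 1. That relation is inferred from the numbers and is not stated in the surviving text, and the helper names are hypothetical.

```python
# Execution times in seconds, one tuple per device, columns as in Table 1.
times = {
    "CPU":  (78.7, 80.1, 943.0),
    "FPGA": (1.85, 6.22, 26.57),
    "GPU":  (2.05, 4.71, 31.1),
}

# Power in watts, as in Table 2.
power = {"CPU": 24, "FPGA": 42, "GPU": 129}


def speedups(device):
    """Speedup of a device over the CPU for each column of Table 1."""
    return tuple(cpu / t for cpu, t in zip(times["CPU"], times[device]))


def energy(device):
    """Energy in joules, assuming power times the first-column time."""
    return power[device] * times[device][0]


for name in times:
    print(name, [round(s, 2) for s in speedups(name)], round(energy(name), 2))
```

Under this assumption the FPGA uses about 0.29 times the energy of the GPU, in line with the "1/3" that survives in Section 5.2.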