Orthogonalization Library with a Numerical Computation Policy Interface

Σχετικά έγγραφα
GPGPU. Grover. On Large Scale Simulation of Grover s Algorithm by Using GPGPU

Numerical Analysis FMN011

ER-Tree (Extended R*-Tree)

BiCG CGS BiCGStab BiCG CGS 5),6) BiCGStab M Minimum esidual part CGS BiCGStab BiCGStab 2 PBiCG PCGS α β 3 BiCGStab PBiCGStab PBiCG 4 PBiCGStab 5 2. Bi

GPU. CUDA GPU GeForce GTX 580 GPU 2.67GHz Intel Core 2 Duo CPU E7300 CUDA. Parallelizing the Number Partitioning Problem for GPUs

Durbin-Levinson recursive method

Supplementary Materials for Evolutionary Multiobjective Optimization Based Multimodal Optimization: Fitness Landscape Approximation and Peak Detection

Quick algorithm f or computing core attribute

Congruence Classes of Invertible Matrices of Order 3 over F 2

FX10 SIMD SIMD. [3] Dekker [4] IEEE754. a.lo. (SpMV Sparse matrix and vector product) IEEE754 IEEE754 [5] Double-Double Knuth FMA FMA FX10 FMA SIMD

Instruction Execution Times

3: A convolution-pooling layer in PS-CNN 1: Partially Shared Deep Neural Network 2.2 Partially Shared Convolutional Neural Network 2: A hidden layer o

Re-Pair n. Re-Pair. Re-Pair. Re-Pair. Re-Pair. (Re-Merge) Re-Merge. Sekine [4, 5, 8] (highly repetitive text) [2] Re-Pair. Blocked-Repair-VF [7]

Toward a SPARQL Query Execution Mechanism using Dynamic Mapping Adaptation -A Preliminary Report- Takuya Adachi 1 Naoki Fukuta 2.

ΓΡΑΜΜΙΚΟΣ & ΔΙΚΤΥΑΚΟΣ ΠΡΟΓΡΑΜΜΑΤΙΣΜΟΣ

ΠΡΟΤΕΙΝΟΜΕΝΑ ΘΕΜΑΤΑ ΔΙΠΛΩΜΑΤΙΚΩΝ ΕΡΓΑΣΙΩΝ ΓΙΑ ΤΟ ΕΑΡΙΝΟ ΕΞΑΜΗΝΟ Εισηγητής: Νίκος Πλόσκας Επίκουρος Καθηγητής ΤΜΠΤ

CHAPTER 25 SOLVING EQUATIONS BY ITERATIVE METHODS

ES440/ES911: CFD. Chapter 5. Solution of Linear Equation Systems

Yoshifumi Moriyama 1,a) Ichiro Iimura 2,b) Tomotsugu Ohno 1,c) Shigeru Nakayama 3,d)

An Automatic Modulation Classifier using a Frequency Discriminator for Intelligent Software Defined Radio

Dynamic types, Lambda calculus machines Section and Practice Problems Apr 21 22, 2016

SCHOOL OF MATHEMATICAL SCIENCES G11LMA Linear Mathematics Examination Solutions

Διπλωματική Εργασία του φοιτητή του Τμήματος Ηλεκτρολόγων Μηχανικών και Τεχνολογίας Υπολογιστών της Πολυτεχνικής Σχολής του Πανεπιστημίου Πατρών

Maude 6. Maude [1] UIUC J. Meseguer. Maude. Maude SRI SRI. Maude. AC (Associative-Commutative) Maude. Maude Meseguer OBJ LTL SPIN

* ** *** *** Jun S HIMADA*, Kyoko O HSUMI**, Kazuhiko O HBA*** and Atsushi M ARUYAMA***

Schedulability Analysis Algorithm for Timing Constraint Workflow Models

Other Test Constructions: Likelihood Ratio & Bayes Tests

Conjoint. The Problems of Price Attribute by Conjoint Analysis. Akihiko SHIMAZAKI * Nobuyuki OTAKE

Research on Economics and Management

Αρχιτεκτονική Σχεδίαση Ασαφούς Ελεγκτή σε VHDL και Υλοποίηση σε FPGA ΙΠΛΩΜΑΤΙΚΗ ΕΡΓΑΣΙΑ

Phys460.nb Solution for the t-dependent Schrodinger s equation How did we find the solution? (not required)

Development of a basic motion analysis system using a sensor KINECT

ΜΑΘΗΜΑΤΙΚΗ ΠΡΟΣΟΜΟΙΩΣΗ ΤΗΣ ΔΥΝΑΜΙΚΗΣ ΤΟΥ ΕΔΑΦΙΚΟΥ ΝΕΡΟΥ ΣΤΗΝ ΠΕΡΙΠΤΩΣΗ ΑΡΔΕΥΣΗΣ ΜΕ ΥΠΟΓΕΙΟΥΣ ΣΤΑΛΑΚΤΗΦΟΡΟΥΣ ΣΩΛΗΝΕΣ ΣΕ ΔΙΑΣΤΡΩΜΕΝΑ ΕΔΑΦΗ

A Sequential Experimental Design based on Bayesian Statistics for Online Automatic Tuning. Reiji SUDA,


Appendix to On the stability of a compressible axisymmetric rotating flow in a pipe. By Z. Rusak & J. H. Lee

MathCity.org Merging man and maths

The Simply Typed Lambda Calculus

CRASH COURSE IN PRECALCULUS

1 h, , CaCl 2. pelamis) 58.1%, (Headspace solid -phase microextraction and gas chromatography -mass spectrometry,hs -SPME - Vol. 15 No.

IPSJ SIG Technical Report Vol.2014-CE-127 No /12/6 CS Activity 1,a) CS Computer Science Activity Activity Actvity Activity Dining Eight-He

Detection and Recognition of Traffic Signal Using Machine Learning

Ι ΑΚΤΟΡΙΚΗ ΙΑΤΡΙΒΗ. Χρήστος Αθ. Χριστοδούλου. Επιβλέπων: Καθηγητής Ιωάννης Αθ. Σταθόπουλος

ΓΕΩΜΕΣΡΙΚΗ ΣΕΚΜΗΡΙΩΗ ΣΟΤ ΙΕΡΟΤ ΝΑΟΤ ΣΟΤ ΣΙΜΙΟΤ ΣΑΤΡΟΤ ΣΟ ΠΕΛΕΝΔΡΙ ΣΗ ΚΤΠΡΟΤ ΜΕ ΕΦΑΡΜΟΓΗ ΑΤΣΟΜΑΣΟΠΟΙΗΜΕΝΟΤ ΤΣΗΜΑΣΟ ΨΗΦΙΑΚΗ ΦΩΣΟΓΡΑΜΜΕΣΡΙΑ

ΚΒΑΝΤΙΚΟΙ ΥΠΟΛΟΓΙΣΤΕΣ

2 Composition. Invertible Mappings

A Method for Creating Shortcut Links by Considering Popularity of Contents in Structured P2P Networks

Jesse Maassen and Mark Lundstrom Purdue University November 25, 2013

Wiki. Wiki. Analysis of user activity of closed Wiki used by small groups

Optimization, PSO) DE [1, 2, 3, 4] PSO [5, 6, 7, 8, 9, 10, 11] (P)

Archive of SID. (Tg)

ΑΠΟΔΟΤΙΚΗ ΑΠΟΤΙΜΗΣΗ ΕΡΩΤΗΣΕΩΝ OLAP Η ΜΕΤΑΠΤΥΧΙΑΚΗ ΕΡΓΑΣΙΑ ΕΞΕΙΔΙΚΕΥΣΗΣ. Υποβάλλεται στην

Buried Markov Model Pairwise

ΔΙΠΛΩΜΑΤΙΚΗ ΕΡΓΑΣΙΑ ΕΠΑΝΑΣΧΕΔΙΑΣΜΟΣ ΓΡΑΜΜΗΣ ΣΥΝΑΡΜΟΛΟΓΗΣΗΣ ΜΕ ΧΡΗΣΗ ΕΡΓΑΛΕΙΩΝ ΛΙΤΗΣ ΠΑΡΑΓΩΓΗΣ REDESIGNING AN ASSEMBLY LINE WITH LEAN PRODUCTION TOOLS

Binary32 (a hi ) 8 bits 23 bits Binary32 (a lo ) 8 bits 23 bits Double-Float (a=a hi +a lo, a lo 0.5ulp(a hi ) ) 8 bits 46 bits Binary64 11 bits sign

Applying Markov Decision Processes to Role-playing Game

Development of the Nursing Program for Rehabilitation of Woman Diagnosed with Breast Cancer

Newman Modularity Newman [4], [5] Newman Q Q Q greedy algorithm[6] Newman Newman Q 1 Tabu Search[7] Newman Newman Newman Q Newman 1 2 Newman 3

Ενημερωτική εκδήλωση για τις ερευνητικές υποδομές Δημήτρης Δενιόζος Γενική Γραμματεία Δημοσίων Επενδύσεων και ΕΣΠΑ

ΕΙΣΑΓΩΓΗ ΣΤΗ ΣΤΑΤΙΣΤΙΚΗ ΑΝΑΛΥΣΗ

ΠΙΛΟΤΙΚΗ ΕΦΑΡΜΟΓΗ ΑΥΤΟΝΟΜΩΝ ΣΥΣΤΗΜΑΤΩΝ ΠΛΟΗΓΗΣΗΣ ΓΙΑ ΤΗΝ ΠΑΡΑΓΩΓΗ ΥΨΗΛΗΣ ΑΝΑΛΥΣΗΣ ΟΡΘΟΦΩΤΟΓΡΑΦΙΩΝ ΓΕΩΡΓΙΚΩΝ ΕΚΤΑΣΕΩΝ

Fourier transform, STFT 5. Continuous wavelet transform, CWT STFT STFT STFT STFT [1] CWT CWT CWT STFT [2 5] CWT STFT STFT CWT CWT. Griffin [8] CWT CWT

MOTROL. COMMISSION OF MOTORIZATION AND ENERGETICS IN AGRICULTURE 2014, Vol. 16, No. 5,

Πτυχιακή Εργασι α «Εκτι μήσή τής ποιο τήτας εικο νων με τήν χρή σή τεχνήτων νευρωνικων δικτυ ων»

ST5224: Advanced Statistical Theory II

Study of In-vehicle Sound Field Creation by Simultaneous Equation Method

ΜΕΘΟΔΟΙ ΑΕΡΟΔΥΝΑΜΙΚΗΣ

High order interpolation function for surface contact problem

Απόκριση σε Μοναδιαία Ωστική Δύναμη (Unit Impulse) Απόκριση σε Δυνάμεις Αυθαίρετα Μεταβαλλόμενες με το Χρόνο. Απόστολος Σ.

Filter Diagonalization Method which Constructs an Approximation of Orthonormal Basis of the Invariant Subspace from the Filtered Vectors

1 (forward modeling) 2 (data-driven modeling) e- Quest EnergyPlus DeST 1.1. {X t } ARMA. S.Sp. Pappas [4]

Matrices and Determinants

Quantitative chemical analyses of rocks with X-ray fluorescence analyzer: major and trace elements in ultrabasic rocks

The ε-pseudospectrum of a Matrix

EE512: Error Control Coding

ΑΛΛΗΛΕΠΙ ΡΑΣΗ ΜΟΡΦΩΝ ΛΥΓΙΣΜΟΥ ΣΤΙΣ ΜΕΤΑΛΛΙΚΕΣ ΚΑΤΑΣΚΕΥΕΣ

Bayesian modeling of inseparable space-time variation in disease risk

ΤΕΧΝΙΚΕΣ ΑΥΞΗΣΗΣ ΤΗΣ ΑΠΟΔΟΣΗΣ ΤΩΝ ΥΠΟΛΟΓΙΣΤΩΝ I

Study on Re-adhesion control by monitoring excessive angular momentum in electric railway traction

ΑΥΤΟΜΑΤΟΠΟΙΗΣΗ ΜΟΝΑΔΑΣ ΘΡΑΥΣΤΗΡΑ ΜΕ ΧΡΗΣΗ P.L.C. AUTOMATION OF A CRUSHER MODULE USING P.L.C.

CUDA FFT. High Performance 3-D FFT in CUDA Environment. Akira Nukada, 1, 2 Yasuhiko Ogata, 1, 2 Toshio Endo 1, 2 and Satoshi Matsuoka 1, 2, 3

[1] P Q. Fig. 3.1

Resurvey of Possible Seismic Fissures in the Old-Edo River in Tokyo

2 ~ 8 Hz Hz. Blondet 1 Trombetti 2-4 Symans 5. = - M p. M p. s 2 x p. s 2 x t x t. + C p. sx p. + K p. x p. C p. s 2. x tp x t.

Retrieval of Seismic Data Recorded on Open-reel-type Magnetic Tapes (MT) by Using Existing Devices


CORDIC Background (4A)

Stabilization of stock price prediction by cross entropy optimization

ΠΣΤΥΙΑΚΗ ΔΡΓΑΙΑ. Μειέηε Υξόλνπ Απνζηείξσζεο Κνλζέξβαο κε Τπνινγηζηηθή Ρεπζηνδπλακηθή. Αζαλαζηάδνπ Βαξβάξα

New Adaptive Projection Technique for Krylov Subspace Method

ΠΤΥΧΙΑΚΗ ΕΡΓΑΣΙΑ "ΠΟΛΥΚΡΙΤΗΡΙΑ ΣΥΣΤΗΜΑΤΑ ΛΗΨΗΣ ΑΠΟΦΑΣΕΩΝ. Η ΠΕΡΙΠΤΩΣΗ ΤΗΣ ΕΠΙΛΟΓΗΣ ΑΣΦΑΛΙΣΤΗΡΙΟΥ ΣΥΜΒΟΛΑΙΟΥ ΥΓΕΙΑΣ "

Queensland University of Technology Transport Data Analysis and Modeling Methodologies

ΠΣΤΧΛΑΚΘ ΕΡΓΑΛΑ ΜΕΣΡΘΕΛ ΟΠΣΛΚΟΤ ΒΑΚΟΤ ΑΣΜΟΦΑΛΡΑ ΜΕ ΘΛΛΑΚΟ ΦΩΣΟΜΕΣΡΟ ΕΚΟ

Αναερόβια Φυσική Κατάσταση

CHAPTER 48 APPLICATIONS OF MATRICES AND DETERMINANTS

Εκτεταμένη περίληψη Περίληψη

Econ 2110: Fall 2008 Suggested Solutions to Problem Set 8 questions or comments to Dan Fetter 1

Study of urban housing development projects: The general planning of Alexandria City

Ειδικό πρόγραμμα ελέγχου για τον ιό του Δυτικού Νείλου και την ελονοσία, ενίσχυση της επιτήρησης στην ελληνική επικράτεια (MIS )

Kenta OKU and Fumio HATTORI

Transcript:

Vol. 46 No. SIG 7(ACS 10) May 2005 DGKS PC 10 8 10 14 4.8 Orthogonalization Library with a Numerical Computation Policy Interface Ken Naono, Mitsuyoshi Igai and Hiroyuki Kidachi We propose an orthogonalization library that can execute faster computations under the conditions of required accuracy to control the balance between the orthogonal accuracy of the obtained vectors and the computation time. We design the library of the orthogonalization by race-based method, which execute in the same time the classical Gram-Schmidt, the modified Gram-Schmidt, and the DGKS type Gram-Schmidt, and take the fastest result that satisfies the required accuracy. The experiments of the method for three different types of vectors on PC (Pentium4, 3.2 GHz) show that, the method can achieves better accuracy even in the case that the single orthogonalization fails in over 10 8 orthogonality, and that the method achieves about 4.8 times faster in the best case. 1. PC PC GRID 3 1 Central Research Laboratory, Hitachi, Ltd. LSI Hitachi ULSI Systems Co., Ltd. 1) 6) 2 N 1000 2000 3000 N 7),8) 9) 10) 3 35

36 May 2005 3 2. 2.1 1 v j (j =1,,JMAX) JMAX v j i V (i, j); i =1,..., N N 1 do M = 1, JMAX call single_orthogonal(v(1,m), V, M) 1 single orthogonal 1 3 (1) Classical Gram- Schmidt CGS (2) Modified Gram- Schmidt MGS ( 3 ) Daniel-Gragg-Kaufman-Stewart DGKS 11) 3 2.1.1 CGS 2 CGS 2 BLAS2 2 s 2 subroutine single_orthogonal(w,v,m) do j = 1, M-1 s = 0.0d0 do i = 1, N s = s + V(i, j)*w(i) hr(j) = s do i = 1, N s = 0.0d0 do j = 1, M-1 s = s + V(i, j)*hr(j) w(i) = w(i) s 2.1.2 MGS 3 MGS s w(i) = w(i) s*v(i, j) BLAS BLAS1 3 subroutine single_orthogonal(w,v,m) do j=1, M-1 s = 0.0d0 do i =1, N

Vol. 46 No. SIG 7(ACS 10) 37 s = s + V(i, j)*w(i) if (s.ne. 0.0d0) then do i=1, N w(i) = w(i) s*v(i, j) endif 2.1.3 Daniel-Gragg-Kaufman-Stewart DGKS 4 DGKS CGS (1) CGS (1) CGS 4 CGS (1) CGS w 2 <η Vw 2 (1) Vw 2 hr(i) 3 12) η = 1 2 2.2 1 3 MGS CGS 12) MGS CGS ARPACK 13) DGKS 1 3. PC 3 3.1 3 v j (j =1,,JMAX) v j (i) x(k) cos 3 v j i 1 v j (i) =x(k) j + cos( i j )+i 0.01 N +1 2 v j (i) =x(k)+i j 0.01 3 v j (i) =x(k)+cos( i j N +1 ) i =1,..., N k = i +(j 1) N JMAX 20 100 (2) Ortho Ortho = V t V I F (2) I F V j v j

38 May 2005 1 1 CGS MGS DGKS Fig. 1 Execution time and orthogonality error of CGS, MGS and DGKS for Sample 1. JMAX 3.2 N 10000 100000 100 Hitachi FLORA370DG CPU Pentium 4 HT 3.2 GHz L1 8KB L2 512 KB512 MB DDR-SDRAM Intel FORTRAN Compiler Version 7.1 -O3 -prefetch -unroll -align -lowercase -nodps -fpp2 -tpp7 -xw -vec_report3 -opt_report 3.2.1 1 1 1 CGS MGS DGKS 1(1) (3) N = 10000 30000 MGS CGS DGKS 30000 CGS MGS DGKS DGKS CGS 2 1(2) (4) CGS 10 8 MGS DGKS 10 13 JMAX = 20 CGS N 3 3.2.2 2 2 2 CGS MGS DGKS 2 (1) (3) 1 N = 10000 30000 MGS N = 30000 CGS N 2 (2) (4) CGS N 10 MGS DGKS

Vol. 46 No. SIG 7(ACS 10) 39 2 2 CGS MGS DGKS Fig. 2 Execution time and orthogonality error of CGS, MGS and DGKS for Sample 2. 3 3 CGS MGS DGKS Fig. 3 Execution time and orthogonality error of CGS, MGS and DGKS for Sample 3. 3.2.3 3 3 3 CGS MGS DGKS 3 (1) (3) 1 2 N = 10000 30000 MGS N = 30000 CGS N 3 (2) (4) CGS MGS DGKS CGS N 3 10000 20000

40 May 2005 MGS 40000 80000 CGS < MGS < DGKS CGS CGS > MGS > DGKS DGKS 4. 4.1 Policy(ε) Ortho ε Policy(ε) ε 4.2 Policy(ε) 10) ε 4 CGS MGS DGKS 3 4 (1) ε (2) 3 (a) CGSMGS DGKS (b) 3 (c) Ortho ε 4 Policy(ε) Fig. 4 Processing flow of orthogonalization library with numerical policy Policy(ɛ) using race-based method. ε Ortho Ortho ε Ortho (3) 4.3 JMAX = 100 N = 10000 20000 40000 80000 4 ε =10 8 10 10 10 12 3 3 4 3 3.2 1 1 3 5 6 8 CGS MGS DGKS NG

Vol. 46 No. SIG 7(ACS 10) 41 1 1 JMAX = 100 Table 1 Accuracy and execution time of selected algorithm and of other algorithms for Sample 1 data (JMAX = 100). (Ortho) (sec) CGS MGS DGKS 10000 1.00E-08 MGS 7.70E-11 0.2247 NG(7.43E-06) 4.331 1.00E-10 MGS 7.70E-11 0.2247 NG(7.43E-06) 4.331 1.00E-12 DGKS 2.63E-14 0.9731 NG(7.43E-06) NG(7.70E-11) 20000 1.00E-08 MGS 2.17E-10 0.5695 NG(4.49E-05) 3.644 1.00E-10 DGKS 3.53E-14 2.0752 NG(4.49E-05) NG(2.17E-10) 1.00E-12 DGKS 3.53E-14 2.0752 NG(4.49E-05) NG(2.17E-10) 40000 1.00E-08 MGS 6.34E-10 2.3134 NG(1.86E-04) 1.809 1.00E-10 DGKS 5.94E-14 4.1852 NG(1.86E-04) NG(6.34E-10) 1.00E-12 DGKS 5.94E-14 4.1852 NG(1.86E-04) NG(6.34E-10) 80000 1.00E-08 MGS 1.67E-09 5.3774 NG(1.08E-03) 1.552 1.00E-10 DGKS 7.45E-14 8.3461 NG(1.08E-03) NG(1.67E-09) 1.00E-12 DGKS 7.45E-14 8.3461 NG(1.08E-03) NG(1.67E-09) 2 2 JMAX = 100 Table 2 Accuracy and execution time of selected algorithm and of other algorithms for Sample 2 data (JMAX = 100). (Ortho) (sec) CGS MGS DGKS 10000 1.00E-08 MGS 8.08E-14 0.2344 2.130 4.188 1.00E-10 MGS 8.08E-14 0.2344 2.130 4.188 1.00E-12 MGS 8.08E-14 0.2344 2.130 4.188 20000 1.00E-08 MGS 2.29E-13 0.5801 1.782 3.533 1.00E-10 MGS 2.29E-13 0.5801 1.782 3.533 1.00E-12 MGS 2.29E-13 0.5801 1.782 3.533 40000 1.00E-08 CGS 8.05E-10 2.0653 1.117 1.985 1.00E-10 MGS 6.67E-13 2.3077 NG(8.05E-10) 1.776 1.00E-12 MGS 6.67E-13 2.3077 NG(8.05E-10) 1.776 80000 1.00E-08 CGS 3.22E-09 4.0741 1.320 1.978 1.00E-10 MGS 3.22E-12 5.3763 NG(3.22E-09) 1.499 1.00E-12 DGKS 8.00E-14 8.0574 NG(3.22E-09) NG(3.22E-12) 3 3 JMAX = 100 Table 3 Accuracy and execution time of selected algorithm and of other algorithms for Sample 3 data (JMAX = 100). (Ortho) (sec) CGS MGS DGKS 10000 1.00E-08 MGS 4.61E-14 0.2044 2.446 4.817 1.00E-10 MGS 4.61E-14 0.2044 2.446 4.817 1.00E-12 MGS 4.61E-14 0.2044 2.446 4.817 20000 1.00E-08 MGS 7.34E-14 0.5163 2.051 4.023 1.00E-10 MGS 7.34E-14 0.5163 2.051 4.023 1.00E-12 MGS 7.34E-14 0.5163 2.051 4.023 40000 1.00E-08 CGS 2.61E-12 2.0995 1.064 1.973 1.00E-10 CGS 2.61E-12 2.0995 1.064 1.973 1.00E-12 MGS 9.72E-14 2.2348 NG(2.61E-12) 1.853 80000 1.00E-08 CGS 5.99E-12 4.1279 1.271 1.979 1.00E-10 CGS 5.99E-12 4.1279 1.271 1.979 1.00E-12 MGS 1.30E-13 5.2472 NG(5.99E-12) 1.557-1 2 3 3 1 2 3 1 1 CGS

42 May 2005 N = 10000 20000 MGS DGKS N = 10000 MGS DGKS 4.3 2 3 10000 20000 MGS DGKS 4.8 10 14 2 80000 CGS MGS DGKS 3 3 DGKS CGS 10 10 DGKS 2.0 5. GRID 3 1 2) 4) 8) Dongarra SANS Self- Adapting Numerical Software 5) GRID ATLAS 6) PC 2 3 3 6. 6.1 CGS MGS DGKS 3 PC Pentium4 3.2 GHz HT 3 10000 20000 MGS 40000 80000 CGS < MGS < DGKS CGS CGS > MGS > DGKS DGKS ε Ortho ε CGS MGS DGKS 3 MGS 10 10 10 12 DGKS CGS 10 10 DGKS 4.8 CGS MGS DGKS 3 6.2 PC 1CPU

Vol. 46 No. SIG 7(ACS 10) 43 1) 2001-HPC-87 (SWoPP2001), pp.25 30 (2001). 2) Kuroda, H., Katagiri, T. and Kanada, Y.: Knowledge Discovery in Auto-tuning Parallel Numerical Library, Progress in Discovery Science, Final Report of the Japanese Discovery Science Project, LNCS 2281, pp.628 639, Springer (2002). 3) ABC- LIB http://www.abc-lib.org/ 4) Katagiri, T., Kise, K., Honda, H. and Yuba, T.: FIBER: A General Framework for Auto- Tuning Software, 5th International Symposium on High Performance Computing (ISHPC-V ), LNCS 2858, pp.146 159, Springer (2003). 5) Dongarra, J.J. and Eijkhout, V.: Self-adapting numerical software for next generation applications, International Journal of High Performance Computing Applications, Vol.17, No.2, pp.125 131 (Summer 2003). 6) ATLAS project. http://www.netlib.org/atlas/index.html 7) 2002- HPC-91 (SWoPP2002), pp.49 54 (2002). 8) SACSIS (Symposium on Advanced Computing Systems and Infrastructures) 2003, pp.145 152 (2003). 9) GRID Vol.45, No.SIG6 (ACS6), pp.105 112 (2004). 10) 2004-HPC-99 (SWoPP2004), pp.97 102 (2004). 11) Daniel, J., Gragg, W.B., Kaufman, L. and Stewart, G.W.: Reorthogonalization and stable algorithms for updating the Gram-Schmidt QR factorization, Mathematics of Computation, Vol.30, pp.772 795 (1976). 12) Lehoucq, R.: Analysis and Implementation of an Implicitly Restarted Arnoldi Iteration, Ph.D. thesis, Rice University, Houston, TX (1995). 13) ARPACK Homepage. http://www.caam.rice.edu/software/arpack/ ( 16 9 28 ) ( 17 1 17 ) 1968 1994 SR2201 SR8000 SR11000 HPC HPC 1963 1987 LSI 1977 2000 LSI