YOU Wen-jie 1 2 JI Guo-li 1 YUAN Ming-shun 2

Feature reduction on high-dimensional small-sample data

YOU Wen-jie(1,2), JI Guo-li(1), YUAN Ming-shun(2)
1. Department of Automation, Xiamen University, Xiamen, Fujian, China
2. Fuqing Branch, Fujian Normal University, Fuqing, Fujian, China
E-mail: glji@xmu.edu.cn

YOU Wen-jie, JI Guo-li, YUAN Ming-shun. Feature reduction on high-dimensional small-sample data. Computer Engineering and Applications, 2009, 45(36): 165-169.

Abstract: Small-sample, high-dimensional data motivate the definition of the Generalized Small Sample (GSS). The information features of a GSS are reduced in two ways: feature extraction (dimensionality extraction) and feature selection (dimensionality selection). First, unsupervised feature extraction based on Principal Component Analysis (PCA) and supervised feature extraction based on Partial Least Squares (PLS) are introduced. Second, an analysis of the structure of the first principal component leads to new global PCA-based and PLS-based feature selection approaches. Recursive feature elimination on PLS (PLS-RFE) is also implemented. Finally, the approaches are applied to classification of the MIT AML/ALL data. Feature extraction is performed with PCA and PLS, and feature selection is performed and compared with PLS-RFE. Together these steps compress the information in the GSS.

Key words: generalized small sample; Principal Component Analysis (PCA); Partial Least Squares (PLS); feature extraction; feature selection

DOI: 10.3778/j.issn.1002-8331.2009.36.049   CLC number: TP391   Document code: A

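2 PCA and PLS

Let $X = [X_1, X_2, \dots, X_p]$ be the $n \times p$ data matrix and $Y$ the $n \times q$ response matrix. Write $\Sigma_X = X^{T}X$, $\Sigma_Y = Y^{T}Y$, $\Sigma_{XY} = X^{T}Y$ and $\Sigma_{YX} = Y^{T}X$. Both methods build components $t_h$ from $X$. PCA uses $X$ alone, whereas PLS also uses $Y$.

2.1 PCA

The principal components $t_i = Xw_i$ solve

$$\max \operatorname{var}(Xw_i) \quad \text{s.t.} \quad w_i^{T}w_i = 1,\quad w_i^{T}\Sigma_X w_j = 0 \quad (1 \le i < j),$$

so that $T = XW$. Each column of the weighting matrix $W = (w_{ij})$ satisfies $(\Sigma_X - \lambda_i I)\,w_i = 0$ [7-8], where $\lambda_i = \operatorname{var}(t_i)$ and $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p$. The cumulative contribution of the first $m$ components is

$$\sum_{k=1}^{m}\lambda_k \Big/ \sum_{i=1}^{p}\lambda_i .$$

2.2 PLS

PLS extracts components $t_h$ that summarize $X$ and are also related to $Y$. The $i$-th pair of weight vectors solves

$$\max \operatorname{cov}(Xw_i, Yc_i) \quad \text{s.t.} \quad w_i^{T}w_i = 1,\ c_i^{T}c_i = 1,\ w_i^{T}\Sigma_X w_j = 0,\ c_i^{T}\Sigma_Y c_j = 0 \quad (1 \le i < j),$$

with $t_i = Xw_i$. The solution [9-10] is as follows. For $i = 1$, $w_i$ is the eigenvector associated with the largest eigenvalue of $\Sigma_{XY}\Sigma_{YX}$. For $i > 1$, it is the corresponding eigenvector of $(I - P_X)\Sigma_{XY}(I - P_Y)\Sigma_{YX}$. The vector $c_i$ is $\Sigma_{YX}w_i$ for $i = 1$ and $(I - P_Y)\Sigma_{YX}w_i$ for $i > 1$, normalized to unit length. The projectors are

$$P_X = \Sigma_X W\big[(\Sigma_X W)^{T}\Sigma_X W\big]^{-1}(\Sigma_X W)^{T},\qquad P_Y = \Sigma_Y C\big[(\Sigma_Y C)^{T}\Sigma_Y C\big]^{-1}(\Sigma_Y C)^{T},$$

where $W = (w_{ij})$ and $C = (c_{ij})$ collect the weight vectors already extracted.

Explanatory power of the components. For a component $t_h$ from either method, the share of the variation of $X$ and $Y$ that it explains is measured with the correlation coefficient $r$:

$$\mathrm{Rd}(x_j; t_h) = r^2(x_j, t_h),\qquad \mathrm{Rd}(X; t_h) = \frac{1}{p}\sum_{j=1}^{p}\mathrm{Rd}(x_j; t_h),$$

$$\mathrm{Rd}(y_k; t_h) = r^2(y_k, t_h),\qquad \mathrm{Rd}(Y; t_h) = \frac{1}{q}\sum_{k=1}^{q}\mathrm{Rd}(y_k; t_h).$$

For several components, the contributions add:

$$\mathrm{Rd}(X; t_1, \dots, t_m) = \sum_{h=1}^{m}\mathrm{Rd}(X; t_h),\qquad \mathrm{Rd}(Y; t_1, \dots, t_m) = \sum_{h=1}^{m}\mathrm{Rd}(Y; t_h).$$

The same holds for a single variable $x_j$ or $y_k$. For PCA on standardized $X$, $\mathrm{Rd}(X; t_1, \dots, t_m)$ equals the cumulative contribution $\sum_{k=1}^{m}\lambda_k / \sum_{i=1}^{p}\lambda_i$.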
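The following is a minimal NumPy sketch of the PCA extraction step in 2.1. It assumes $X$ is standardized, so that $\Sigma_X/(n-1)$ is the correlation matrix. The default $\alpha = 0.15$ is a placeholder, and the function names `standardize`, `pca_extract` and `rd` are hypothetical. The sketch also implements $\mathrm{Rd}$ so that the identity stated above, $\mathrm{Rd}(X; t_1, \dots, t_m)$ equal to the cumulative contribution, can be checked on standardized data.

```python
import numpy as np


def standardize(X):
    """Center each column and scale it to unit variance (ddof=1).

    Assumes no column is constant (a constant gene would divide by zero).
    """
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def pca_extract(X, alpha=0.15):
    """PCA feature extraction with a cumulative-contribution cut-off.

    Returns (W, T, eigvals, contribution). W holds the first m eigenvectors
    w_i of the correlation matrix and T = X_std @ W holds the scores t_i.
    m is the smallest count whose cumulative contribution reaches 1 - alpha.
    """
    Xs = standardize(X)
    sigma_x = Xs.T @ Xs / (Xs.shape[0] - 1)      # correlation matrix of X
    eigvals, eigvecs = np.linalg.eigh(sigma_x)   # returned in ascending order
    order = np.argsort(eigvals)[::-1]            # lambda_1 >= lambda_2 >= ...
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = np.cumsum(eigvals) / eigvals.sum()   # cumulative contribution
    m = min(int(np.searchsorted(ratio, 1 - alpha)) + 1, len(eigvals))
    W = eigvecs[:, :m]
    return W, Xs @ W, eigvals, ratio[m - 1]


def rd(Z, T):
    """Rd(Z; t_1, ..., t_m): sum over components of the mean r^2(z_j, t_h)."""
    Z = np.asarray(Z, dtype=float)
    total = 0.0
    for h in range(T.shape[1]):
        r = [np.corrcoef(Z[:, j], T[:, h])[0, 1] for j in range(Z.shape[1])]
        total += np.mean(np.square(r))
    return total
```

For GSS-sized data such as 7129 genes, forming the $p \times p$ matrix is wasteful. `np.linalg.svd(Xs, full_matrices=False)` yields the same leading directions (the right singular vectors) directly from the $n \times p$ matrix.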

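The constrained PLS weights of 2.2 can be computed in NumPy as follows. This is a sketch with hypothetical names (`pls_weights`, `_projector`). It does not call an eigen-solver on the non-symmetric matrix $(I-P_X)\Sigma_{XY}(I-P_Y)\Sigma_{YX}$. Instead, it takes the leading left singular vector $w$ of $G = (I-P_X)\Sigma_{XY}(I-P_Y)$. Such a vector lies in the range of $I-P_X$, so $GG^{T}w = (I-P_X)\Sigma_{XY}(I-P_Y)\Sigma_{YX}w$ and $w$ satisfies the same eigen-equation. Using a pseudo-inverse inside the projector is an assumption made to tolerate rank loss. Because the $c_i$ live in $\mathbb{R}^q$ and must be $\Sigma_Y$-orthogonal, the sketch also assumes $m \le q$.

```python
import numpy as np


def _projector(U):
    """Orthogonal projector onto the column space of U (pinv tolerates rank loss)."""
    return U @ np.linalg.pinv(U.T @ U) @ U.T


def pls_weights(X, Y, m):
    """Weight vectors (W, C) of the constrained PLS problem, m components.

    X (n x p) and Y (n x q) are column-centered here. Each w_i is the leading
    left singular vector of G = (I - P_X) S_XY (I - P_Y), with P_X = P_Y = 0
    for the first component, so w_1 is the top eigenvector of S_XY S_YX.
    Assumes m <= q so that the Sigma_Y-orthogonal c_i exist.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    p, q = X.shape[1], Y.shape[1]
    s_x, s_y, s_xy = X.T @ X, Y.T @ Y, X.T @ Y
    W, C = np.zeros((p, 0)), np.zeros((q, 0))
    for i in range(m):
        # Projectors onto Sigma_X W and Sigma_Y C built from earlier weights.
        P_x = _projector(s_x @ W) if i > 0 else np.zeros((p, p))
        P_y = _projector(s_y @ C) if i > 0 else np.zeros((q, q))
        G = (np.eye(p) - P_x) @ s_xy @ (np.eye(q) - P_y)
        w = np.linalg.svd(G)[0][:, 0]
        c = (np.eye(q) - P_y) @ s_xy.T @ w   # c_i proportional to (I - P_Y) S_YX w_i
        c /= np.linalg.norm(c)
        W = np.column_stack([W, w])
        C = np.column_stack([C, c])
    return W, C
```

The signs of $w_i$ and $c_i$ are arbitrary, as with any eigenvector. The components are then $T = XW$ on the centered data.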
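3 Feature extraction and feature selection

3.1 Feature extraction

3.1.1 PCA. For a GSS, $p \gg n$. The weight vectors $w_i$ of $X$ are computed, and the first $m$ components are kept. Here $m$ is the smallest number for which the cumulative contribution $\sum_{k=1}^{m}\lambda_k / \sum_{i=1}^{p}\lambda_i$ reaches $1 - \alpha$. The samples are then projected,

$$T = (t_{ij}),\qquad t_{ij} = \langle X_i, w_j \rangle,$$

where $t_{ij}$ is the projection of sample $X_i$ on $w_j$. $T$ then replaces $X$ as the reduced data.

3.1.2 PLS. The same steps are applied with the PLS weight vectors $w_i$. The number of components (nfac) is chosen with the PMPRESS criterion (Prob > 0.1). The projection $T = (t_{ij}) = \langle X_i, w_j \rangle$ again gives the reduced data.

3.2 Feature selection

3.2.1 PCA. For the first component $t_1$ of standardized $X$,

$$\sum_{j=1}^{p}\rho^{2}(t_1, X_j) = \lambda_1,\qquad \rho^{2}(t_1, X_j) = \lambda_1 w_{1j}^{2}.$$

Ranking the features by $w_{1j}^{2}$ therefore ranks them by how much of $t_1$ they account for. The top-ranked features are selected, which gives a global, PCA-based feature selection.

3.2.2 PLS. The first PLS weight vector $w_1$ is used in the same way: the features are ranked by $w_{1j}^{2}$.

3.2.3 PLS-RFE. Recursive Feature Elimination (RFE) [5-6] is combined with PLS. A PLS model is fitted on the surviving features, the features are ranked by their squared weights $w_{ij}^{2}$, and the lowest-ranked features are removed. Repeating these steps yields a feature ranking.

4 Experiments

4.1 Data set

The MIT AML/ALL data of Golub et al. [1] describe each sample by 7129 genes. They contain two classes: Acute Lymphoblastic Leukemia (ALL) and Acute Myeloid Leukemia (AML). The training set has 38 samples (27 ALL, 11 AML) and the test set has 34 samples (20 ALL, 14 AML). PCA, PLS and PLS-RFE are compared on these data.

4.2 Classification with SVMs

The classifiers are linear SVMs (LinearSVC) from the Matlab toolbox OSU_SVM3.00 (http://www.kernelmethods.net/).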
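The ranking rule of 3.2.1 can be checked numerically. The sketch below uses hypothetical names (`first_component`, `rank_features`). It computes the first principal component of standardized data and orders the features by $w_{1j}^{2}$. On standardized data, $\rho^2(t_1, X_j) = \lambda_1 w_{1j}^2$, so this order is also the order of $\rho^2(t_1, X_j)$. The same `rank_features` call would apply unchanged to a first PLS weight vector (3.2.2).

```python
import numpy as np


def first_component(X):
    """First principal component of standardized X: returns (X_std, lambda_1, w_1)."""
    X = np.asarray(X, dtype=float)
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = Xs.T @ Xs / (Xs.shape[0] - 1)    # correlation matrix
    vals, vecs = np.linalg.eigh(R)       # ascending, so the last pair is the largest
    return Xs, vals[-1], vecs[:, -1]


def rank_features(w1, top=None):
    """Feature indices sorted by the squared weight w_1j^2, largest first."""
    order = np.argsort(-np.square(w1), kind="stable")
    return order if top is None else order[:top]
```

For example, `rank_features(w1, top=50)` would return the indices of the 50 genes that contribute most to $t_1$. The cut-off of 50 is illustrative only.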

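The following is a hypothetical sketch of PLS-RFE in the style of SVM-RFE [5]. One assumption departs from the text. With a single coded response $y$, the first PLS weight vector is proportional to $X^{T}y$, so ranking on it alone would make every elimination round equivalent to a univariate ranking. The sketch therefore ranks features by the squared coefficients of a $k$-component PLS1 (NIPALS) regression, and these coefficients do change as features are removed. The names `pls1_coef` and `pls_rfe`, and the one-feature-per-round schedule, are also assumptions.

```python
import numpy as np


def pls1_coef(X, y, k):
    """Regression coefficients of a k-component PLS1 model (NIPALS).

    X and y are centered here; returns beta = W (P^T W)^{-1} q.
    """
    Xr = X - X.mean(axis=0)
    yr = y - y.mean()
    p = X.shape[1]
    W, P, q = np.zeros((p, k)), np.zeros((p, k)), np.zeros(k)
    for a in range(k):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)
        t = Xr @ w
        tt = t @ t
        P[:, a] = Xr.T @ t / tt
        q[a] = yr @ t / tt
        W[:, a] = w
        Xr = Xr - np.outer(t, P[:, a])   # deflate X
        yr = yr - q[a] * t               # deflate y
    return W @ np.linalg.solve(P.T @ W, q)


def pls_rfe(X, y, k=2):
    """Hypothetical PLS-RFE: rank features from best to worst.

    Each round fits PLS1 on the surviving features and drops the one with the
    smallest squared coefficient (assumed criterion, one feature per round).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # put genes on one scale
    remaining = list(range(X.shape[1]))
    eliminated = []
    while len(remaining) > 1:
        beta = pls1_coef(X[:, remaining], y, min(k, len(remaining)))
        worst = int(np.argmin(beta ** 2))
        eliminated.append(remaining.pop(worst))
    return remaining + eliminated[::-1]
```

With 7129 genes, removing a single gene per round is slow. Removing a fixed fraction of the surviving genes per round, as in the original SVM-RFE, only requires changing how many indices are popped.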
4.2.1 Feature extraction

PCA and PLS are applied to all 7129 genes, and the SVMs are trained on the first $k$ extracted components ($k = 2, 3$).

Table 1 Explained variation of X and Y (%)

[Figure: SVM classification on MIT AML/ALL versus the number of components k; (a) PCA on 7129 genes, (b) PLS on 7129 genes]

4.2.2 Feature selection

Genes are ranked with PCA, PLS and PLS-RFE, and SVMs are trained on the top-ranked genes. The results are compared with those of Nguyen and Rocke [2-4].
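The projection $t_{ij} = \langle X_i, w_j \rangle$ of 3.1 must also be applied to the 34 test samples, using the training-set standardization. The sketch below shows that step with hypothetical names (`project`, `nearest_centroid_accuracy`). As a NumPy-only stand-in for the linear SVM (OSU_SVM LinearSVC), it uses a nearest-centroid rule. This classifier is a substitute and is not the one used in the study.

```python
import numpy as np


def project(X_train, X_test, W):
    """Scores T = (t_ij) = <X_i, w_j> for training and test samples.

    Test samples are standardized with the training mean and standard
    deviation, so no test information leaks into the extraction step.
    """
    mu = X_train.mean(axis=0)
    sd = X_train.std(axis=0, ddof=1)
    return ((X_train - mu) / sd) @ W, ((X_test - mu) / sd) @ W


def nearest_centroid_accuracy(T_train, y_train, T_test, y_test):
    """Accuracy (%) of a nearest-centroid rule in the component space.

    A stand-in for the linear SVM used in the study, kept NumPy-only.
    """
    labels = np.unique(y_train)
    centroids = np.array([T_train[y_train == c].mean(axis=0) for c in labels])
    dist = ((T_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    pred = labels[np.argmin(dist, axis=1)]
    return 100.0 * np.mean(pred == y_test)
```

Fitting $W$ on the 38 training samples only and reusing it for the 34 test samples keeps the held-out accuracy honest. Fitting on all 72 samples would let the test set influence the components.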

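The study reports results under LOOCV, k-fold CV (with k = 4) and holdout validation. The following is a small sketch of the k-fold split and the resulting error rate. The `fit`/`predict` callables are hypothetical placeholders for any of the classifiers discussed. Setting k = n reduces the procedure to LOOCV. A holdout estimate is simply a single train/test split, such as the 38/34 split of the data.

```python
import numpy as np


def kfold_indices(n, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold CV (k >= 2); k = n gives LOOCV."""
    perm = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(perm, k)          # first n % k folds get one extra sample
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield np.sort(train), np.sort(folds[i])


def cv_error_rate(X, y, k, fit, predict, seed=0):
    """Cross-validated error rate (%) for any fit/predict pair."""
    wrong = 0
    for train, test in kfold_indices(len(y), k, seed):
        model = fit(X[train], y[train])
        wrong += int(np.sum(predict(model, X[test]) != y[test]))
    return 100.0 * wrong / len(y)
```

With the 38 training samples, k = 4 gives folds of 10, 10, 9 and 9 samples, and k = 38 is LOOCV.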
Table 2 SVM classification accuracy (%) on MIT AML/ALL (OSU_SVM3.00, LinearSVC) with PCA/PLS components (k = 2, 3) and with genes selected by PCA, PLS and PLS-RFE

Table 3 Comparison of PLS and PLS-RFE (4-fold cross-validation) with Golub et al. [1] and Nguyen and Rocke [3]; validation schemes: LOOCV, k-fold CV and holdout, on 38 training and 34 test samples

References

[1] Golub T R, Slonim D K, Tamayo P, et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring[J]. Science, 1999, 286(5439): 531-537.
[2] Nguyen D V, Rocke D M. Tumor classification by partial least squares using microarray gene expression data[J]. Bioinformatics, 2002, 18(1): 39-50.
[3] Nguyen D V, Rocke D M. Multi-class cancer classification via partial least squares with gene expression profiles[J]. Bioinformatics, 2002, 18(9): 1216-1226.
[4] Nguyen D V, Rocke D M. On partial least squares dimension reduction for microarray-based classification: a simulation study[J]. Computational Statistics & Data Analysis, 2004, 46: 407-425.
[5] Guyon I, Weston J, Barnhill S, et al. Gene selection for cancer classification using support vector machines[J]. Machine Learning, 2002, 46(1-3): 389-422.
[6] [J]. C, 2006, 36(1): 86-96.
[7] [M].
[8] Massey W F. Principal components regression in exploratory statistical research[J]. Journal of the American Statistical Association, 1965, 60: 234-246.
[9] Wold S, Ruhe A, Wold H, et al. The collinearity problem in linear regression: the partial least squares (PLS) approach to generalized inverses[J]. Journal of Statistics Computation, 1984: 735-743.
[10] Lorber A, Wangen L, Kowalski B. A theoretical foundation for the PLS algorithm[J]. Journal of Chemometrics, 1987, 1: 19-31.