Bayesian Discriminant Feature Selection


Tanaka Yusuke 1,a)   Ueda Naonori 2   Tanaka Toshiyuki 1

Abstract: Focusing on categorical data, we propose a Bayesian feature selection method in which a set of class-specific features is selected for each class in order to improve the generalization ability of classification. In the proposed model, we introduce a latent variable for each pair of feature and class that decides whether the feature is specific to that class or common to all classes. The latent variables are estimated from the given training data in the framework of Bayesian inference. Unlike conventional feature selection methods, this enables us to obtain class-dependent feature subsets that are effective for classification. Through experiments with synthetic and real DNA data sets, we demonstrate that the proposed method can be superior to conventional methods in terms of generalization ability. We also show that the proposed method can obtain higher classification accuracy than Lasso and the Support Vector Machine, which realize feature selection only indirectly.

Keywords: Bayesian models, feature selection, classification

1. Introduction

Guyon and Elisseeff [1] classify feature selection methods into three groups. The first group is the wrapper methods, which score candidate feature subsets by the predictive performance of a classifier trained on each subset.

1  Presently with Kyoto University
2  Presently with NTT Communication Science Laboratories
a) ytanaka@sys.i.kyoto-u.ac.jp

The second group is the filter methods [2], [3], which rank features by a relevance criterion computed independently of any particular classifier. The third group is the embedded methods, which perform feature selection as part of classifier training; SVM-RFE [4] is a representative example. All of these approaches select a single feature subset that is shared by all classes. Models in which different groups of samples may rely on different features have been studied as bi-clustering [5] and subset clustering [6], [7]; in particular, subset clustering [6] introduces binary indicators that switch each feature between a cluster-specific distribution and a common one. The Lasso [8] also realizes feature selection, but only indirectly, through an L1 penalty on classifier weights.

2. Proposed Model

2.1 Class-specific features

Fig. 1: Examples of the categorical data (DNA data). Five DNA samples are shown; the colored part marks the features that are related to the class.

Figure 1 shows examples of the kind of data we consider: DNA sequences, in which every feature (sequence position) takes one of four categories. Let x^(k)_j denote feature j of a sample in class k, taking one of the L_j categories {1, ..., L_j}. For every class k and feature j we introduce a relevance indicator r_{k,j} ∈ {1, 0}. When the feature is relevant to the class (r_{k,j} = 1), x^(k)_j is generated from a distribution specific to class k; when it is not (r_{k,j} = 0), x^(k)_j is generated from a distribution common to all classes.

2.2 Model definition

Let X^(k) = {x^(k)_{i,j}} (i = 1, ..., N_k, j = 1, ..., M) denote the training data of class k, where N_k (k = 1, ..., K) is the number of samples in class k and M is the number of features. Each x^(k)_{i,j} takes a value in {1, ..., L}. We introduce two kinds of multinomial parameters: φ = (φ_1, ..., φ_L), common to all classes, and θ_{k,j} = (θ_{k,j,1}, ..., θ_{k,j,L}), specific to class k and feature j. Here φ_l is the probability of category l under the common distribution and θ_{k,j,l} is that under the class-specific distribution, with Σ_{l=1}^L φ_l = 1 and Σ_{l=1}^L θ_{k,j,l} = 1.
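As a concrete illustration of this data model — each (class, feature) pair draws from a class-specific distribution θ_{k,j} or from the common distribution φ, according to an indicator r_{k,j} — the following Python sketch generates such categorical data. All sizes, the random seed, and the parameter values here are illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (not those of the paper's experiments)
K, M, L, N_k = 3, 20, 4, 50   # classes, features, categories, samples per class
lam = 0.2                     # probability that a feature is class-specific

# Relevance indicators and multinomial parameters
r = rng.binomial(1, lam, size=(K, M))            # r[k, j] = 1: class-specific
phi = rng.dirichlet(np.ones(L))                  # common distribution phi
theta = rng.dirichlet(np.ones(L), size=(K, M))   # class-specific theta[k, j]

# Each observation X[k, i, j] is drawn from theta[k, j] if r[k, j] = 1,
# and from the shared phi otherwise
X = np.empty((K, N_k, M), dtype=int)
for k in range(K):
    for j in range(M):
        p = theta[k, j] if r[k, j] == 1 else phi
        X[k, :, j] = rng.choice(L, size=N_k, p=p)
```

Data generated this way is what the synthetic experiments below are built on: a classifier can only benefit from a feature j in class k when r[k, j] = 1.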

Strictly speaking, the common distribution should also be defined per feature, φ_j = (φ_{j,1}, ..., φ_{j,L_j}); to keep the notation simple we assume every feature takes the same number of categories L and drop the feature index from φ. We write r_k ∈ {0, 1}^M for the indicators of class k. Each r_{k,j} is drawn with probability λ, and each observation x^(k)_{i,j} is drawn according to the value of r_{k,j}:

  r_{k,j} | λ ~ Bernoulli(r_{k,j}; λ),
  x^(k)_{i,j} | r_{k,j}, φ, θ_{k,j} ~ Multinomial(x^(k)_{i,j}; θ_{k,j})  if r_{k,j} = 1,
                                      Multinomial(x^(k)_{i,j}; φ)        if r_{k,j} = 0.

We place conjugate priors on the remaining parameters:

  λ | a, b ~ Beta(λ; a, b),   φ | α ~ Dirichlet(φ; α),   θ_{k,j} | β_{k,j} ~ Dirichlet(θ_{k,j}; β_{k,j}),

where a and b are the hyperparameters of λ, α those of φ, and β_{k,j} those of θ_{k,j}. Writing X = {X^(1), ..., X^(K)}, the likelihood is

  p(X | R, φ, Θ, λ) = ∏_{k=1}^K ∏_{i=1}^{N_k} ∏_{j=1}^M ∏_{l=1}^L (θ_{k,j,l}^{r_{k,j}} φ_l^{1−r_{k,j}})^{I(x^(k)_{i,j} = l)}
                    = ∏_{k=1}^K ∏_{j=1}^M ∏_{l=1}^L (θ_{k,j,l}^{r_{k,j}} φ_l^{1−r_{k,j}})^{n^(k)_{j,l}},    (1)

where I(f) equals 1 if f is true and 0 otherwise, and n^(k)_{j,l} = Σ_{i=1}^{N_k} I(x^(k)_{i,j} = l). We write R = {r_{k,j}} and Θ = {θ_{k,j}} (k = 1, ..., K, j = 1, ..., M).

2.3 Posterior inference

Instead of working with the joint posterior p(R, φ, Θ, λ | X) ∝ p(X | R, φ, Θ) p(R | λ) p(φ) p(Θ) p(λ) directly, we integrate out φ, Θ, and λ analytically. The marginal likelihood of the indicators is

  p(X | R) = ∫ p(X | R, φ, Θ) p(φ) p(Θ) dφ dΘ
           = [ ∏_{k=1}^K ∏_{j=1}^M (1 / B(β_{k,j})) · ∏_l Γ(I(r_{k,j} = 1) n^(k)_{j,l} + β_{k,j,l}) / Γ(Σ_l (I(r_{k,j} = 1) n^(k)_{j,l} + β_{k,j,l})) ]
             × (1 / B(α)) · ∏_l Γ(Σ_k Σ_j I(r_{k,j} = 0) n^(k)_{j,l} + α_l) / Γ(Σ_l (Σ_k Σ_j I(r_{k,j} = 0) n^(k)_{j,l} + α_l)),    (2)

where B(·) is the multivariate beta function and Γ(·) is the gamma function. Similarly,

  p(R) = ∫ ∏_{k=1}^K ∏_{j=1}^M p(r_{k,j} | λ) p(λ; a, b) dλ
       = (1 / B(a, b)) · Γ(Σ_k Σ_j r_{k,j} + a) Γ(Σ_k Σ_j (1 − r_{k,j}) + b) / Γ(KM + a + b),    (3)

where B(a, b) is the beta function. From (2) and (3) we obtain the posterior p(R | X) ∝ p(X | R) p(R), and we sample R from it by Gibbs sampling: each r_{k,j} is drawn in turn from its full conditional p(r_{k,j} | r_\(k,j), X), where r_\(k,j) denotes all indicators other than r_{k,j}. The odds ratio γ of r_{k,j} = 1 against r_{k,j} = 0 follows from (2) and (3):

  γ = p(r_{k,j} = 1 | r_\(k,j), X) / p(r_{k,j} = 0 | r_\(k,j), X)
    = ( ∏_{l=1}^L Γ(n^(k)_{j,l} + β_{k,j,l}) / Γ(β_{k,j,l}) ) · ( Γ(Σ_l β_{k,j,l}) / Γ(Σ_l (n^(k)_{j,l} + β_{k,j,l})) )
      × ( ∏_{l=1}^L Γ(m_l + α_l) / Γ(n^(k)_{j,l} + m_l + α_l) ) · ( Γ(Σ_l (n^(k)_{j,l} + m_l + α_l)) / Γ(Σ_l (m_l + α_l)) )
      × ( Σ_{(s,t)≠(k,j)} I(r_{s,t} = 1) + a ) / ( Σ_{(s,t)≠(k,j)} I(r_{s,t} = 0) + b ),    (4)

where m_l = Σ_{(s,t)≠(k,j)} I(r_{s,t} = 0) n^(s)_{t,l} is the count of category l currently assigned to the common distribution, excluding the cell (k, j). Since p(r_{k,j} = 1 | r_\(k,j), X) + p(r_{k,j} = 0 | r_\(k,j), X) = 1, (4) gives

  p(r_{k,j} = 1 | r_\(k,j), X) = γ / (1 + γ),    (5)
  p(r_{k,j} = 0 | r_\(k,j), X) = 1 / (1 + γ).    (6)
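The odds ratio γ in (4) is a product of gamma-function terms and is best evaluated in log space. The sketch below shows one possible implementation of a single Gibbs sweep over the indicators; the function names and the count array n[k, j, l] are our own conventions, not the paper's, and the hyperparameter values used in the usage notes are illustrative.

```python
import numpy as np
from math import lgamma

def log_dirmult(counts, prior):
    """Log Dirichlet-multinomial marginal likelihood of a count vector."""
    counts = np.asarray(counts, dtype=float)
    prior = np.asarray(prior, dtype=float)
    return (lgamma(prior.sum()) - lgamma((counts + prior).sum())
            + sum(lgamma(c + p) - lgamma(p) for c, p in zip(counts, prior)))

def gibbs_sweep(r, n, alpha, beta, a, b, rng):
    """One Gibbs sweep over all relevance indicators r[k, j], eqs. (4)-(6).

    n[k, j, l]  -- count of category l at feature j in the class-k training data
    alpha, beta -- Dirichlet hyperparameters of the common / class-specific parts
    a, b        -- Beta hyperparameters of the relevance probability lambda
    """
    K, M, _ = n.shape
    for k in range(K):
        for j in range(M):
            r[k, j] = 0   # temporarily exclude cell (k, j) from both tallies
            # counts currently pooled into the common distribution, minus n[k, j]
            pooled = ((1 - r)[:, :, None] * n).sum(axis=(0, 1)) - n[k, j]
            # log of the odds ratio gamma = p(r=1 | rest) / p(r=0 | rest)
            log_gamma = (log_dirmult(n[k, j], beta[k, j])
                         + log_dirmult(pooled, alpha)
                         - log_dirmult(pooled + n[k, j], alpha)
                         + np.log(r.sum() + a)              # relevant count, (k, j) excluded
                         - np.log((1 - r).sum() - 1 + b))   # irrelevant count, (k, j) excluded
            p1 = 1.0 / (1.0 + np.exp(-log_gamma))           # gamma / (1 + gamma), eq. (5)
            r[k, j] = rng.binomial(1, p1)
    return r
```

Repeating `gibbs_sweep` many times and retaining the post-burn-in samples of r yields the posterior samples used for prediction in the next subsection.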

2.4 Prediction

Let x = (x_1, ..., x_M) be a new sample whose class is unknown. Given the training data X, we predict its class as

  k̂ = argmax_k p(C_k | x, X) = argmax_k p(x | C_k, X) p(C_k),    (7)

where p(C_k) is the prior probability of class k and

  p(x | C_k, X) = ∏_{j=1}^M Σ_{r_{k,j}} p(x_j | r_{k,j}) p(r_{k,j} | X)    (8)
                ≈ ∏_{j=1}^M (1 / (T − t_0)) Σ_{t=t_0+1}^T p(x_j | r^(t)_{k,j}).    (9)

In (8), p(r_{k,j} | X) is the marginal posterior of the indicator; in (9) it is approximated by the Gibbs samples r^(t)_{k,j}, where t indexes the samples, T is their total number, and the first t_0 samples are discarded as burn-in. The two predictive probabilities needed in (9) are obtained by integrating out θ_{k,j} and φ:

  p(x_j | r_{k,j} = 1) = ∫ p(x_j | θ_{k,j}) p(θ_{k,j} | X) dθ_{k,j}
                       = ∏_{l=1}^L ( (n^(k)_{j,l} + β_{k,j,l}) / Σ_{l'=1}^L (n^(k)_{j,l'} + β_{k,j,l'}) )^{I(x_j = l)},    (10)

  p(x_j | r_{k,j} = 0) = ∫ p(x_j | φ) p(φ | X) dφ
                       = ∏_{l=1}^L ( (n^(·)_{j,l} + α_l) / Σ_{l'=1}^L (n^(·)_{j,l'} + α_{l'}) )^{I(x_j = l)},    (11)

where n^(·)_{j,l} = Σ_{k=1}^K n^(k)_{j,l}. Note that (10) uses only the counts of class k at feature j, whereas (11) pools the counts of feature j over all classes.

3. Comparison Methods

3.1 Methods compared

Note first that fixing r_{k,j} = 1 for all k and j, i.e., treating every feature as relevant, reduces the proposed model to the naive Bayes (NB) classifier. Following the taxonomy of [1], we compare the proposed method with NB, with a wrapper method (backward selection), with L1-regularized logistic regression, and with SVMs.

3.1.1 Naive Bayes (NB)

NB uses all features without selection:

  x^(k)_{i,j} | θ_{k,j} ~ Multinomial(x^(k)_{i,j}; θ_{k,j}),   θ_{k,j} | β_{k,j} ~ Dirichlet(θ_{k,j}; β_{k,j}),

and θ_{k,j} is estimated by maximum a posteriori (MAP) estimation.

3.1.2 Backward selection (BS)

BS is a wrapper method: starting from the full feature set, it greedily removes features one at a time.

3.1.3 L1-regularized logistic regression

The Lasso [8] realizes feature selection indirectly through an L1 penalty. The class posterior is modeled as

  p(C_k | x) = exp(a_k) / Σ_{k'=1}^K exp(a_{k'}),    (12)

where a_k = ln(p(x | C_k) p(C_k)) is modeled linearly as a_k = w_k^T x + w_{k,0}. Given the training labels T, the weights W are estimated under an L1 penalty:

  Ŵ = argmax_W  ln p(T | W) − η Σ_{k=1}^K Σ_{j=1}^M |w_{k,j}|,    (13)

where η > 0 is the regularization parameter; the penalty drives many weights exactly to zero, so that features are selected implicitly. We refer to this method as LR+Lasso.

3.1.4 Support Vector Machine (SVM)

The SVM [10] is a binary classifier, and we extend it to the multiclass case in the standard way. We use both a linear kernel and an RBF kernel, tuning the kernel and regularization parameters following the practical guide of [9].
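Before turning to the experiments, the prediction rule (7)-(11) of Section 2.4 can be illustrated with a minimal Python sketch. The function name, the array layout, and the uniform class prior p(C_k) are our own assumptions for the illustration.

```python
import numpy as np

def predict_class(x, n, R_samples, alpha, beta):
    """Classify a new sample x = (x_1, ..., x_M) of category indices by eq. (7),
    with a uniform class prior, averaging over Gibbs samples of R as in eq. (9).

    n[k, j, l] -- training counts; R_samples -- list of sampled (K, M) indicator arrays
    """
    K, M, _ = n.shape
    n_all = n.sum(axis=0)                  # pooled counts n(.)_{j,l} of eq. (11)
    log_p = np.zeros(K)
    for k in range(K):
        for j in range(M):
            l = x[j]
            # predictive probabilities under the two relevance states, eqs. (10)-(11)
            p_rel = (n[k, j, l] + beta[k, j, l]) / (n[k, j] + beta[k, j]).sum()
            p_com = (n_all[j, l] + alpha[l]) / (n_all[j] + alpha).sum()
            # Monte Carlo average over the sampled indicators, eq. (9)
            p_j = np.mean([p_rel if R[k, j] else p_com for R in R_samples])
            log_p[k] += np.log(p_j)
    return int(np.argmax(log_p))
```

Working in log space keeps the product over the M features numerically stable even when M is large.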

3.2 Experimental setup

We evaluate the methods on synthetic data and on real DNA data. Every experiment is repeated 10 times; in each run the data are split at random into 2/3 for training and 1/3 for testing.

3.2.1 Synthetic data

The synthetic data sets are sampled from the proposed generative model, so the ground truth of which features are class-specific is known. Table 1 lists the parameter values used to sample each data set.

Table 1: Values of the parameters used to sample the synthetic data

  data    K    N_k   M     L    β_{k,j,l}   λ     (a, b)
  data1   5    50    50    5    10          0.2   (1, 8)
  data2   5    50    50    10   10          0.3   (1, 6)
  data3   10   50    100   5    10          0.2   (1, 6)
  data4   10   100   50    10   10          0.2   (1, 8)
  data5   30   50    50    10   10          0.2   (1, 6)

3.2.2 DNA data

As real data we use Promoter and Splice, two DNA benchmark sets used in the KBANN study [11]. In both sets every feature (sequence position) takes one of the four categories {A, G, C, T}.

Promoter has 57 features and 2 classes. A promoter is a DNA region that initiates transcription of DNA into RNA; the task is to distinguish promoter from non-promoter sequences, and the set contains 53 promoter instances.

Splice has 60 features and 3 classes, concerning splice junctions: the points at which superfluous segments are removed when RNA is produced from DNA (splicing). The three classes are exon/intron boundaries (class 1, 767 instances), intron/exon boundaries (class 2, 768 instances), and neither (class 3, 1655 instances).

4. Experimental Results

4.1 Synthetic data

4.1.1 Comparison of feature selection methods

Table 2 shows the test accuracy of each feature selection method on the synthetic data, averaged over the 10 runs (mean ± standard deviation). The proposed method (BDFS) attains the highest accuracy on all five data sets. The margin over NB exceeds 15 points on data2 and data5; on data5, which has 30 classes, the accuracy of both NB and BS collapses while BDFS stays above 87%.

Table 2: Accuracy of each feature selection method (synthetic data) (%)

  data    BDFS         NB           BS
  data1   96.6 ± 1.8   88.2 ± 4.1   92.1 ± 1.8
  data2   96.0 ± 1.1   80.8 ± 4.1   80.7 ± 3.0
  data3   92.3 ± 0.3   90.8 ± 1.0   91.0 ± 1.0
  data4   93.6 ± 0.6   86.0 ± 1.2   85.2 ± 0.9
  data5   87.7 ± 0.1   61.6 ± 1.6   62.2 ± 1.6

4.1.2 Comparison of classifiers

Table 3 compares BDFS with classifiers that realize feature selection only indirectly: L1-regularized logistic regression (LR+Lasso), a linear-kernel SVM (SVM(L)), and an RBF-kernel SVM (SVM(R)).

Table 3: Accuracy of each classifier (synthetic data) (%)

  data    BDFS         LR+Lasso     SVM(L)       SVM(R)
  data1   96.6 ± 1.8   90.5 ± 2.8   79.4 ± 1.9   87.4 ± 2.3
  data2   96.0 ± 1.1   78.6 ± 4.6   71.0 ± 3.0   79.9 ± 3.6
  data3   92.3 ± 0.3   93.1 ± 3.4   82.3 ± 2.2   87.0 ± 1.9
  data4   93.6 ± 0.6   91.0 ± 2.9   84.5 ± 3.1   90.2 ± 2.3
  data5   87.7 ± 0.1   74.4 ± 2.5   65.1 ± 1.9   69.1 ± 0.9

BDFS gives the highest accuracy on four of the five data sets; only on data3 is LR+Lasso slightly better. The advantage of BDFS is again largest on data5, the setting with the most classes.

4.2 DNA data

We next evaluate the methods on the DNA data sets of Section 3.2.2.

4.2.1 Comparison of feature selection methods (DNA data)

Table 4 shows the test accuracy on the DNA data. On both Promoter and Splice, BDFS attains higher accuracy than NB and BS.

Table 4: Test accuracy of each feature selection method (DNA data) (%)

  data       BDFS         NB           BS
  Promoter   91.6 ± 5.2   88.8 ± 3.5   89.4 ± 4.5
  Splice     95.7 ± 1.4   94.7 ± 0.7   95.1 ± 0.7

4.2.2 Comparison of classifiers (DNA data)

Table 5 compares BDFS with LR+Lasso and the SVMs on the DNA data. BDFS gives the best accuracy on Promoter and is nearly tied with LR+Lasso on Splice; both clearly outperform the SVMs.

Table 5: Accuracy of each classifier (DNA data) (%)

  data       BDFS         LR+Lasso     SVM(L)       SVM(R)
  Promoter   91.1 ± 5.2   90.8 ± 5.1   79.6 ± 4.3   86.4 ± 5.2
  Splice     95.6 ± 1.4   95.8 ± 0.6   88.3 ± 3.5   94.0 ± 5.2

5. Conclusion

Focusing on categorical data, we proposed a Bayesian feature selection method that selects a class-specific subset of features for each class. Latent indicator variables decide, for every pair of class and feature, whether the feature follows a class-specific distribution or a distribution common to all classes, and they are inferred from the training data by Gibbs sampling. Experiments on synthetic and real DNA data showed that the proposed method generalizes better than conventional feature selection methods, and better than L1-regularized logistic regression and SVMs, which realize feature selection only indirectly.

References

[1] I. Guyon and A. Elisseeff, "An Introduction to Variable and Feature Selection," Journal of Machine Learning Research, vol. 3, pp. 1157-1182, 2003.
[2] M. A. Hall, "Correlation-based Feature Selection for Machine Learning," PhD Thesis, Department of Computer Science, Waikato University, New Zealand, 1999.
[3] H. Peng, F. Long and C. Ding, "Feature Selection Based on Mutual Information: Criteria of Max-Dependency, Max-Relevance, and Min-Redundancy," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 27, no. 8, pp. 1226-1238, 2005.
[4] I. Guyon, J. Weston, S. Barnhill and V. Vapnik, "Gene Selection for Cancer Classification using Support Vector Machines," Machine Learning, vol. 46, no. 1-3, pp. 389-422, 2002.
[5] H. Shan and A. Banerjee, "Bayesian Co-clustering," Proc. IEEE International Conference on Data Mining (ICDM), pp. 530-539, 2008.
[6] P. D. Hoff, "Subset Clustering of Binary Sequences, with an Application to Genomic Abnormality Data," Biometrics, vol. 61, no. 4, pp. 1027-1036, 2005.
[7] K. Ishiguro, N. Ueda and H. Sawada, "Subset Infinite Relational Models," Proc. 15th International Conference on Artificial Intelligence and Statistics (AISTATS), 2012, to appear.
[8] R. Tibshirani, "Regression Shrinkage and Selection via the Lasso," Journal of the Royal Statistical Society, Series B, vol. 58, no. 1, pp. 267-288, 1996.
[9] C. W. Hsu, C. C. Chang and C. J. Lin, "A Practical Guide to Support Vector Classification," Technical Report, Department of Computer Science, National Taiwan University, Taipei, 2003.
[10] V. N. Vapnik, Statistical Learning Theory, Wiley, New York, 1998.
[11] G. G. Towell and J. W. Shavlik, "Knowledge-Based Artificial Neural Networks," Artificial Intelligence, vol. 70, no. 1-2, pp. 119-165, 1994.