Gaussian Processes Classification Combined with Semi-supervised Kernels


ACTA AUTOMATICA SINICA, Vol. 35, No. 7, July 2009

Gaussian Processes Classification Combined with Semi-supervised Kernels

LI Hong-Wei 1, LIU Yang 1, LU Han-Qing 1, FANG Yi-Kai 2

Abstract: In this paper, we present a semi-supervised algorithm to learn Gaussian process classifiers, which is combined with nonparametric semi-supervised kernels in the presence of unlabeled data. The algorithm mainly includes the following aspects: 1) the spectral decomposition of graph Laplacians is used to obtain kernel matrices incorporating labeled and unlabeled data; 2) a convex optimization method is employed to learn the optimal weights of the kernel matrix eigenvectors, which construct the nonparametric semi-supervised kernels; 3) the proposed semi-supervised learning algorithm is obtained by incorporating the semi-supervised kernels into the Gaussian process model. The main characteristic of the proposed algorithm is that the nonparametric semi-supervised kernels, built from the entire dataset, are embedded in a Gaussian process model, which has an explicit probabilistic interpretation, can model the uncertainty among the data, and can solve complex non-linear inference problems. The effectiveness of the proposed algorithm is demonstrated by experimental results in comparison with other related works in the literature.

Key words: Gaussian processes, semi-supervised learning, kernel method, convex optimization

Received June 16, 2008; in revised form November 16, 2008. Supported by the National High Technology Research and Development Program of China (863 Program) (2006AA01Z315), the Key Project of the National Natural Science Foundation of China (60833006), and the National Natural Science Foundation of China (60605004).

1. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China. 2. Nokia Research Center, Beijing 100176, China.

DOI: 10.3724/SP.J.1004.2009.00888

Gaussian processes have been widely studied and applied in machine learning [2-9]; however, the standard Gaussian process classifier learns from labeled data alone, which motivates semi-supervised extensions that also exploit unlabeled data.

Existing semi-supervised classification methods related to this work fall roughly into two groups. 1) Margin-based methods in the spirit of the support vector machine (SVM). The null-category noise model (NCNM) [1] introduces a margin-like effect into Gaussian process classification, so that unlabeled data push the decision boundary away from dense regions, playing a role similar to the margin of the transductive support vector machine (TSVM) [1, 10]. Rogers and Girolami [4] extended Lawrence's NCNM to the multi-class case with a multinomial probit likelihood. 2) Graph-based kernel methods [11-14], which encode the geometry of labeled and unlabeled data in the kernel itself, e.g., cluster kernels based on spectral clustering [12], diffusion kernels [13], and Gaussian random fields [14]. Most of these graph kernels are transductive: they are defined only on the observed points and do not extend directly to unseen test points. Sindhwani et al. [15] addressed this by deforming a reproducing kernel Hilbert space (RKHS) with a graph regularizer, yielding semi-supervised kernels with an out-of-sample extension; semi-supervised Gaussian process classifiers built on this idea appeared in [16].

In this paper, the two lines are combined: a nonparametric semi-supervised kernel is learned from the graph spectrum by maximizing the kernel alignment, warped into an RKHS kernel, and then embedded in a Gaussian process classifier.

1 Gaussian process model

1.1 Gaussian processes

A Gaussian process [2] is a collection of random variables, any finite number of which have a joint Gaussian distribution. For a mean vector µ and covariance matrix Σ,

f = (f_1, …, f_n)ᵀ ∼ N(µ, Σ)    (1)

where f_i is the latent function value at input x_i. A Gaussian process over a function f(x) is written

f(x) ∼ GP(m(x), K(x, x'))    (2)

with mean function m(x) = E[f(x)] and covariance function K(x, x') = E[(f(x) − m(x))(f(x') − m(x'))]. A Gaussian process is thus fully specified by its mean and covariance functions and defines a prior distribution over functions f(x).

1.2 Gaussian process classification

Let X = {X_l, X_u} denote the data, with labeled points X_l = {x_1, …, x_l}, labels Y_l = {y_1, …, y_l}, and unlabeled points X_u = {x_(l+1), …, x_(l+u)}. A zero-mean latent process

f(x; θ) ∼ GP(0, K)

is placed over the latent function, where K is the covariance (kernel) matrix and θ denotes its hyperparameters. Labels are linked to latent values through

p(y = +1 | f(x)) = Φ(f(x))    (3)

where Φ is a sigmoid-type squashing function, e.g., the logistic function or the cumulative Gaussian. Assuming conditional independence of the labels given the latent values,

p(y | f) = ∏_(i=1)^(l+u) p(y_i | f_i) = ∏_(i=1)^(l+u) Φ(y_i f_i)    (4)

Given the prior p(f) and the likelihood p(y | f), the posterior over the latent values follows from Bayes' rule:

p(f | y, X, θ) = p(y | f) p(f | X, θ) / p(y | X, θ)    (5)

For a test point x_t, the latent value f(x_t) is obtained by marginalizing over f:

p(f_t | y, X, θ, x_t) = ∫ p(f_t | f, X, θ, x_t) p(f | y, X, θ) df    (6)

and the predictive distribution of the test label y_t is

p(y_t | y, X, θ, x_t) = ∫ p(y_t | f_t) p(f_t | y, X, θ, x_t) df_t    (7)

Because the likelihood is a sigmoid rather than a Gaussian, the posterior over f(x) is analytically intractable and approximate inference is required.

1.3 Graphical representation

Fig. 1 shows the Gaussian process model in the discriminative framework.

Fig. 1 The graphical representation of Gaussian processes in the discriminative framework

In Fig. 1, a labeled point x_i ∈ X_l influences the latent value f_i through the kernel K and is tied to its observed label y_i. An unlabeled point x_j ∈ X_u has no observed label and enters the model only through the kernel K, so in the standard model it contributes nothing to the posterior over f. For unlabeled data to help, information about them must be injected into the kernel itself, which motivates the semi-supervised kernels described next.

2 Semi-supervised kernels

2.1 RKHS regularization

Given X = {X_l, X_u}, regularized learning in an RKHS H solves

f* = arg min_(f∈H) { (1/l) Σ_(i=1)^l C(f, x_i, y_i) + Ω(‖f‖_H) }    (8)

where C is a loss function measuring the fit of f to the labeled data and Ω(‖f‖_H) penalizes the RKHS norm of f. By the representer theorem [17], the minimizer has the form

f*(x) = Σ_(i=1)^l α_i K(x, x_i)    (9)

i.e., f* is a kernel expansion over the training points {x_i}. In the semi-supervised setting, with labeled data {x_i, y_i}_(i=1)^l and unlabeled data {x_i}_(i=l+1)^(l+u), the problem becomes

f* = arg min_(f∈H̃) { (1/(l+u)) Σ_i C(f, x_i, y_i) + Ω(‖f‖_H̃) }    (10)

where H̃ is a data-dependent deformation of H. Sindhwani et al. [15] define H̃ to contain the same functions as H but with a deformed inner product

⟨f, g⟩_H̃ = ⟨f, g⟩_H + ⟨Sf, Sg⟩_V

where V is a linear space with inner product ⟨·, ·⟩_V and S is a bounded linear operator. H̃ is again an RKHS, and its reproducing kernel has the closed form [15]

K̃(x, z) = K(x, z) − k_xᵀ (I + MK)⁻¹ M k_z    (11)

where k_x = [K(x_1, x), …, K(x_(l+u), x)]ᵀ, M is a positive semidefinite matrix encoding the deformation, and K is the Gram matrix [K(x_i, x_j)]. In this way a supervised kernel is turned into a semi-supervised RKHS kernel.
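Restricted to the l+u training points, the deformation of eq. (11) is a plain matrix identity, since the vectors k_x then span the columns of the Gram matrix K. The following NumPy sketch illustrates it; the function and variable names are illustrative, not from the paper:

```python
import numpy as np

def warped_gram(K, M):
    """Deform a base Gram matrix K with a positive semidefinite
    regularizer M, i.e. the training-set restriction of eq. (11):
    K~ = K - K (I + M K)^{-1} M K."""
    n = K.shape[0]
    # (I + M K)^{-1} M K, computed via a linear solve for stability
    correction = K @ np.linalg.solve(np.eye(n) + M @ K, M @ K)
    return K - correction

# sanity check: with M = 0 the deformation vanishes
K = np.array([[1.0, 0.5], [0.5, 1.0]])
assert np.allclose(warped_gram(K, np.zeros((2, 2))), K)
```

In Section 2.2 the role of M is played by the graph Laplacian L, so the warp shrinks the kernel in directions that vary strongly across graph edges.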

The choice of M determines how unlabeled data reshape the kernel; here it is derived from a graph over the data.

2.2 Graph kernels

Build a graph G = {V, E} over X = {X_l, X_u}: V is the set of data points and E the set of edges, with weight matrix W = {w_ij} measuring the similarity between x_i and x_j (w_ij = 0 if the two points are not connected). Let D be the diagonal degree matrix, D_ii = Σ_j W_ij. The graph Laplacian is L = D − W, with spectral decomposition

L = Σ_i λ_i φ_i φ_iᵀ    (12)

where λ_i are the eigenvalues of L, ordered λ_1 ≤ … ≤ λ_(l+u), and φ_i the corresponding eigenvectors. A spectral transform r turns the Laplacian into a kernel

K = Σ_i r(λ_i) φ_i φ_iᵀ,  r(λ_i) ≥ 0    (13)

whose associated RKHS norm is ‖f‖²_K = fᵀ K⁻¹ f. Eigenvectors with small eigenvalues vary little across strongly connected points, so r should be a decreasing function of λ, giving the smooth eigenvectors φ_i (small λ_i) large weights in K. Choosing a good parametric form for r is difficult; instead, a nonparametric kernel is used:

K* = Σ_i α_i φ_i φ_iᵀ,  α_i ≥ 0    (14)

with the order constraints α_i ≥ α_(i+1), so that smoother eigenvectors automatically receive larger weights. Given K*, the corresponding semi-supervised RKHS kernel follows from (11) with M = L:

K̃(x, z) = K*(x, z) − k_xᵀ (I + LK*)⁻¹ L k_z    (15)

2.3 Kernel alignment

To learn the weights α_i, the kernel is matched to the labels using the empirical kernel alignment [18-19]:

A(K, T) = ⟨K, T⟩_F / √(⟨K, K⟩_F ⟨T, T⟩_F)    (16)

where ⟨·, ·⟩_F is the Frobenius inner product, K is the kernel matrix restricted to the labeled data, and T is the target matrix with T_ij = y_i y_j, i.e., T_ij = 1 if y_i = y_j and T_ij = −1 otherwise. The alignment has several attractive properties: 1) it can be computed from the labeled data alone, before any training; 2) it is sharply concentrated around its expectation; 3) a high alignment implies the existence of a classifier with good generalization.

2.4 Convex optimization

Combining (14) and (16), the weights are found by solving

max_K A(K, T)    (17)

s.t. K = Σ_i α_i K_i    (18)

tr(K) = 1    (19)

α_i ≥ 0    (20)

α_i ≥ α_(i+1),  i = 1, …, l + u    (21)

where K_i = φ_i φ_iᵀ is the outer product of the i-th eigenvector. Constraints (19) and (20) normalize K and keep it positive semidefinite. Since the target T is fixed (T_ij ∈ {+1, −1}) and K depends linearly on the weights α_i through (18), the problem has a convex feasible set and can be solved with standard convex optimization. The optimal K* is then transformed through (15) into the semi-supervised RKHS kernel K̃.

3 Gaussian processes with semi-supervised kernels

3.1 Model and inference

The graphical structure of the combined model is shown in Fig. 2: the kernel K of Fig. 1 is replaced by the semi-supervised kernel K̃ learned from both labeled and unlabeled data.

Fig. 2 The graphical structure of Gaussian processes combined with semi-supervised kernels

Given X, Y, the latent values f, and the RKHS kernel K̃, the posterior over f is

p(f | y, X, K̃) = N(f | 0, K̃) ∏_(i=1)^(l+u) Φ(y_i f_i) / p(y | X, K̃)    (22)

and the predictive distribution of a test label y_t is

p(y_t | y, X, K̃, x_t) = ∫ p(y_t | f_t) p(f_t | y, X, K̃, x_t) df_t    (23)

Because the likelihood (4) is a sigmoid, the posterior (22) is non-Gaussian and is approximated by a Gaussian, q(f | X, K̃) = N(f | m, Σ), whose mean m and covariance Σ are computed with expectation propagation (EP) [20]; EP iteratively refines each site approximation by minimizing a local Kullback-Leibler divergence [21]. By (6), the approximate latent distribution at a test point x_t is q(f_t | y, X, K̃, x_t) = N(f_t | µ_t, σ_t²), with

µ_t = k_tᵀ K̃⁻¹ m    (24)

σ_t² = k(x_t, x_t) − k_tᵀ (K̃⁻¹ − K̃⁻¹ Σ K̃⁻¹) k_t    (25)

where k_t is the vector of covariances between x_t and the training points X. With the probit link, the predictive probability then has a closed form:

q(y_t = +1 | y, X, K̃, x_t) = ∫ Φ(f_t) N(f_t | µ_t, σ_t²) df_t = Φ( µ_t / √(1 + σ_t²) )    (26)

The overall algorithm is:
1) construct the graph over X = {X_l, X_u} and compute the Laplacian L;
2) decompose L to obtain the eigenvectors φ_i, and learn the weights α_i by the convex program (17)-(21) to obtain K*;
3) transform K* through (15) into the semi-supervised RKHS kernel K̃;
4) run Gaussian process classification with f ∼ GP(0, K̃) and predict with (7), i.e., (26).

3.2 Computational complexity

The cost of the algorithm has two parts. 1) Kernel construction: evaluating K̃ at a test point costs O(N²), where N = l + u is the total number of data points. 2) Training: the N × N kernel matrix leads to O(N³) operations; within EP, the Cholesky factorization of K̃ takes about N³/6 operations.
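The kernel-construction stage described above (graph, Laplacian spectrum, weighted spectral kernel, alignment score) can be sketched in NumPy. As a simplifying assumption, the convex program (17)-(21) is replaced here by a fixed nonincreasing weight vector α, so this is an illustrative sketch rather than the paper's optimizer; all function names are invented for the example:

```python
import numpy as np

def knn_graph(X, k=3):
    """Symmetric k-nearest-neighbor graph with unit edge weights."""
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(D2, np.inf)            # no self-edges
    nbr = np.argsort(D2, axis=1)[:, :k]     # k nearest neighbors per point
    W = np.zeros_like(D2)
    W[np.repeat(np.arange(len(X)), k), nbr.ravel()] = 1.0
    return np.maximum(W, W.T)               # symmetrize

def spectral_kernel(W, alpha):
    """K* = sum_i alpha_i phi_i phi_i^T from the eigenvectors of the
    graph Laplacian L = D - W (eqs. (12)-(14)); alpha is assumed
    nonnegative and nonincreasing over the ascending eigenvalues."""
    L = np.diag(W.sum(axis=1)) - W
    _, Phi = np.linalg.eigh(L)              # eigenvalues in ascending order
    return (Phi * alpha) @ Phi.T

def alignment(K, y):
    """Kernel-target alignment A(K, T) with T = y y^T (eq. (16))."""
    T = np.outer(y, y)
    return (K * T).sum() / (np.linalg.norm(K) * np.linalg.norm(T))

# two well-separated clusters: the smooth eigenvectors carry the labels
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (5, 2)), rng.normal(3, 0.1, (5, 2))])
y = np.array([-1] * 5 + [1] * 5)
K = spectral_kernel(knn_graph(X), 1.0 / (1.0 + np.arange(len(X))))
assert 0.0 < alignment(K, y) <= 1.0
```

With the optimized weights in place of the fixed α, K* would then be warped through eq. (15) and handed to the Gaussian process classifier.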

In addition, learning the weights α_i under the constraints α_i ≥ 0 costs between O(N²) and O(N³) in practice. The dominant O(N³) terms can be reduced with approximation techniques such as domain decomposition [22], bringing the cost down to O(N log N), so the method remains practical for moderately large datasets.

4 Experiments

4.1 Datasets

The first group of experiments follows the setup of [11]; the datasets are summarized in Table 1. One vs Two and Odd vs Even are handwritten-digit tasks from the Cedar Buffalo benchmark: One vs Two discriminates digit 1 from digit 2 (2200 samples), while Odd vs Even discriminates the even digits 0, 2, 4, 6, 8 from the odd digits 1, 3, 5, 7, 9 (4000 samples). Pc vs Mac and Baseball vs Hockey are text tasks from the 20-newsgroups corpus, with 1943 and 1993 documents, respectively. For the digit tasks the graph is a Euclidean 10-nearest-neighbor (10-NN) graph; for the text tasks it is a cosine-similarity 10-NN graph.

Table 1 The information of the datasets in Experiment 1

Dataset            | Size | Classes | Graph
One vs Two         | 2200 | 2       | Euclidean 10-NN
Odd vs Even        | 4000 | 2       | Euclidean 10-NN
Pc vs Mac          | 1943 | 2       | Cosine similarity 10-NN
Baseball vs Hockey | 1993 | 2       | Cosine similarity 10-NN

The second group follows the setup of [15]; the datasets are summarized in Table 2. G50c is an artificial two-class dataset; Coil20 contains 32 × 32 images of 20 objects; Uspst is the test split of the USPS handwritten-digit dataset.

Table 2 The information of the datasets in Experiment 2

Dataset | Classes | Size | Labeled | Dimension
G50c    | 2       | 550  | 50      | 50
Coil20  | 20      | 1140 | 40      | 1024
Uspst   | 10      | 2007 | 50      | 256

4.2 Setup and results

Each reported accuracy is an average over repeated random splits of the data into labeled and unlabeled subsets. The supervised baseline is a Gaussian process classifier with the RBF (radial basis function) kernel

K(x, x') = θ_1 exp( −(x − x')ᵀ(x − x') / (2 θ_2²) )    (27)

where θ_1 and θ_2 are the hyperparameters of the RBF kernel. They are selected by maximizing the approximate evidence

q(y | X, θ) = ∫ p(y | f) q(f | X, θ) df    (28)

using the gradients of q(y | X, θ) with respect to the θ_i; predictions for the RBF baseline then follow from (6) and (7).
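For reference, the RBF baseline kernel of eq. (27) is a one-liner in NumPy (the function name is illustrative; hyperparameter selection via eq. (28) is not sketched here):

```python
import numpy as np

def rbf_gram(X, Z, theta1=1.0, theta2=1.0):
    """Supervised baseline kernel of eq. (27):
    K(x, x') = theta1 * exp(-(x - x')^T (x - x') / (2 * theta2^2))."""
    D2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return theta1 * np.exp(-D2 / (2.0 * theta2 ** 2))

# the diagonal of a training Gram matrix equals the signal variance theta1
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K = rbf_gram(X, X, theta1=2.0, theta2=1.0)
assert np.allclose(np.diag(K), 2.0)
```

Unlike the semi-supervised kernel, this kernel depends only on pairwise distances, which is why its accuracy degrades sharply when few labels are available.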

Tables 3-6 report the average accuracies on the four datasets of Experiment 1 as the number of labeled points varies, comparing the proposed semi-supervised kernel, the supervised RBF kernel, and the graph kernel of Zhu et al. [11], each used both in a Gaussian process classifier and in an SVM.

Table 3 The average accuracies on the One vs Two dataset (%)

Labeled | GP (semi-sup.) | SVM (semi-sup.) | GP (RBF) | SVM (RBF) | GP [11] | SVM [11]
10      | 96.1           | 96.2            | 67.2     | 65.3      | 85.1    | 78.7
20      | 97.3           | 96.4            | 81.4     | 85.6      | 91.6    | 90.4
30      | 98.3           | 98.2            | 92.5     | 93.4      | 94.9    | 93.6
40      | 98.7           | 98.3            | 95.6     | 94.7      | 95.6    | 94.0
50      | 98.9           | 98.4            | 96.9     | 95.9      | 97.1    | 96.1

Table 4 The average accuracies on the Odd vs Even dataset (%)

Labeled | GP (semi-sup.) | SVM (semi-sup.) | GP (RBF) | SVM (RBF) | GP [11] | SVM [11]
10      | 69.7           | 69.6            | 62.7     | 60.1      | 69.3    | 65.0
30      | 83.7           | 82.4            | 80.4     | 76.3      | 79.6    | 77.7
50      | 88.2           | 87.6            | 85.2     | 82.6      | 83.6    | 81.8
70      | 90.1           | 89.2            | 87.3     | 85.3      | 86.4    | 84.4
90      | 92.6           | 91.5            | 90.1     | 89.5      | 87.2    | 86.1

Table 5 The average accuracies on the Pc vs Mac dataset (%)

Labeled | GP (semi-sup.) | SVM (semi-sup.) | GP (RBF) | SVM (RBF) | GP [11] | SVM [11]
10      | 87.8           | 87.0            | 55.2     | 53.4      | 54.9    | 51.6
30      | 90.9           | 90.3            | 75.7     | 72.6      | 65.4    | 62.6
50      | 92.2           | 91.3            | 79.3     | 77.2      | 70.2    | 67.8
70      | 93.4           | 91.5            | 83.5     | 80.7      | 76.8    | 74.7
90      | 93.6           | 91.5            | 85.8     | 83.2      | 81.2    | 79.0

Table 6 The average accuracies on the Baseball vs Hockey dataset (%)

Labeled | GP (semi-sup.) | SVM (semi-sup.) | GP (RBF) | SVM (RBF) | GP [11] | SVM [11]
10      | 96.2           | 95.7            | 61.2     | 59.4      | 60.2    | 53.6
30      | 98.2           | 98.0            | 69.9     | 68.7      | 72.6    | 69.3
50      | 98.6           | 97.9            | 77.5     | 79.6      | 80.4    | 77.7
70      | 98.8           | 97.9            | 84.1     | 85.3      | 85.5    | 83.9
90      | 99.1           | 98.0            | 90.4     | 92.6      | 89.7    | 88.5

On all four datasets the classifiers built on the semi-supervised kernels clearly outperform both the supervised RBF baselines and the graph kernels of [11], especially when only a few labeled points are available; the gap narrows as the number of labels grows.

Table 7 The average accuracies of the results in Experiment 2 (%)

Dataset | Proposed | ?    | SVM [15] | ?    | LapSVM [15]
G50c    | 94.6     | 88.9 | 85.2     | 90.3 | 95.0
Coil20  | 87.8     | 72.1 | 73.5     | 77.4 | 85.4
Uspst   | 84.2     | 74.2 | 71.6     | 77.9 | 82.3

Table 7 compares the proposed method, which combines the RKHS semi-supervised kernel with the Gaussian process classifier, against the SVM and the Laplacian SVM (LapSVM) reported in [15] on the Experiment-2 datasets. The proposed method obtains the best accuracy on Coil20 and Uspst; on G50c, LapSVM is slightly better, consistent with the results reported in [15]. These results indicate that embedding the RKHS semi-supervised kernels in a Gaussian process model is competitive with state-of-the-art semi-supervised SVMs.

5 Conclusion

This paper presented a semi-supervised Gaussian process classification algorithm. A nonparametric semi-supervised kernel is learned from the spectral decomposition of the graph Laplacian by convex optimization of the kernel alignment, transformed into an RKHS kernel, and embedded in a Gaussian process classifier, which gives the method an explicit probabilistic interpretation and the ability to model uncertainty in the data. Experiments on handwritten-digit and text classification tasks show that the proposed algorithm compares favorably with related semi-supervised methods.

References

1 Chapelle O, Schölkopf B, Zien A. Semi-Supervised Learning. Cambridge: MIT Press, 2006
2 Rasmussen C E, Williams C K I. Gaussian Processes for Machine Learning. Cambridge: MIT Press, 2006
3 Rasmussen C E. Advances in Gaussian processes. In: Proceedings of the Neural Information Processing Systems Conference. Cambridge, USA: MIT Press, 2006. 1-55
4 Rogers S, Girolami M. Multi-class semi-supervised learning with the E-truncated multinomial probit Gaussian process. In: Proceedings of the Gaussian Processes in Practice Workshop. Berlin, Germany: Springer, 2007. 17-32
5 Lawrence N D. Probabilistic non-linear principal component analysis with Gaussian process latent variable models. The Journal of Machine Learning Research, 2005, 6: 1783-1816
6 Lawrence N D, Candela J Q. Local distance preservation in the GP-LVM through back constraints. In: Proceedings of the 23rd International Conference on Machine Learning. Pittsburgh, Pennsylvania: ACM, 2006. 513-520
7 Lawrence N D. Learning for larger datasets with the Gaussian process latent variable model. In: Proceedings of the 11th International Workshop on Artificial Intelligence and Statistics. San Juan, Puerto Rico: Morgan Kaufmann Publishers, 2007. 1-8
8 Lawrence N D, Moore A J. Hierarchical Gaussian process latent variable models. In: Proceedings of the 24th International Conference on Machine Learning. Corvallis, Oregon: ACM, 2007. 481-488
9 Urtasun R, Darrell T. Discriminative Gaussian process latent variable model for classification. In: Proceedings of the 24th International Conference on Machine Learning. Corvallis, Oregon: ACM, 2007. 927-934
10 Joachims T. Transductive inference for text classification using support vector machines. In: Proceedings of the 16th International Conference on Machine Learning. San Francisco, USA: Morgan Kaufmann Publishers, 1999. 200-209
11 Zhu X J, Kandola J, Ghahramani Z, Lafferty J. Nonparametric transforms of graph kernels for semi-supervised learning. In: Proceedings of the Conference on Advances in Neural Information Processing Systems. Cambridge, USA: MIT Press, 2005. 1641-1648
12 Chapelle O, Weston J, Schölkopf B. Cluster kernels for semi-supervised learning. In: Proceedings of the Conference on Advances in Neural Information Processing Systems. Cambridge, USA: MIT Press, 2002. 585-592
13 Kondor R I, Lafferty J D. Diffusion kernels on graphs and other discrete input spaces. In: Proceedings of the 19th International Conference on Machine Learning. San Francisco, USA: Morgan Kaufmann Publishers, 2002. 315-322
14 Zhu X J, Ghahramani Z, Lafferty J D. Semi-supervised learning using Gaussian fields and harmonic functions. In: Proceedings of the 20th International Conference on Machine Learning. Washington D. C., USA: MIT Press, 2003. 912-919
15 Sindhwani V, Niyogi P, Belkin M. Beyond the point cloud: from transductive to semi-supervised learning. In: Proceedings of the 22nd International Conference on Machine Learning. Bonn, Germany: MIT Press, 2005. 824-831
16 Sindhwani V, Chu W, Keerthi S S. Semi-supervised Gaussian process classifiers. In: Proceedings of the International Joint Conference on Artificial Intelligence. San Francisco, USA: Morgan Kaufmann Publishers, 2007. 1059-1064
17 Kimeldorf G S, Wahba G. Some results on Tchebycheffian spline functions. Journal of Mathematical Analysis and Applications, 1971, 33(1): 82-95
18 Cristianini N, Shawe-Taylor J, Elisseeff A, Kandola J. On kernel-target alignment. In: Proceedings of the Conference on Advances in Neural Information Processing Systems. Cambridge, USA: MIT Press, 2002. 367-373
19 Lanckriet G R G, Cristianini N, Bartlett P, El Ghaoui L, Jordan M I. Learning the kernel matrix with semidefinite programming. The Journal of Machine Learning Research, 2004, 5: 27-72
20 Minka T P. A Family of Algorithms for Approximate Bayesian Inference [Ph. D. dissertation], Massachusetts Institute of Technology, USA, 2001
21 Kullback S. Information Theory and Statistics. New York: Dover, 1959
22 Smith B, Bjørstad P, Gropp W. Domain Decomposition: Parallel Multilevel Methods for Elliptic Partial Differential Equations. Cambridge: Cambridge University Press, 1996

LI Hong-Wei Ph. D. candidate at the Institute of Automation, Chinese Academy of Sciences. His research interest covers image and video processing, and machine learning. Corresponding author of this paper. E-mail: hwli@nlpr.ia.ac.cn

LIU Yang Ph. D. candidate at the Institute of Automation, Chinese Academy of Sciences. His research interest covers image and video processing, and machine learning. E-mail: liuyang@nlpr.ia.ac.cn

LU Han-Qing Professor at the Institute of Automation, Chinese Academy of Sciences. His research interest covers image and video processing, and multimedia techniques. E-mail: luhq@nlpr.ia.ac.cn

FANG Yi-Kai Researcher at Nokia Research Center, China. He received his Ph. D. degree from the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences in 2008. His research interests cover image processing and pattern recognition. E-mail: ykfang@gmail.com