Supplemental Material: Scaling Up Sparse Support Vector Machines by Simultaneous Feature and Sample Reduction


Weizhong Zhang *1 2, Bin Hong *1 3, Wei Liu 2, Jieping Ye 3, Deng Cai 1, Xiaofei He 1, Jie Wang 3
1 State Key Lab of CAD&CG, Zhejiang University, China; 2 Tencent AI Lab, Shenzhen, China; 3 University of Michigan, USA

In this supplement, we first present the detailed proofs of all the theorems in the main text and then report the remaining experimental results, which are omitted from the experiment section due to the space limitation.

A. Proof for Theorem 1

Proof of Theorem 1: (i) Let X̄ = (x̄_1, x̄_2, ..., x̄_n) and z = 1 − X̄ᵀw. The primal problem (P*) is then equivalent to

    min_{w ∈ R^p, z ∈ R^n}  (α/2)||w||² + β||w||_1 + (1/n) Σ_{i=1}^n ℓ([z]_i),   s.t.  z = 1 − X̄ᵀw.

The Lagrangian then becomes

    L(w, z, θ) = (α/2)||w||² + β||w||_1 + (1/n) Σ_{i=1}^n ℓ([z]_i) + (1/n)⟨1 − X̄ᵀw − z, θ⟩                    (17)
               = f_1(w) + f_2(z) + (1/n)⟨1, θ⟩,

where f_1(w) := (α/2)||w||² + β||w||_1 − (1/n)⟨X̄θ, w⟩ and f_2(z) := (1/n) Σ_{i=1}^n ℓ([z]_i) − (1/n)⟨z, θ⟩.

We first consider the subproblem min_w L(w, z, θ):

    0 ∈ ∂_w L(w, z, θ) = ∂f_1(w) = αw − (1/n)X̄θ + β ∂||w||_1                    (18)
    ⟺  (1/n)X̄θ ∈ αw + β ∂||w||_1  ⟺  w = (1/α) S_β((1/n)X̄θ).                 (19)

By substituting (19) into f_1(w), we get

    min_w f_1(w) = (α/2)||w||² + β||w||_1 − ⟨αw + β ∂||w||_1, w⟩ = −(α/2)||w||² = −(1/(2α)) ||S_β((1/n)X̄θ)||².        (20)

Then, we consider the problem min_z L(w, z, θ):

    0 = ∇_{[z]_i} L(w, z, θ) = ∇_{[z]_i} f_2(z) = (1/n)(ℓ′([z]_i) − [θ]_i)
    ⟹  [θ]_i = ℓ′([z]_i) = 0, if [z]_i < 0;  [z]_i / γ, if 0 ≤ [z]_i ≤ γ;  1, if [z]_i > γ.                    (21)

Thus, we have

    min_z f_2(z) = −(γ/(2n))||θ||², if [θ]_i ∈ [0, 1] for all i ∈ [n];  −∞, otherwise.                    (22)

Combining Eq. (17), Eq. (20) and Eq. (22), we obtain the dual problem:

    min_{θ ∈ [0,1]^n}  D(θ; α, β) = (1/(2α)) ||S_β((1/n)X̄θ)||² + (γ/(2n))||θ||² − (1/n)⟨1, θ⟩.                    (23)

(ii) From Eq. (19) and Eq. (21), we get the KKT conditions:

    w*(α, β) = (1/α) S_β((1/n) X̄ θ*(α, β)),                    (KKT-1)

    [θ*(α, β)]_i = 0, if 1 − ⟨x̄_i, w*(α, β)⟩ < 0;
                   (1/γ)(1 − ⟨x̄_i, w*(α, β)⟩), if 0 ≤ 1 − ⟨x̄_i, w*(α, β)⟩ ≤ γ;
                   1, if 1 − ⟨x̄_i, w*(α, β)⟩ > γ,      i = 1, ..., n.                    (KKT-2)

B. Proof for Lemma 1

Proof of Lemma 1: 1) It is the conclusion of the analysis above.

2) After feature screening, the primal problem (P*) is scaled into:

    min_{w ∈ R^{|F̂^c|}}  (α/2)||w||² + β||w||_1 + (1/n) Σ_{i=1}^n ℓ(1 − ⟨[x̄_i]_{F̂^c}, w⟩).                    (scaled-P*-1)

Thus, we can easily derive the dual problem of (scaled-P*-1):

    min_{θ ∈ [0,1]^n}  D̃(θ; α, β) = (1/(2α)) ||S_β((1/n)[X̄]_{F̂^c} θ)||² + (γ/(2n))||θ||² − (1/n)⟨1, θ⟩,                    (scaled-D*-1)

and also the KKT conditions:

    w̃*(α, β) = (1/α) S_β((1/n)[X̄]_{F̂^c} θ̃*(α, β)),                    (scaled-KKT-1)

    [θ̃*(α, β)]_i = 0, if 1 − ⟨[x̄_i]_{F̂^c}, w̃*(α, β)⟩ < 0;
                    (1/γ)(1 − ⟨[x̄_i]_{F̂^c}, w̃*(α, β)⟩), if 0 ≤ 1 − ⟨[x̄_i]_{F̂^c}, w̃*(α, β)⟩ ≤ γ;
                    1, if 1 − ⟨[x̄_i]_{F̂^c}, w̃*(α, β)⟩ > γ.                    (scaled-KKT-2)

Then, it is obvious that w̃*(α, β) = [w*(α, β)]_{F̂^c}, since problem (scaled-P*-1) can essentially be derived by substituting 0 for the weights of the eliminated features in problem (P*) and optimizing over the remaining weights. Since the solutions w*(α, β) and θ*(α, β) satisfy the conditions KKT-1 and KKT-2, and ⟨[x̄_i]_{F̂^c}, [w*(α, β)]_{F̂^c}⟩ = ⟨x̄_i, w*(α, β)⟩ for all i, we know that [w*(α, β)]_{F̂^c} and θ*(α, β) satisfy the conditions scaled-KKT-1 and scaled-KKT-2. So they are the solutions of problems (scaled-P*-1) and (scaled-D*-1). Thus, due to the uniqueness of the solution of problem (scaled-D*-1), we have

    θ̃*(α, β) = θ*(α, β).                    (24)

From 1) we have [θ̃*(α, β)]_{R̂} = 0 and [θ̃*(α, β)]_{L̂} = 1. Therefore, from the dual problem (scaled-D*-1), we can see that [θ̃*(α, β)]_{D̂^c} can be recovered from the following problem:

    min_{θ̂ ∈ [0,1]^{|D̂^c|}}  (1/(2α)) ||S_β((1/n)(Ĝ_1 θ̂ + Ĝ_2 1))||² + (γ/(2n))||θ̂||² − (1/n)⟨1, θ̂⟩,

where Ĝ_1 and Ĝ_2 are as defined in the main text. Since [θ̃*(α, β)]_{D̂^c} = [θ*(α, β)]_{D̂^c}, the proof is therefore completed.
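The pieces used repeatedly in these proofs (the soft-thresholding operator S_β, the smoothed hinge loss ℓ with smoothing parameter γ, and the KKT-2 map from z = 1 − X̄ᵀw to θ) are easy to state concretely. A minimal NumPy sketch, with function names of our own choosing:

```python
import numpy as np

def soft_threshold(v, beta):
    """Elementwise soft-thresholding: S_beta(v) = sign(v) * max(|v| - beta, 0)."""
    return np.sign(v) * np.maximum(np.abs(v) - beta, 0.0)

def smoothed_hinge(t, gamma):
    """Smoothed hinge loss l(t): 0 for t < 0, t^2/(2*gamma) on [0, gamma], t - gamma/2 for t > gamma."""
    t = np.asarray(t, dtype=float)
    return np.where(t < 0, 0.0, np.where(t <= gamma, t ** 2 / (2 * gamma), t - gamma / 2))

def theta_from_z(z, gamma):
    """KKT-2 map: theta_i = l'(z_i), where z_i = 1 - <x_i, w>."""
    z = np.asarray(z, dtype=float)
    return np.where(z < 0, 0.0, np.where(z <= gamma, z / gamma, 1.0))
```

For instance, with γ = 0.5, a sample with z_i = 2 lies beyond the elbow and gets θ_i = 1, while z_i = −1 gives θ_i = 0, matching the three cases of (21).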

C. Proof for Lemma 2

Proof: Let w*(α_0, β) be the reference solution. Due to the α-strong convexity of the objective P(w; α, β), we have

    P(w*(α_0, β); α, β) ≥ P(w*(α, β); α, β) + (α/2) ||w*(α_0, β) − w*(α, β)||²,
    P(w*(α, β); α_0, β) ≥ P(w*(α_0, β); α_0, β) + (α_0/2) ||w*(α_0, β) − w*(α, β)||²,

which are equivalent to

    (α/2)||w*(α_0, β)||² + β||w*(α_0, β)||_1 + (1/n) Σ_{i=1}^n ℓ(1 − ⟨x̄_i, w*(α_0, β)⟩)
      ≥ (α/2)||w*(α, β)||² + β||w*(α, β)||_1 + (1/n) Σ_{i=1}^n ℓ(1 − ⟨x̄_i, w*(α, β)⟩) + (α/2)||w*(α_0, β) − w*(α, β)||²,

    (α_0/2)||w*(α, β)||² + β||w*(α, β)||_1 + (1/n) Σ_{i=1}^n ℓ(1 − ⟨x̄_i, w*(α, β)⟩)
      ≥ (α_0/2)||w*(α_0, β)||² + β||w*(α_0, β)||_1 + (1/n) Σ_{i=1}^n ℓ(1 − ⟨x̄_i, w*(α_0, β)⟩) + (α_0/2)||w*(α_0, β) − w*(α, β)||².

Adding the above two inequalities together, we get

    ((α − α_0)/2) ||w*(α_0, β)||² ≥ ((α − α_0)/2) ||w*(α, β)||² + ((α_0 + α)/2) ||w*(α_0, β) − w*(α, β)||²,

which is equivalent to

    ||w*(α, β) − ((α_0 + α)/(2α)) w*(α_0, β)||² ≤ ((α_0 − α)²/(4α²)) ||w*(α_0, β)||².                    (25)

Substituting the prior that [w*(α, β)]_{F̂} = 0 into (25), we get

    ||[w*(α, β)]_{F̂^c} − ((α_0 + α)/(2α)) [w*(α_0, β)]_{F̂^c}||²
      ≤ ((α_0 − α)²/(4α²)) ||w*(α_0, β)||² − ((α_0 + α)²/(4α²)) ||[w*(α_0, β)]_{F̂}||².

D. Proof for Lemma 3

Proof: Firstly, we need to extend the definition of D(θ; α, β) to R^n:

    D̄(θ; α, β) = D(θ; α, β), if θ ∈ [0, 1]^n;  +∞, otherwise.                    (26)

Due to the strong convexity of the objective D(θ; α, β), we have

    D̄(θ*(α_0, β); α, β) ≥ D̄(θ*(α, β); α, β) + (γ/(2n)) ||θ*(α_0, β) − θ*(α, β)||²,
    D̄(θ*(α, β); α_0, β) ≥ D̄(θ*(α_0, β); α_0, β) + (γ/(2n)) ||θ*(α_0, β) − θ*(α, β)||².

Since θ*(α_0, β), θ*(α, β) ∈ [0, 1]^n, the above inequalities are equivalent to

    (1/(2α)) ||S_β((1/n) X̄ θ*(α_0, β))||² + (γ/(2n)) ||θ*(α_0, β)||² − (1/n)⟨1, θ*(α_0, β)⟩
      ≥ (1/(2α)) ||S_β((1/n) X̄ θ*(α, β))||² + (γ/(2n)) ||θ*(α, β)||² − (1/n)⟨1, θ*(α, β)⟩ + (γ/(2n)) ||θ*(α_0, β) − θ*(α, β)||²,

    (1/(2α_0)) ||S_β((1/n) X̄ θ*(α, β))||² + (γ/(2n)) ||θ*(α, β)||² − (1/n)⟨1, θ*(α, β)⟩
      ≥ (1/(2α_0)) ||S_β((1/n) X̄ θ*(α_0, β))||² + (γ/(2n)) ||θ*(α_0, β)||² − (1/n)⟨1, θ*(α_0, β)⟩ + (γ/(2n)) ||θ*(α_0, β) − θ*(α, β)||².

Multiplying the two inequalities by α and α_0, respectively, adding them (the soft-thresholding terms then cancel), and multiplying through by n, we get

    (γ(α − α_0)/2) ||θ*(α_0, β)||² − (α − α_0) ⟨1, θ*(α_0, β)⟩
      ≥ (γ(α − α_0)/2) ||θ*(α, β)||² − (α − α_0) ⟨1, θ*(α, β)⟩ + (γ(α + α_0)/2) ||θ*(α, β) − θ*(α_0, β)||².

That is equivalent to

    ||θ*(α, β)||² ≤ ((α − α_0)/(γα)) ⟨1, θ*(α, β)⟩ + ((α + α_0)/α) ⟨θ*(α_0, β), θ*(α, β)⟩
                    − (α_0/α) ||θ*(α_0, β)||² − ((α − α_0)/(γα)) ⟨1, θ*(α_0, β)⟩.                    (27)

That is

    ||θ*(α, β) − (((α − α_0)/(2γα)) 1 + ((α + α_0)/(2α)) θ*(α_0, β))||² ≤ ((α − α_0)/(2α))² ||θ*(α_0, β) − (1/γ) 1||².                    (28)

Substituting the priors that [θ*(α, β)]_{R̂} = 0 and [θ*(α, β)]_{L̂} = 1 into (28), we have

    ||[θ*(α, β)]_{D̂^c} − (((α − α_0)/(2γα)) 1 + ((α + α_0)/(2α)) [θ*(α_0, β)]_{D̂^c})||²
      ≤ ((α − α_0)/(2α))² ||θ*(α_0, β) − (1/γ) 1||²
        − ||(((2γ − 1)α + α_0)/(2γα)) 1 − ((α + α_0)/(2α)) [θ*(α_0, β)]_{L̂}||²
        − ||((α − α_0)/(2γα)) 1 + ((α + α_0)/(2α)) [θ*(α_0, β)]_{R̂}||².

E. Proof for Lemma 4

Before the proof of Lemma 4, we should prove that the optimization problem defining s_i(α, β) in the main text is equivalent to

    s_i(α, β) = max_{θ ∈ Θ} { ⟨[x̄^i]_{D̂^c}, θ⟩ + ⟨[x̄^i]_{L̂}, 1⟩ },   i ∈ F̂^c.                    (29)

To avoid notational confusion, we denote the feasible region of θ in that problem as Θ_1. Then,

    max_{θ ∈ Θ_1} [X̄θ]_i = max_{θ ∈ Θ_1} ⟨x̄^i, θ⟩
                          = max_{θ ∈ Θ_1} { ⟨[x̄^i]_{D̂^c}, [θ]_{D̂^c}⟩ + ⟨[x̄^i]_{L̂}, [θ]_{L̂}⟩ + ⟨[x̄^i]_{R̂}, [θ]_{R̂}⟩ }
                          = max_{θ ∈ Θ} { ⟨[x̄^i]_{D̂^c}, θ⟩ + ⟨[x̄^i]_{L̂}, 1⟩ } = s_i(α, β).

The last equation holds since [θ]_{L̂} = 1, [θ]_{R̂} = 0 and [θ]_{D̂^c} ∈ Θ.

Proof of Lemma 4:

    s_i(α, β) = max_{θ ∈ B(c, r)} { ⟨[x̄^i]_{D̂^c}, θ⟩ + ⟨[x̄^i]_{L̂}, 1⟩ }
              = max_{η ∈ B(0, r)} { ⟨[x̄^i]_{D̂^c}, c⟩ + ⟨[x̄^i]_{L̂}, 1⟩ + ⟨[x̄^i]_{D̂^c}, η⟩ }
              = ⟨[x̄^i]_{D̂^c}, c⟩ + ⟨[x̄^i]_{L̂}, 1⟩ + ||[x̄^i]_{D̂^c}|| r.

The last equality holds since −||[x̄^i]_{D̂^c}|| r ≤ ⟨[x̄^i]_{D̂^c}, η⟩ ≤ ||[x̄^i]_{D̂^c}|| r for all η ∈ B(0, r).

F. Proof for Theorem 4

Proof: (1) It can be obtained from the rule (R1). (2) It is from the definition of F̂.

G. Proof for Lemma 5

Firstly, we need to point out that the optimization problems in (12) and (13) are equivalent to the problems:

    u_i(α, β) = max_{w ∈ W} ⟨[x̄_i]_{F̂^c}, w⟩,   i ∈ D̂^c,                    (30)
    l_i(α, β) = min_{w ∈ W} ⟨[x̄_i]_{F̂^c}, w⟩,   i ∈ D̂^c.                    (31)

They follow from the fact that [w]_{F̂^c} ∈ W and

    ⟨w, x̄_i⟩ = ⟨[w]_{F̂^c}, [x̄_i]_{F̂^c}⟩ + ⟨[w]_{F̂}, [x̄_i]_{F̂}⟩ = ⟨[w]_{F̂^c}, [x̄_i]_{F̂^c}⟩   (since [w]_{F̂} = 0).

Proof of Lemma 5:

    u_i(α, β) = max_{w ∈ B(c, r)} ⟨[x̄_i]_{F̂^c}, w⟩
              = max_{η ∈ B(0, r)} { ⟨[x̄_i]_{F̂^c}, c⟩ + ⟨[x̄_i]_{F̂^c}, η⟩ }
              = ⟨[x̄_i]_{F̂^c}, c⟩ + ||[x̄_i]_{F̂^c}|| r,

    l_i(α, β) = min_{w ∈ B(c, r)} ⟨[x̄_i]_{F̂^c}, w⟩
              = min_{η ∈ B(0, r)} { ⟨[x̄_i]_{F̂^c}, c⟩ + ⟨[x̄_i]_{F̂^c}, η⟩ }
              = ⟨[x̄_i]_{F̂^c}, c⟩ − ||[x̄_i]_{F̂^c}|| r.

H. Proof for Theorem 5

Proof: (1) It can be obtained from the rule (R2).

(2) It is from the definitions of R̂ and L̂.

I. Proof for Theorem 2

Proof of Theorem 2: We prove this theorem by verifying that the solutions w*(α, β) = 0 and θ*(α, β) = 1 satisfy the conditions KKT-1 and KKT-2. Firstly, since β ≥ β_max = ||(1/n) X̄ 1||_∞, we have S_β((1/n) X̄ 1) = 0. Thus, w*(α, β) = 0 and θ*(α, β) = 1 satisfy the condition KKT-1. Then, for all i ∈ [n], we have 1 − ⟨x̄_i, w*(α, β)⟩ = 1 > γ. Thus, w*(α, β) = 0 and θ*(α, β) = 1 satisfy the condition KKT-2. Hence, they are the solutions of the primal problem (P*) and the dual problem (D*), respectively.

J. Proof for Theorem 3

Proof of Theorem 3: Similar to the proof of Theorem 2, we prove this theorem by verifying that the solutions w*(α, β) = (1/α) S_β((1/n) X̄ θ*(α, β)) and θ*(α, β) = 1 satisfy the conditions KKT-1 and KKT-2.

1. Case 1: α_max(β) ≤ 0. Then for all α > 0, we have

    min_{i ∈ [n]} { 1 − ⟨x̄_i, w*(α, β)⟩ } = min_{i ∈ [n]} { 1 − (1/α) ⟨x̄_i, S_β((1/n) X̄ θ*(α, β))⟩ }
      = min_{i ∈ [n]} { 1 − (1/α) ⟨x̄_i, S_β((1/n) X̄ 1)⟩ }
      = 1 − (1/α) max_{i ∈ [n]} ⟨x̄_i, S_β((1/n) X̄ 1)⟩
      = 1 − ((1 − γ)/α) α_max(β) ≥ 1 > γ.

Then, L = [n], and w*(α, β) = (1/α) S_β((1/n) X̄ θ*(α, β)) and θ*(α, β) = 1 satisfy the conditions KKT-1 and KKT-2. Hence, they are the optimal solutions of the primal and dual problems (P*) and (D*).

2. Case 2: α_max(β) > 0. Then for any α ≥ α_max(β), we have

    min_{i ∈ [n]} { 1 − ⟨x̄_i, w*(α, β)⟩ } = 1 − ((1 − γ)/α) α_max(β) ≥ 1 − (1 − γ) = γ.

Thus, 1 − ⟨x̄_i, w*(α, β)⟩ ≥ γ for all i ∈ [n]; for an index i at which equality holds, KKT-2 still gives [θ*(α, β)]_i = (1/γ) · γ = 1. Hence, w*(α, β) = (1/α) S_β((1/n) X̄ θ*(α, β)) and θ*(α, β) = 1 satisfy the conditions KKT-1 and KKT-2, and they are the optimal solutions of the primal and dual problems (P*) and (D*).

K. Proof for Theorem 6

Proof of Theorem 6: (1) Suppose we are given the reference solution pair w*(α_{i−1,j}, β_j) and θ*(α_{i−1,j}, β_j), we do ISS first in SIFS, and we apply ISS and IFS alternately an infinite number of times. If after p triggerings no new inactive features or samples are identified, then we can denote the sequence of F̂, R̂ and L̂ as:

    F̂^A_0 = R̂^A_0 = L̂^A_0 = ∅ →(ISS) F̂^A_1, R̂^A_1, L̂^A_1 →(IFS) F̂^A_2, R̂^A_2, L̂^A_2 →(ISS) ... → F̂^A_p, R̂^A_p, L̂^A_p →(IFS/ISS) ...                    (32)

with

    F̂^A_p = F̂^A_{p+1} = F̂^A_{p+2} = ...,  R̂^A_p = R̂^A_{p+1} = R̂^A_{p+2} = ...,  and  L̂^A_p = L̂^A_{p+1} = L̂^A_{p+2} = ...                    (33)

In the same way, if we do IFS first in SIFS and no new inactive features or samples are identified after q triggerings of ISS and IFS, then the sequence can be denoted as:

    F̂^B_0 = R̂^B_0 = L̂^B_0 = ∅ →(IFS) F̂^B_1, R̂^B_1, L̂^B_1 →(ISS) F̂^B_2, R̂^B_2, L̂^B_2 →(IFS) ... → F̂^B_q, R̂^B_q, L̂^B_q →(IFS/ISS) ...                    (34)

with

    F̂^B_q = F̂^B_{q+1} = F̂^B_{q+2} = ...,  R̂^B_q = R̂^B_{q+1} = R̂^B_{q+2} = ...,  and  L̂^B_q = L̂^B_{q+1} = L̂^B_{q+2} = ...                    (35)

We first prove that F̂^B_k ⊆ F̂^A_{k+1}, R̂^B_k ⊆ R̂^A_{k+1} and L̂^B_k ⊆ L̂^A_{k+1} hold for all k by induction.

1) When k = 0, the inclusions F̂^B_0 ⊆ F̂^A_1, R̂^B_0 ⊆ R̂^A_1 and L̂^B_0 ⊆ L̂^A_1 hold since F̂^B_0 = R̂^B_0 = L̂^B_0 = ∅.

2) If F̂^B_k ⊆ F̂^A_{k+1}, R̂^B_k ⊆ R̂^A_{k+1} and L̂^B_k ⊆ L̂^A_{k+1} hold, then by the synergy effect of ISS and IFS, F̂^B_{k+1} ⊆ F̂^A_{k+2}, R̂^B_{k+1} ⊆ R̂^A_{k+2} and L̂^B_{k+1} ⊆ L̂^A_{k+2} hold.

Thus, F̂^B_k ⊆ F̂^A_{k+1}, R̂^B_k ⊆ R̂^A_{k+1} and L̂^B_k ⊆ L̂^A_{k+1} hold for all k. By an analysis similar to that in (1), we can also prove that F̂^A_k ⊆ F̂^B_{k+1}, R̂^A_k ⊆ R̂^B_{k+1} and L̂^A_k ⊆ L̂^B_{k+1} hold for all k. Combining (1) and (2), we can get

    F̂^B_0 ⊆ F̂^A_1 ⊆ F̂^B_2 ⊆ F̂^A_3 ⊆ ...                    (36)
    F̂^A_0 ⊆ F̂^B_1 ⊆ F̂^A_2 ⊆ F̂^B_3 ⊆ ...                    (37)
    R̂^B_0 ⊆ R̂^A_1 ⊆ R̂^B_2 ⊆ R̂^A_3 ⊆ ...                    (38)
    R̂^A_0 ⊆ R̂^B_1 ⊆ R̂^A_2 ⊆ R̂^B_3 ⊆ ...                    (39)
    L̂^B_0 ⊆ L̂^A_1 ⊆ L̂^B_2 ⊆ L̂^A_3 ⊆ ...                    (40)
    L̂^A_0 ⊆ L̂^B_1 ⊆ L̂^A_2 ⊆ L̂^B_3 ⊆ ...                    (41)

By the first equality of (33), together with (36) and (37), we can get F̂^A_p = F̂^B_q. Similarly, we can get R̂^A_p = R̂^B_q and L̂^A_p = L̂^B_q.

(2) If p is odd, then by (36), (38) and (40), we have F̂^A_p ⊆ F̂^B_{p+1}, R̂^A_p ⊆ R̂^B_{p+1} and L̂^A_p ⊆ L̂^B_{p+1}; thus q ≤ p + 1. Else, if p is even, then by (37), (39) and (41), we again have F̂^A_p ⊆ F̂^B_{p+1}, R̂^A_p ⊆ R̂^B_{p+1} and L̂^A_p ⊆ L̂^B_{p+1}; thus q ≤ p + 1. Doing the same analysis for q, we can get p ≤ q + 1. Hence, |p − q| ≤ 1.

L. Experiment Result

L.1. Verification of the Synergy Effect

Here, we verify the synergy effect between ISS and IFS in SIFS with the experimental results on the dataset real-sim. In Fig. 4, SIFS performs ISS (sample screening) first, while in Fig. 5, it performs IFS (feature screening) first. All the rejection ratios (Fig. 4(a)-(d)) of the 1st triggering of IFS when SIFS performs ISS first are much higher than (or at least equal to) those (Fig. 5(a)-(d)) when SIFS performs IFS first. In turn, all the rejection ratios (Fig. 5(e)-(h)) of the 1st triggering of ISS when SIFS performs IFS first are also much higher than those (Fig. 4(e)-(h)) when SIFS performs ISS first. This demonstrates that the screening result of ISS can reinforce the capability of IFS and vice versa, which is the so-called synergy effect. At last, in Fig. 4 and Fig. 5, we can see that the overall rejection ratios at the end of SIFS are the same, so no matter which rule (ISS or IFS) we perform first, SIFS has the same screening performance in the end. This is consistent with Theorem 6.

L.2. The Rest of the Experimental Results

Below, we report the rejection ratios of SIFS on syn1 (Fig. 6), syn3 (Fig. 7), rcv1-train (Fig. 8), rcv1-test (Fig. 9), url (Fig. 10) and kddb (Fig. 11), which are omitted in the main text due to the space limitation.
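The alternation analyzed in Theorem 6 and verified empirically above can be mimicked with a tiny fixpoint loop. A toy, self-contained sketch (the rule functions below are stand-ins for the real ISS/IFS, and all names are ours):

```python
def alternate_until_fixpoint(rule_a, rule_b):
    """Alternate two screening rules on the index sets (F, R, L) until neither
    identifies anything new (the fixpoint reached in the proof of Theorem 6).
    Each rule maps (F, R, L) to possibly enlarged sets. Returns the final sets
    and the number of triggerings up to the last one that found something new."""
    F, R, L = frozenset(), frozenset(), frozenset()
    rules = [rule_a, rule_b]
    step = 0          # total rule applications so far
    last_change = 0   # index of the last application that changed a set
    idle = 0          # consecutive applications with no change
    while idle < 2:   # fixpoint: neither rule identifies anything new
        F2, R2, L2 = rules[step % 2](F, R, L)
        step += 1
        if (F2, R2, L2) == (F, R, L):
            idle += 1
        else:
            idle, last_change = 0, step
            F, R, L = frozenset(F2), frozenset(R2), frozenset(L2)
    return F, R, L, last_change
```

With monotone toy rules, running the loop in either order reaches the same final (F̂, R̂, L̂), and the two trigger counts differ by at most one, mirroring the conclusion |p − q| ≤ 1.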

[Figures 4-6: the plots themselves are not recoverable from the extraction; only the captions survive.]

Figure 4. Rejection ratios of SIFS on real-sim when it performs ISS first (first row: Feature Screening, second row: Sample Screening).

Figure 5. Rejection ratios of SIFS on real-sim when it performs IFS first (first row: Feature Screening, second row: Sample Screening).

Figure 6. Rejection ratios of SIFS on syn1 (first row: Feature Screening, second row: Sample Screening).

In each figure, panels (a)-(d) and (e)-(h) correspond to different values of β/β_max, increasing from left to right.

[Figures 7-9: the plots themselves are not recoverable from the extraction; only the captions survive.]

Figure 7. Rejection ratios of SIFS on syn3 (first row: Feature Screening, second row: Sample Screening).

Figure 8. Rejection ratios of SIFS on the rcv1-train dataset (first row: Feature Screening, second row: Sample Screening).

Figure 9. Rejection ratios of SIFS on the rcv1-test dataset (first row: Feature Screening, second row: Sample Screening).

In each figure, panels (a)-(d) and (e)-(h) correspond to different values of β/β_max, increasing from left to right.

[Figures 10-11: the plots themselves are not recoverable from the extraction; only the captions survive.]

Figure 10. Rejection ratios of SIFS on the url dataset (first row: Feature Screening, second row: Sample Screening).

Figure 11. Rejection ratios of SIFS on the kddb dataset (first row: Feature Screening, second row: Sample Screening).

In each figure, panels (a)-(d) and (e)-(h) correspond to different values of β/β_max, increasing from left to right.
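As a closing remark on the computations behind the screening rules: the bounds of Lemmas 4 and 5 all reduce to extremizing a linear function ⟨a, ·⟩ over a ball B(c, r), which Cauchy-Schwarz solves in closed form. A minimal NumPy sketch (function name is ours):

```python
import numpy as np

def linear_range_over_ball(a, c, r):
    """Return (min, max) of <a, w> over the ball B(c, r) = {c + eta : ||eta|| <= r}.
    By Cauchy-Schwarz, <a, eta> ranges over [-||a|| * r, ||a|| * r]."""
    mid = float(np.dot(a, c))
    half = float(np.linalg.norm(a)) * r
    return mid - half, mid + half
```

Here u_i and l_i of Lemma 5 are exactly the two endpoints returned for a = [x̄_i]_{F̂^c}, while s_i of Lemma 4 adds the constant ⟨[x̄^i]_{L̂}, 1⟩ to the upper endpoint.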