Supplementary Materials: Trading Computation for Communication: Distributed Stochastic Dual Coordinate Ascent

Tianbao Yang
NEC Labs America, Cupertino, CA 95014
tyang@nec-labs.com

1 Proof of Theorem 1 and Theorem 2

For the proof of Theorem 1, we first prove the following lemma.

Lemma 1. Assume that each $\phi^*_{k,i}(z)$ is a $\gamma$-strongly convex function, where $\gamma$ can be zero. Then for any $t \geq 1$ and $s \in [0,1]$, we have
$$\mathrm{E}\big[D(\alpha^t) - D(\alpha^{t-1})\big] \;\geq\; \frac{smK}{n}\,\mathrm{E}\Big[P(w^{t-1}) - D(\alpha^{t-1}) - \frac{smK}{2\lambda n}\,G^t\Big],$$
where
$$G^t = \frac{1}{n}\sum_{k=1}^{K}\sum_{i=1}^{n_k}\Big(\|x_{k,i}\|^2 - \frac{\gamma(1-s)\lambda n}{smK}\Big)\,\|u^{t-1}_{k,i} - \alpha^{t-1}_{k,i}\|^2 \quad\text{and}\quad u^{t-1}_{k,i} = -\phi'_{k,i}(x_{k,i}^\top w^{t-1}).$$
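For reference, the proofs below use the primal and dual objectives in the following standard form; this restatement is ours, written in the notation that the steps below rely on (see the main paper for the formal definitions):
$$P(w) = \frac{1}{n}\sum_{k=1}^{K}\sum_{i=1}^{n_k}\phi_{k,i}(x_{k,i}^\top w) + \lambda g(w), \qquad D(\alpha) = \frac{1}{n}\sum_{k=1}^{K}\sum_{i=1}^{n_k} -\phi^*_{k,i}(-\alpha_{k,i}) - \lambda g^*\big(v(\alpha)\big),$$
$$v(\alpha) = \frac{1}{\lambda n}\sum_{k=1}^{K}\sum_{i=1}^{n_k}\alpha_{k,i}x_{k,i}, \qquad v^t = v(\alpha^t), \qquad w^t = \nabla g^*(v^t),$$
where machine $k$ holds examples $\{x_{k,i}\}_{i=1}^{n_k}$, $n = \sum_k n_k$, $\phi^*_{k,i}$ and $g^*$ denote convex conjugates, and for the $\ell_2$ regularizer $g(w) = \frac{1}{2}\|w\|^2$ one has $g^* = g$ and $w^t = v^t$.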

Proof of Lemma 1. For simplicity, we let $I^k_m$ denote the random indices $\{i_1,\ldots,i_m\}$ sampled at iteration $t$ in machine $k$. We begin by bounding the improvement in the dual objective. By the definition of $D(\alpha)$, we have
$$D(\alpha^t) - D(\alpha^{t-1}) = \Big[\frac{1}{n}\sum_{k=1}^{K}\sum_{i=1}^{n_k} -\phi^*_{k,i}(-\alpha^t_{k,i}) - \lambda g^*(v^t)\Big] - \Big[\frac{1}{n}\sum_{k=1}^{K}\sum_{i=1}^{n_k} -\phi^*_{k,i}(-\alpha^{t-1}_{k,i}) - \lambda g^*(v^{t-1})\Big].$$
By the definition of the update, $v^t = v^{t-1} + \frac{1}{\lambda n}\sum_{k=1}^{K}\sum_{i\in I^k_m}\Delta\alpha_{k,i}x_{k,i}$, and therefore
$$g^*(v^t) \leq g^*(v^{t-1}) + \nabla g^*(v^{t-1})^\top\Big(\frac{1}{\lambda n}\sum_{k=1}^{K}\sum_{i\in I^k_m}\Delta\alpha_{k,i}x_{k,i}\Big) + \frac{1}{2}\Big\|\frac{1}{\lambda n}\sum_{k=1}^{K}\sum_{i\in I^k_m}\Delta\alpha_{k,i}x_{k,i}\Big\|^2$$
$$= g^*(v^{t-1}) + \frac{1}{\lambda n}\sum_{k=1}^{K}\sum_{i\in I^k_m}\Delta\alpha_{k,i}\,x_{k,i}^\top w^{t-1} + \frac{1}{2\lambda^2 n^2}\Big\|\sum_{k=1}^{K}\sum_{i\in I^k_m}\Delta\alpha_{k,i}x_{k,i}\Big\|^2 \leq g^*(v^{t-1}) + \frac{1}{\lambda n}\sum_{k=1}^{K}\sum_{i\in I^k_m}\Delta\alpha_{k,i}\,x_{k,i}^\top w^{t-1} + \frac{mK}{2\lambda^2 n^2}\sum_{k=1}^{K}\sum_{i\in I^k_m}\Delta\alpha_{k,i}^2\|x_{k,i}\|^2,$$
where the first inequality uses the strong convexity of $g$ (so that $g^*$ is $1$-smooth), the second equality uses the fact $w^{t-1} = \nabla g^*(v^{t-1})$, and the last inequality uses the Cauchy-Schwarz inequality. It follows that $D(\alpha^t) - D(\alpha^{t-1}) \geq \frac{1}{n}(A - B)$, where
$$A = \sum_{k=1}^{K}\sum_{i\in I^k_m}\Big[-\phi^*_{k,i}\big(-(\alpha^{t-1}_{k,i}+\Delta\alpha_{k,i})\big) - \Delta\alpha_{k,i}\,x_{k,i}^\top w^{t-1} - \frac{mK}{2\lambda n}\Delta\alpha_{k,i}^2\|x_{k,i}\|^2\Big], \qquad B = \sum_{k=1}^{K}\sum_{i\in I^k_m} -\phi^*_{k,i}(-\alpha^{t-1}_{k,i}).$$
We can further bound $A$ as follows if we maximize the R.H.S. of the above inequality over $\Delta\alpha_{k,i}$, or if we restrict $\Delta\alpha_{k,i} = s_{k,i}(u^{t-1}_{k,i} - \alpha^{t-1}_{k,i})$, where $u^{t-1}_{k,i} = -\phi'_{k,i}(x_{k,i}^\top w^{t-1})$:
$$A \geq \sum_{k=1}^{K}\sum_{i\in I^k_m}\Big[-\phi^*_{k,i}\big(-(1-s_{k,i})\alpha^{t-1}_{k,i} - s_{k,i}u^{t-1}_{k,i}\big) - s_{k,i}(u^{t-1}_{k,i}-\alpha^{t-1}_{k,i})\,x_{k,i}^\top w^{t-1} - \frac{mK}{2\lambda n}s_{k,i}^2(u^{t-1}_{k,i}-\alpha^{t-1}_{k,i})^2\|x_{k,i}\|^2\Big]$$
$$\geq \sum_{k=1}^{K}\sum_{i\in I^k_m}\Big[-(1-s_{k,i})\phi^*_{k,i}(-\alpha^{t-1}_{k,i}) - s_{k,i}\phi^*_{k,i}(-u^{t-1}_{k,i}) + \frac{\gamma}{2}s_{k,i}(1-s_{k,i})(u^{t-1}_{k,i}-\alpha^{t-1}_{k,i})^2 - s_{k,i}(u^{t-1}_{k,i}-\alpha^{t-1}_{k,i})\,x_{k,i}^\top w^{t-1} - \frac{mK}{2\lambda n}s_{k,i}^2(u^{t-1}_{k,i}-\alpha^{t-1}_{k,i})^2\|x_{k,i}\|^2\Big],$$
where in the last inequality we use the $\gamma$-strong convexity of $\phi^*_{k,i}$. Since the $s_{k,i}$ are maximized over the R.H.S. of the above inequalities, using the conjugacy relation $\phi^*_{k,i}(-u^{t-1}_{k,i}) = -u^{t-1}_{k,i}\,x_{k,i}^\top w^{t-1} - \phi_{k,i}(x_{k,i}^\top w^{t-1})$ we have, for any common $s\in[0,1]$,
$$A - B \geq s\sum_{k=1}^{K}\sum_{i\in I^k_m}\Big[\phi_{k,i}(x_{k,i}^\top w^{t-1}) + \phi^*_{k,i}(-\alpha^{t-1}_{k,i}) + \alpha^{t-1}_{k,i}\,x_{k,i}^\top w^{t-1}\Big] - \frac{s^2}{2\lambda n}\sum_{k=1}^{K}\sum_{i\in I^k_m}\Big(mK\|x_{k,i}\|^2 - \frac{\gamma(1-s)\lambda n}{s}\Big)(u^{t-1}_{k,i}-\alpha^{t-1}_{k,i})^2.$$

[Figure 1: The Basic Variant of the DisDCA Algorithm (more options). The figure lists four options for the routine IncDual(w, scl) that computes the increment $\Delta\alpha_{k,i}$: Option I maximizes the incremental dual objective over $\Delta\alpha_{k,i}$ exactly; Options II and III set $z^{t-1}_{k,i} = -\phi'_{k,i}(x_{k,i}^\top w^{t-1})$, restrict $\Delta\alpha_{k,i} = s_{k,i}(z^{t-1}_{k,i} - \alpha^{t-1}_{k,i})$, and choose $s_{k,i}\in[0,1]$ by maximizing a lower bound on the dual increment; Option IV (only for smooth loss functions) is the same as Option II but uses a closed-form $s_{k,i}$.]

Taking expectation over the sampled indices $I^k_m$ (each machine samples $m$ out of its $n_k = n/K$ examples uniformly, so $\mathrm{E}\big[\sum_{i\in I^k_m} f_{k,i}\big] = \frac{mK}{n}\sum_{i=1}^{n_k} f_{k,i}$), we have
$$\mathrm{E}[A - B] \geq \frac{smK}{n}\sum_{k=1}^{K}\sum_{i=1}^{n_k}\Big[\phi_{k,i}(x_{k,i}^\top w^{t-1}) + \phi^*_{k,i}(-\alpha^{t-1}_{k,i}) + \alpha^{t-1}_{k,i}\,x_{k,i}^\top w^{t-1}\Big] - \frac{s^2 mK}{2\lambda n^2}\sum_{k=1}^{K}\sum_{i=1}^{n_k}\Big(mK\|x_{k,i}\|^2 - \frac{\gamma(1-s)\lambda n}{s}\Big)\|u^{t-1}_{k,i}-\alpha^{t-1}_{k,i}\|^2$$
$$= \frac{smK}{n}\Big\{\sum_{k=1}^{K}\sum_{i=1}^{n_k}\big[\phi_{k,i}(x_{k,i}^\top w^{t-1}) + \phi^*_{k,i}(-\alpha^{t-1}_{k,i}) + \alpha^{t-1}_{k,i}\,x_{k,i}^\top w^{t-1}\big] - \frac{smK}{2\lambda}\underbrace{\frac{1}{n}\sum_{k=1}^{K}\sum_{i=1}^{n_k}\Big(\|x_{k,i}\|^2 - \frac{\gamma(1-s)\lambda n}{smK}\Big)\|u^{t-1}_{k,i}-\alpha^{t-1}_{k,i}\|^2}_{G^t}\Big\}.$$
Note that with $w^{t-1} = \nabla g^*(v^{t-1})$, we have $g(w^{t-1}) + g^*(v^{t-1}) = (w^{t-1})^\top v^{t-1}$ and
$$P(w^{t-1}) - D(\alpha^{t-1}) = \frac{1}{n}\sum_{k=1}^{K}\sum_{i=1}^{n_k}\phi_{k,i}(x_{k,i}^\top w^{t-1}) + \lambda g(w^{t-1}) + \frac{1}{n}\sum_{k=1}^{K}\sum_{i=1}^{n_k}\phi^*_{k,i}(-\alpha^{t-1}_{k,i}) + \lambda g^*(v^{t-1})$$
$$= \frac{1}{n}\sum_{k=1}^{K}\sum_{i=1}^{n_k}\big[\phi_{k,i}(x_{k,i}^\top w^{t-1}) + \phi^*_{k,i}(-\alpha^{t-1}_{k,i})\big] + \lambda (w^{t-1})^\top v^{t-1} = \frac{1}{n}\sum_{k=1}^{K}\sum_{i=1}^{n_k}\big[\phi_{k,i}(x_{k,i}^\top w^{t-1}) + \phi^*_{k,i}(-\alpha^{t-1}_{k,i}) + \alpha^{t-1}_{k,i}\,x_{k,i}^\top w^{t-1}\big],$$
where the last equality uses $\lambda v^{t-1} = \frac{1}{n}\sum_{k,i}\alpha^{t-1}_{k,i}x_{k,i}$. Therefore, combining the bound on $\mathrm{E}[A-B]$ with $D(\alpha^t) - D(\alpha^{t-1}) \geq \frac{1}{n}(A - B)$, we have
$$\mathrm{E}\big[D(\alpha^t) - D(\alpha^{t-1})\big] \geq \frac{1}{n}\,\mathrm{E}[A - B] \geq \frac{smK}{n}\,\mathrm{E}\Big[P(w^{t-1}) - D(\alpha^{t-1}) - \frac{smK}{2\lambda n}\,G^t\Big],$$
which completes the proof of Lemma 1.
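To make the update analyzed in Lemma 1 concrete, the following is a minimal, illustrative Python sketch of one iteration with the conservative choice $\Delta\alpha_{k,i} = s\,(u^{t-1}_{k,i} - \alpha^{t-1}_{k,i})$ and the $\ell_2$ regularizer ($w = v$). It simulates the $K$ machines serially and uses the squared loss $\phi_i(z) = \frac{1}{2}(z - y_i)^2$ purely as an example of a $1$-smooth loss; the function and variable names are ours and are not taken from the paper's implementation.

import numpy as np

def disdca_basic_iteration(X, y, alpha, w, lam, m, K, s, rng):
    """One iteration of the update dalpha_i = s*(u_i - alpha_i) analyzed in Lemma 1.

    X: (n, d) data, y: (n,) targets, alpha: (n,) dual variables,
    w: (d,) primal vector maintained as w = X.T @ alpha / (lam * n).
    Each of the K (simulated) machines samples m local indices; the sum of
    the local contributions to v = w is "communicated" once at the end.
    Squared loss phi_i(z) = 0.5*(z - y_i)^2, so phi_i'(z) = z - y_i.
    """
    n = X.shape[0]
    delta_w = np.zeros_like(w)
    for k in range(K):                                   # simulated machines
        idx = rng.choice(n, size=m, replace=False)       # local mini-batch
        for i in idx:
            u_i = -(X[i] @ w - y[i])                     # u_i = -phi_i'(x_i^T w^{t-1})
            dalpha = s * (u_i - alpha[i])                # conservative update
            alpha[i] += dalpha
            delta_w += dalpha * X[i] / (lam * n)         # local contribution to v = w
    w = w + delta_w                                      # one round of communication
    return alpha, w

# Illustrative usage with hypothetical sizes.
rng = np.random.default_rng(0)
n, d, m, K, lam, gamma = 1000, 20, 10, 4, 1e-3, 1.0      # gamma: strong convexity of phi*
X = rng.standard_normal((n, d)) / np.sqrt(d)
y = rng.standard_normal(n)
alpha, w = np.zeros(n), np.zeros(d)
s = lam * gamma * n / (lam * gamma * n + m * K)          # step size used in the proof of Theorem 1
for t in range(50):
    alpha, w = disdca_basic_iteration(X, y, alpha, w, lam, m, K, s, rng)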

1.1 Proof of Theorem 1

We first prove the case of smooth loss functions. We can apply Lemma 1 with $s = \frac{\lambda\gamma n}{\lambda\gamma n + mK}$. In this case, $G^t \leq 0$ (since $\|x_{k,i}\|\leq 1$ implies $\|x_{k,i}\|^2 - \frac{\gamma(1-s)\lambda n}{smK} \leq 0$). Then we have
$$\mathrm{E}\big[D(\alpha^t) - D(\alpha^{t-1})\big] \geq \frac{smK}{n}\,\mathrm{E}\big[P(w^{t-1}) - D(\alpha^{t-1})\big].$$
Since $\epsilon^t_D = D(\alpha^*) - D(\alpha^t) \leq P(w^t) - D(\alpha^t)$ and $D(\alpha^t) - D(\alpha^{t-1}) = \epsilon^{t-1}_D - \epsilon^t_D$, we obtain
$$\mathrm{E}[\epsilon^t_D] \leq \Big(1 - \frac{smK}{n}\Big)^t \mathrm{E}[\epsilon^0_D] = \Big(1 - \frac{\lambda\gamma mK}{\lambda\gamma n + mK}\Big)^t \mathrm{E}[\epsilon^0_D] \leq \exp\Big(-\frac{smKt}{n}\Big),$$
where we use the fact $\epsilon^0_D = D(\alpha^*) - D(\alpha^0) \leq 1$, due to $D(\alpha^*) \leq P(w^*) \leq 1$ and $D(\alpha^0) \geq 0$. It is then easy to check that in order to have $\mathrm{E}[\epsilon^T_D] \leq \epsilon_D$, it suffices to have
$$T \geq \Big(\frac{n}{mK} + \frac{1}{\lambda\gamma}\Big)\log\frac{1}{\epsilon_D}.$$
Furthermore,
$$\mathrm{E}\big[P(w^t) - D(\alpha^t)\big] \leq \frac{n}{smK}\,\mathrm{E}\big[\epsilon^t_D - \epsilon^{t+1}_D\big] \leq \frac{n}{smK}\,\mathrm{E}[\epsilon^t_D],$$
and therefore
$$\frac{1}{T - T_0}\sum_{t=T_0}^{T-1}\mathrm{E}\big[P(w^t) - D(\alpha^t)\big] \leq \frac{n}{smK(T - T_0)}\,\mathrm{E}\big[\epsilon^{T_0}_D - \epsilon^{T}_D\big] \leq \frac{n}{smK}\,\mathrm{E}[\epsilon^{T_0}_D].$$
We can complete the proof of Theorem 1 for the smooth loss function by plugging in the values of $T$ and $T_0$.
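As a quick illustration of the computation/communication trade-off implied by the iteration bound above, the snippet below evaluates $T \geq (n/(mK) + 1/(\lambda\gamma))\log(1/\epsilon_D)$ for a few hypothetical values of $mK$; the numbers are ours and purely illustrative.

import math

def iterations_needed(n, mK, lam, gamma, eps_D):
    """Iterations T sufficient for E[eps_D^T] <= eps_D (smooth case, bound as reconstructed above)."""
    return (n / mK + 1.0 / (lam * gamma)) * math.log(1.0 / eps_D)

n, lam, gamma, eps_D = 10**6, 1e-5, 1.0, 1e-3
for mK in (10, 100, 1000, 10000):
    print(mK, round(iterations_needed(n, mK, lam, gamma, eps_D)))
# Increasing the per-round work mK shrinks the n/(mK) term (fewer communication
# rounds), but the 1/(lambda*gamma) term is unaffected, so the benefit saturates.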

1.2 Proof of Theorem 2

Next, we prove the case of $L$-Lipschitz continuous loss functions. We let $\gamma = 0$ in Lemma 1 and obtain
$$\mathrm{E}\big[D(\alpha^t) - D(\alpha^{t-1})\big] \geq \frac{smK}{n}\,\mathrm{E}\big[P(w^{t-1}) - D(\alpha^{t-1})\big] - \Big(\frac{smK}{n}\Big)^2\frac{G^t}{2\lambda},$$
where, using $\|x_{k,i}\| \leq 1$, $G^t$ is bounded by
$$G^t = \frac{1}{n}\sum_{k=1}^{K}\sum_{i=1}^{n_k}\|x_{k,i}\|^2\,\|u^{t-1}_{k,i} - \alpha^{t-1}_{k,i}\|^2 \leq \frac{1}{n}\sum_{k=1}^{K}\sum_{i=1}^{n_k}\|u^{t-1}_{k,i} - \alpha^{t-1}_{k,i}\|^2.$$
Following Lemma 4 in the analysis of SDCA, we can bound $G^t$ by $G^t \leq 4L^2$. Let $G = \max_t G^t \leq 4L^2$. We have
$$\mathrm{E}\big[D(\alpha^t) - D(\alpha^{t-1})\big] \geq \frac{smK}{n}\,\mathrm{E}\big[P(w^{t-1}) - D(\alpha^{t-1})\big] - \Big(\frac{smK}{n}\Big)^2\frac{G}{2\lambda}.$$
Similarly, the inequality above indicates the following inequality,
$$\mathrm{E}[\epsilon^t_D] \leq \Big(1 - \frac{smK}{n}\Big)\mathrm{E}[\epsilon^{t-1}_D] + \Big(\frac{smK}{n}\Big)^2\frac{G}{2\lambda}.$$
We can prove similarly that the following inequality holds,
$$\epsilon^t_D \leq \frac{2G}{\lambda\big(\frac{2n}{mK} + t - t_0\big)} \quad\text{for all}\quad t \geq t_0 = \max\Big(0, \Big\lceil\frac{n}{mK}\log\frac{2\lambda n\epsilon^0_D}{GmK}\Big\rceil\Big).$$
In order to have $\epsilon^t_D \leq \epsilon_D$, it suffices to have
$$T \geq \frac{2G}{\lambda\epsilon_D} + t_0 - \frac{2n}{mK}.$$
Furthermore, we have
$$\mathrm{E}\big[P(\bar w_T) - D(\bar\alpha_T)\big] \leq \frac{n}{smK(T - T_0)}\,\mathrm{E}\big[D(\alpha^T) - D(\alpha^{T_0})\big] + \frac{GsmK}{2\lambda n},$$
where $\bar w_T = \sum_{t=T_0}^{T-1} w^t/(T - T_0)$ and $\bar\alpha_T = \sum_{t=T_0}^{T-1}\alpha^t/(T - T_0)$. If $T \geq T_0 + \frac{2n}{mK}$, we can set $s = \frac{2n}{mK(T - T_0)} \in [0,1]$ and obtain
$$\mathrm{E}\big[P(\bar w_T) - D(\bar\alpha_T)\big] \leq \frac{1}{2}\,\mathrm{E}\big[D(\alpha^T) - D(\alpha^{T_0})\big] + \frac{G}{\lambda(T - T_0)}.$$
In order to have $\mathrm{E}[P(\bar w_T) - D(\bar\alpha_T)] \leq \epsilon_P$, it suffices to have
$$T_0 \geq \frac{4G}{\lambda\epsilon_P} - \frac{2n}{mK} + t_0, \quad\text{and}\quad T \geq T_0 + \frac{2G}{\lambda\epsilon_P}.$$

2 Derivations of Subproblems in DisDCA and ADMM

We consider the $\ell_2$ regularization where $g(v) = \frac{1}{2}\|v\|^2$, $w = v$. Then at the $t$-th iteration, we aim to maximize the dual objective on the sampled data points, i.e.,
$$\max_{\Delta\alpha}\; -\frac{1}{n}\sum_{k=1}^{K}\sum_{i\in I^k_m}\phi^*_{k,i}\big(-(\alpha^{t-1}_{k,i} + \Delta\alpha_{k,i})\big) - \frac{\lambda}{2}\Big\|w^{t-1} + \frac{1}{\lambda n}\sum_{k=1}^{K}\sum_{i\in I^k_m}\Delta\alpha_{k,i}x_{k,i}\Big\|^2$$
$$= \max_{\alpha}\; -\frac{1}{n}\sum_{k=1}^{K}\sum_{i\in I^k_m}\phi^*_{k,i}(-\alpha_{k,i}) - \frac{\lambda}{2}\Big\|\hat w^{t-1} + \frac{1}{\lambda n}\sum_{k=1}^{K}\sum_{i\in I^k_m}\alpha_{k,i}x_{k,i}\Big\|^2,$$
where $\hat w^{t-1} = w^{t-1} - \frac{1}{\lambda n}\sum_{k=1}^{K}\sum_{i\in I^k_m}\alpha^{t-1}_{k,i}x_{k,i}$. Therefore, in each machine the goal of each update is to maximize the following
$$\max_{\alpha}\; -\sum_{i=1}^{m}\phi^*_i(-\alpha_i) - \frac{\lambda n}{2}\Big\|\hat w^{t-1} + \frac{1}{\lambda n}\sum_{i=1}^{m}\alpha_i x_i\Big\|^2,$$
where we suppress the machine subscript $k$ and, with a slight abuse of notation, $\hat w^{t-1} = w^{t-1} - \frac{1}{\lambda n}\sum_{i=1}^{m}\alpha^{t-1}_i x_i$. The above problem has the following primal problem:
$$\min_{w}\; \sum_{i=1}^{m}\phi_i(x_i^\top w) + \frac{\lambda n}{2}\Big\|w - \Big(w^{t-1} - \frac{1}{\lambda n}\sum_{i=1}^{m}\alpha^{t-1}_i x_i\Big)\Big\|^2.$$
To see this, we have
$$\min_{w}\; \sum_{i=1}^{m}\phi_i(x_i^\top w) + \frac{\lambda n}{2}\Big\|w - \Big(w^{t-1} - \frac{1}{\lambda n}\sum_{i=1}^{m}\alpha^{t-1}_i x_i\Big)\Big\|^2$$
$$= \min_{w}\max_{\alpha}\; \sum_{i=1}^{m}\big[-\alpha_i\,x_i^\top w - \phi^*_i(-\alpha_i)\big] + \frac{\lambda n}{2}\Big\|w - \Big(w^{t-1} - \frac{1}{\lambda n}\sum_{i=1}^{m}\alpha^{t-1}_i x_i\Big)\Big\|^2$$
$$= \max_{\alpha}\min_{w}\; \sum_{i=1}^{m}\big[-\alpha_i\,x_i^\top w - \phi^*_i(-\alpha_i)\big] + \frac{\lambda n}{2}\Big\|w - \Big(w^{t-1} - \frac{1}{\lambda n}\sum_{i=1}^{m}\alpha^{t-1}_i x_i\Big)\Big\|^2.$$
We can first compute the minimization to obtain $w = w^{t-1} - \frac{1}{\lambda n}\sum_{i=1}^{m}\alpha^{t-1}_i x_i + \frac{1}{\lambda n}\sum_{i=1}^{m}\alpha_i x_i$, and then we plug this into the above problem, yielding (up to an additive constant)
$$\max_{\alpha}\; -\sum_{i=1}^{m}\phi^*_i(-\alpha_i) - \frac{\lambda n}{2}\Big\|\hat w^{t-1} + \frac{1}{\lambda n}\sum_{i=1}^{m}\alpha_i x_i\Big\|^2.$$
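The primal-dual correspondence above, $w^\star = \hat w^{t-1} + \frac{1}{\lambda n}\sum_i \alpha^\star_i x_i$, can be checked numerically. The sketch below does so for the squared loss $\phi_i(z) = \frac{1}{2}(z - y_i)^2$ (so $\phi^*_i(-\alpha) = \frac{1}{2}\alpha^2 - \alpha y_i$), for which both the local primal and the local dual subproblems are quadratics with closed-form solutions; the loss choice and all names are ours, for illustration only.

import numpy as np

rng = np.random.default_rng(1)
m, d, lam_n = 8, 5, 3.0                 # lam_n plays the role of lambda * n
X = rng.standard_normal((m, d))         # local sampled examples x_1..x_m (rows)
y = rng.standard_normal(m)
w_hat = rng.standard_normal(d)          # \hat w^{t-1}

# Local primal subproblem:  min_w  sum_i 0.5*(x_i^T w - y_i)^2 + (lam_n/2)*||w - w_hat||^2
w_star = np.linalg.solve(X.T @ X + lam_n * np.eye(d), X.T @ y + lam_n * w_hat)

# Local dual subproblem:    max_a  sum_i [-0.5*a_i^2 + a_i*y_i] - (lam_n/2)*||w_hat + X^T a / lam_n||^2
# Setting the gradient to zero gives  (I + X X^T / lam_n) a = y - X w_hat.
alpha_star = np.linalg.solve(np.eye(m) + X @ X.T / lam_n, y - X @ w_hat)

# Primal-dual correspondence: w* = w_hat + X^T alpha* / lam_n
w_from_dual = w_hat + X.T @ alpha_star / lam_n
assert np.allclose(w_star, w_from_dual), "primal and dual solutions should coincide"
print(np.max(np.abs(w_star - w_from_dual)))   # ~1e-15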

[Figure 2: DisDCA vs ADMM. Each panel plots the primal objective versus the number of iterations on the kdd data (LSVM) for one value of $\lambda$; the curves compare the practical variant of DisDCA with ADMM under several penalty parameters $\rho$.]

In ADMM, the primal objective is decomposed across the examples by imposing equality constraints:
$$\min_{w_1,\ldots,w_K,\,w}\; \sum_{k=1}^{K}\sum_{i=1}^{n_k}\phi_{k,i}(w_k^\top x_{k,i}) + \frac{\lambda n}{2}\|w\|^2 \quad\text{s.t.}\quad w_1 = \cdots = w_K = w.$$
To solve the problem, an augmented Lagrangian function is constructed:
$$L(w_1,\ldots,w_K, w, u_1,\ldots,u_K) = \sum_{k=1}^{K}\sum_{i=1}^{n_k}\phi_{k,i}(w_k^\top x_{k,i}) + \frac{\lambda n}{2}\|w\|^2 + \rho\sum_{k=1}^{K} u_k^\top(w_k - w) + \frac{\rho}{2}\sum_{k=1}^{K}\|w_k - w\|^2.$$
The $\{w_k\}$, $w$, $\{u_k\}$ are optimized alternately by
$$w_k^t = \arg\min_{w_k}\; \sum_{i=1}^{n_k}\phi_{k,i}(w_k^\top x_{k,i}) + \frac{\rho}{2}\|w_k - w^{t-1} + u_k^{t-1}\|^2,$$
$$w^t = \arg\min_{w}\; \frac{\lambda n}{2}\|w\|^2 + \frac{\rho}{2}\sum_{k=1}^{K}\|w_k^t - w + u_k^{t-1}\|^2 = \frac{\rho\sum_{k=1}^{K}\big(w_k^t + u_k^{t-1}\big)}{\lambda n + K\rho},$$
$$u_k^t = u_k^{t-1} + w_k^t - w^t.$$
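For comparison with the DisDCA subproblem above, here is a minimal Python sketch of the consensus-ADMM updates just derived, again using the squared loss so that the local $w_k$-update has a closed form (the paper's experiments instead solve SVM subproblems; the loss choice and names here are ours).

import numpy as np

def admm_consensus(Xs, ys, lam_n, rho, T):
    """Consensus ADMM for  min_w  sum_k sum_i phi(w^T x_{k,i}) + (lam_n/2)*||w||^2
    with phi(z) = 0.5*(z - y)^2, following the alternating updates derived above.
    Xs, ys: lists of per-machine data (X_k of shape (n_k, d), y_k of shape (n_k,))."""
    K, d = len(Xs), Xs[0].shape[1]
    w = np.zeros(d)
    ws = [np.zeros(d) for _ in range(K)]
    us = [np.zeros(d) for _ in range(K)]
    for _ in range(T):
        # w_k-update: argmin_{w_k} sum_i 0.5*(x_i^T w_k - y_i)^2 + (rho/2)*||w_k - w + u_k||^2
        for k in range(K):
            A = Xs[k].T @ Xs[k] + rho * np.eye(d)
            b = Xs[k].T @ ys[k] + rho * (w - us[k])
            ws[k] = np.linalg.solve(A, b)
        # w-update: closed form  w = rho * sum_k (w_k + u_k) / (lam_n + K*rho)
        w = rho * sum(wk + uk for wk, uk in zip(ws, us)) / (lam_n + K * rho)
        # u_k-update (scaled dual variables)
        for k in range(K):
            us[k] = us[k] + ws[k] - w
    return w

# Illustrative usage with hypothetical sizes.
rng = np.random.default_rng(2)
K, n_k, d = 4, 50, 10
Xs = [rng.standard_normal((n_k, d)) for _ in range(K)]
ys = [rng.standard_normal(n_k) for _ in range(K)]
w = admm_consensus(Xs, ys, lam_n=1.0, rho=1.0, T=100)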

3 More Experimental Results

We show more experiments in this section. Figure 2 compares the practical variant of DisDCA with ADMM under different penalty parameters. Figures 3 and 4 show more experiments by varying m and K under different settings. Figure 5 compares the different variants of DisDCA.

[Figure 3: Results with different values of $m$. Panels (a)-(f) plot the objective or duality gap versus the number of iterations on the covtype and kdd data (LSVM), each for two values of $\lambda$; the curves correspond to different values of $m$.]

[Figure 4: Results with different values of $K$. Panels (a)-(f) plot the duality gap versus the number of iterations on the covtype and kdd data (LSVM) with fixed $m$ and $\lambda$; the curves correspond to different numbers of machines $K$.]

[Figure 5: Comparison of different variants of DisDCA with different $m$ and $\lambda$ on the kdd data. Panels (a)-(h) plot the objective versus the number of iterations.]