Gradient Descent for Optimization Problems With Sparse Solutions


The Harvard community has made this article openly available. Please share how this access benefits you. Your story matters.

Citation: Chen, Hsieh-Chung. 2016. Gradient Descent for Optimization Problems With Sparse Solutions. Doctoral dissertation, Harvard University, Graduate School of Arts & Sciences.

Citable link: http://nrs.harvard.edu/urn-3:hul.instrepos:33493549

Terms of Use: This article was downloaded from Harvard University's DASH repository, and is made available under the terms and conditions applicable to Other Posted Material, as set forth at http://nrs.harvard.edu/urn-3:hul.instrepos:dash.current.terms-of-use#laa

[Figure: sparse solution path versus standard solution path toward the optimal solution, with labels $\ell_0$ and $\ell_1$.]

1

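$$\min_{z \in \mathbb{R}^n} \{\, Q(z) = f(z) + \lambda g(z) \,\}$$

where $f : \mathbb{R}^n \to \mathbb{R}$, $g : \mathbb{R}^n \to \mathbb{R}$, and $\lambda \ge 0$. Given $N$ samples $\{x_i, y_i\}^N$ and a model $M_z$ parameterized by $z$ that maps $x$ to $y$, the prediction is $\hat{y} = M_z(x_i)$ for the target $y_i$, and

$$\min_z \Big\{ f(z) \equiv \frac{1}{N} \sum_i^N \ell(M_z(x_i), y_i) \Big\}$$

Loss functions:

$$\ell(\hat{y}, y) = (\hat{y} - y)^2$$
$$\ell(\hat{y}, y) = \log(1 + \exp(-\hat{y} y))$$
$$\ell(\hat{y}, y) = (1 - \hat{y} y)_+$$

A constraint $z \in C$ can be written as the penalty

$$g(z) = \begin{cases} 0 & : z \in C \\ \infty & : z \notin C \end{cases}$$

Sparsity-inducing penalties: the $\ell_0$ penalty $g(z) = \|z\|_0$, the $\ell_1$ penalty, and the $\ell_p$ penalty with $0 \le p \le 1$.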
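The three losses and the averaged objective $f(z) = \frac{1}{N}\sum_i \ell(M_z(x_i), y_i)$ can be sketched in a few lines. This is only an illustration: the function names are hypothetical, the model predictions $M_z(x_i)$ are assumed to be supplied as a plain list, and the logistic loss is written in a numerically stable but mathematically equivalent form.

```python
import math


def squared_loss(y_hat, y):
    """Squared loss (y_hat - y)^2."""
    return (y_hat - y) ** 2


def logistic_loss(y_hat, y):
    """Logistic loss log(1 + exp(-y_hat * y)), evaluated stably."""
    m = -y_hat * y
    # log(1 + e^m) = max(m, 0) + log(1 + e^{-|m|})
    return max(m, 0.0) + math.log1p(math.exp(-abs(m)))


def hinge_loss(y_hat, y):
    """Hinge loss (1 - y_hat * y)_+."""
    return max(0.0, 1.0 - y_hat * y)


def empirical_risk(loss, predictions, targets):
    """f(z) = (1/N) * sum_i loss(M_z(x_i), y_i), given the predictions M_z(x_i)."""
    return sum(loss(p, t) for p, t in zip(predictions, targets)) / len(targets)
```

For example, `empirical_risk(hinge_loss, predictions, targets)` averages the hinge loss over all samples.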

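[Figure: $\ell_p$ for $p = \infty, 2, 1, 0.5, 0$.]

Capped $\ell_1$ penalty, with $\tau > 0$:

$$g(z) = \sum_i^n \min(|z_i|, \tau)$$

Log penalty, with $\tau > 0$:

$$g(z) = \sum_i^n \log\Big(1 + \frac{|z_i|}{\tau}\Big)$$

SCAD penalty, with $\theta > 0$ and $\tau > 2$:

$$g(z) = \sum_i^n \hat{g}(z_i), \qquad \hat{g}(y) = \begin{cases} \theta |y| & : |y| < \theta \\ \dfrac{-y^2 + 2\tau\theta|y| - \theta^2}{2(\tau - 1)} & : \theta \le |y| < \theta\tau \\ \frac{1}{2}(\tau + 1)\theta^2 & : \theta\tau \le |y| \end{cases}$$

MCP penalty:

$$g(z) = \sum_i^n \hat{g}(z_i), \qquad \hat{g}(y) = \begin{cases} \theta |y| - \dfrac{1}{2\tau} y^2 & : |y| < \tau\theta \\ \frac{1}{2}\tau\theta^2 & : \tau\theta \le |y| \end{cases}$$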
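A quick way to check the piecewise SCAD and MCP definitions (linear near zero, quadratic in the middle, constant for large arguments) is to evaluate them at the breakpoints and confirm that the pieces join continuously. The sketch below does that for the scalar functions $\hat{g}(y)$; the function names are illustrative, not taken from the source.

```python
def scad(y, theta, tau):
    """Scalar SCAD penalty; assumes theta > 0 and tau > 2."""
    a = abs(y)
    if a < theta:
        return theta * a
    if a < theta * tau:
        return (-a * a + 2.0 * tau * theta * a - theta ** 2) / (2.0 * (tau - 1.0))
    return 0.5 * (tau + 1.0) * theta ** 2


def mcp(y, theta, tau):
    """Scalar MCP penalty; assumes theta > 0 and tau > 0."""
    a = abs(y)
    if a < tau * theta:
        return theta * a - a * a / (2.0 * tau)
    return 0.5 * tau * theta ** 2


def separable_penalty(g_hat, z, theta, tau):
    """g(z) = sum_i g_hat(z_i)."""
    return sum(g_hat(zi, theta, tau) for zi in z)
```

Both functions are flat beyond their last breakpoint, which is why large coefficients are not penalized further, unlike with the $\ell_1$ norm.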

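Elastic net, which adds a squared $\ell_2$ term to the $\ell_1$ penalty:

$$g(z) = \|z\|_1 + \tau \|z\|_2^2$$

Group ($\ell_1/\ell_p$) penalty:

$$g(z) = \sum_{g \in G} w_g \|z_g\|_p$$

where $G$ is the set of groups, the inner $\ell_p$ norm is typically $\ell_2$ or $\ell_\infty$, and $w_g$ is the weight of group $g$.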

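A function $f$ is Lipschitz continuous on $\Omega$ with constant $L$ if

$$\forall x, y \in \Omega: \quad |f(x) - f(y)| \le L \|x - y\|$$

If $\nabla f$ is Lipschitz continuous with constant $L_f$, then for $L \ge L_f$

$$f(z) - f(x) - \langle \nabla f(x), z - x \rangle \le \frac{L}{2} \|z - x\|_2^2$$

The norms $\ell_p$ and $\ell_q$ are dual when $\frac{1}{p} + \frac{1}{q} = 1$; $\ell_2$ is its own dual. Equivalently, for all $x$:

$$f(z) \le f(x) + \langle \nabla f(x), z - x \rangle + \frac{L}{2} \|z - x\|^2$$

A function $f$ is $\alpha$-strongly convex if $g(x) = f(x) - \frac{\alpha}{2}\|x\|^2$ is convex. Equivalently, for all $x$:

$$f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle + \frac{\alpha}{2} \|y - x\|^2$$

Example problem, with signal $x$, matrix $A$, and coefficients $z$:

$$\min_z \|x - Az\|_2^2 + \lambda \|z\|_1$$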

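Dictionary learning, given data $X$:

$$D = \arg\min_{D, Z} \|X - DZ\|_2^2 + g(Z)$$

Sparse coding of $x$ under the dictionary $D$:

$$z = \arg\min_z \|x - Dz\|_2^2 + g(z)$$

Compressive sensing with $A \in \mathbb{R}^{m \times n}$:

$$\min_z \|y - Az\|_2^2 + \lambda \|z\|_1$$
$$\min_z \|y - Az\|_2^2 \quad \text{s.t.} \quad \|z\|_0 < k$$

With a nonlinearity $T(\cdot) : \mathbb{R}^m \to \mathbb{R}^m$ applied to $Az$:

$$\min_z \|y - T(Az)\|_2^2 + \lambda \|z\|_1$$
$$\min_z \|y - T(Az)\|_2^2 \quad \text{s.t.} \quad \|z\|_0 < k$$

Sparse logistic regression and sparse SVM, for samples $(x_i, y_i) \in (\mathbb{R}^n, \mathbb{R})$:

$$z = \arg\min_z \frac{1}{n} \sum_i \log(1 + \exp(-y_i x_i^T z)) + g(z)$$
$$z = \arg\min_z \frac{1}{2n} \sum_i \max(0, 1 - b_i x_i^T z)^2 + g(z)$$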

$\ell_p$ with $p < 1$

2

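Around a point $y$, $Q$ is approximated by the quadratic model

$$Q_{y,L}(x) \equiv Q(y) + \nabla Q(y)^T (x - y) + \frac{L}{2} \|x - y\|_2^2$$

where $Q : \mathbb{R}^n \to \mathbb{R}$ has Lipschitz constant $L_Q$. For $L > L_Q$, $Q_{y,L}(x) \ge Q(x)$. The model is minimized at

$$\arg\min_{x \in \mathbb{R}^n} Q_{y,L}(x) = \arg\min_{x \in \mathbb{R}^n} \Big\|x - \Big(y - \frac{1}{L} \nabla Q(y)\Big)\Big\|_2^2 = y - \frac{1}{L} \nabla Q(y)$$

which gives the gradient descent iteration

$$z^{k+1} = \arg\min_{z \in \mathbb{R}^n} Q_{z^k,L}(z) = z^k - \frac{1}{L} \nabla Q(z^k)$$

with the convergence rate

$$Q(z^k) - Q(z^*) \le \frac{c L_Q}{2k} \|z^0 - z^*\|_2^2$$

When $Q = f + g$, only $f$ is linearized around $y$:

$$\hat{Q}_{y,L}(x) \equiv f(y) + \nabla f(y)^T (x - y) + \frac{L}{2} \|x - y\|_2^2 + g(x)$$

with $L \ge L_f$, and

$$\arg\min_{x \in \mathbb{R}^n} \hat{Q}_{y,L}(x) = \arg\min_{x \in \mathbb{R}^n} g(x) + \frac{L}{2} \Big\|x - \Big(y - \frac{1}{L} \nabla f(y)\Big)\Big\|_2^2$$

The proximal operator of a function $h$ is

$$\operatorname{prox}_h(v) \equiv \arg\min_{x \in \mathbb{R}^n} h(x) + \frac{1}{2} \|x - v\|_2^2$$

so the proximal gradient iteration, written with the step operator $\mathcal{P}_L$, is

$$z^{k+1} = \mathcal{P}_L(z^k) \equiv \arg\min_{z \in \mathbb{R}^n} \hat{Q}_{z^k,L}(z) = \arg\min_{z \in \mathbb{R}^n} g(z) + \frac{L}{2} \Big\|z - \Big(z^k - \frac{1}{L} \nabla f(z^k)\Big)\Big\|_2^2 = \operatorname{prox}_{\frac{1}{L} g}\Big(z^k - \frac{1}{L} \nabla f(z^k)\Big)$$

As a procedure:

    P_L(x):
        y := prox_{(1/L) g}(x - (1/L) ∇f(x))
        return y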
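The proximal gradient step $z^{k+1} = \operatorname{prox}_{\frac{1}{L} g}(z^k - \frac{1}{L}\nabla f(z^k))$ can be sketched for the special case $f(z) = \|x - Az\|_2^2$, $g(z) = \lambda\|z\|_1$. The code below is a minimal pure-Python illustration, not the author's implementation. It assumes dense list-of-lists matrices and uses the conservative step constant $L = 2\|A\|_F^2$, which upper-bounds the Lipschitz constant of $\nabla f$. All function names are hypothetical.

```python
import math


def matvec(A, v):
    """Dense matrix-vector product A v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def objective(A, x, z, lam):
    """Q(z) = ||x - A z||_2^2 + lam * ||z||_1."""
    r = [xi - pi for xi, pi in zip(x, matvec(A, z))]
    return sum(ri * ri for ri in r) + lam * sum(abs(zi) for zi in z)


def soft_threshold(v, t):
    """Prox of t * ||.||_1: shrink every entry toward zero by t."""
    return [math.copysign(max(abs(u) - t, 0.0), u) for u in v]


def ista(A, x, lam, iters=200):
    """Proximal gradient iterations with a fixed step 1/L."""
    m, n = len(A), len(A[0])
    L = 2.0 * sum(a * a for row in A for a in row)  # 2 * ||A||_F^2 >= L_f
    z = [0.0] * n
    for _ in range(iters):
        r = [pi - xi for pi, xi in zip(matvec(A, z), x)]  # A z - x
        grad = [2.0 * sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        # gradient step on f, then prox of (lam / L) * ||.||_1
        z = soft_threshold([zj - gj / L for zj, gj in zip(z, grad)], lam / L)
    return z
```

With `A` the 2x2 identity, `x = [3.0, 0.2]` and `lam = 1.0`, the iterates approach `[2.5, 0.0]`: the small coordinate is set exactly to zero while the large one is shrunk by `lam / 2`.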

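For the $\ell_1$ penalty $g(z) = \lambda \|z\|_1$, the proximal operator is the soft-thresholding operator $\mathcal{S}_\lambda$:

$$[\mathcal{S}_\lambda(v)]_i = \operatorname{sign}(v_i)\,(|v_i| - \lambda)_+$$

When $g$ is the indicator of the $\ell_0$ constraint set $C = \{x : \|x\|_0 < K\}$, the proximal operator is the hard-thresholding operator $\mathcal{H}_K$:

$$[\mathcal{H}_K(v)]_i = \begin{cases} v_i & : |v_i| \text{ is among the largest-magnitude entries allowed by } K \\ 0 & : \text{otherwise} \end{cases}$$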
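Both thresholding operators are simple to write down. The sketch below uses plain Python lists; the function names are illustrative, and `hard_threshold_top_k` takes the number of entries to keep as `k` (for the strict constraint $\|x\|_0 < K$ one would pass `k = K - 1`).

```python
def soft_threshold(v, lam):
    """[S_lam(v)]_i = sign(v_i) * max(|v_i| - lam, 0)."""
    out = []
    for vi in v:
        mag = max(abs(vi) - lam, 0.0)
        out.append(mag if vi >= 0 else -mag)
    return out


def hard_threshold_top_k(v, k):
    """Keep the k largest-magnitude entries of v and zero out the rest."""
    order = sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)
    keep = set(order[:k])
    return [vi if i in keep else 0.0 for i, vi in enumerate(v)]
```

Soft-thresholding shrinks every surviving entry by `lam`, whereas hard-thresholding leaves the surviving entries unchanged.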

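The step size is $\frac{1}{L}$ with $L \ge L_f$. When $L_f$ is unknown, $L$ is found by backtracking: it is increased by a factor $\eta$ (for example $\eta = 2$) until $\hat{Q}_L$ upper-bounds $Q$ at the new point.

    Backtracking(x, L, η):
        repeat
            y := P_L(x)
            L := ηL
        until Q(y) ≤ Q̂_{x,L}(y)
        return y, (1/η) L

Accelerated proximal gradient (FISTA) keeps an auxiliary sequence $y^k$ next to the iterates $z^k$:

$$z^{k+1} = \mathcal{P}(y^k)$$
$$t_{k+1} = \frac{1 + \sqrt{1 + (2 t_k)^2}}{2}$$
$$y^{k+1} = z^{k+1} + \frac{t_k - 1}{t_{k+1}} (z^{k+1} - z^k)$$

where $\mathcal{P}$ denotes the proximal gradient step.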
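The accelerated iteration with backtracking can be sketched as follows for $f(z) = \|x - Az\|_2^2$ and $g(z) = \lambda\|z\|_1$. This is a hedged illustration rather than the source's algorithm verbatim: it tests the upper-bound condition before enlarging $L$ (so it returns the accepted $L$ directly instead of $\frac{1}{\eta}L$), it compares only the smooth parts since $g$ appears on both sides of $Q(y) \le \hat{Q}_{x,L}(y)$, and all names are hypothetical.

```python
import math


def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def f(A, x, z):
    """Smooth part f(z) = ||x - A z||_2^2."""
    return sum((xi - pi) ** 2 for xi, pi in zip(x, matvec(A, z)))


def grad_f(A, x, z):
    """Gradient 2 A^T (A z - x)."""
    r = [pi - xi for pi, xi in zip(matvec(A, z), x)]
    return [2.0 * sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(z))]


def soft_threshold(v, t):
    return [math.copysign(max(abs(u) - t, 0.0), u) for u in v]


def backtracking_step(A, x, y, lam, L, eta=2.0):
    """Proximal step from y; grow L by eta until the quadratic model bounds f."""
    fy, g = f(A, x, y), grad_f(A, x, y)
    while True:
        p = soft_threshold([yi - gi / L for yi, gi in zip(y, g)], lam / L)
        d = [pi - yi for pi, yi in zip(p, y)]
        bound = fy + sum(gi * di for gi, di in zip(g, d)) + 0.5 * L * sum(di * di for di in d)
        if f(A, x, p) <= bound + 1e-12:  # small slack for rounding
            return p, L
        L *= eta


def fista(A, x, lam, iters=200, L0=1.0):
    z = [0.0] * len(A[0])
    y, t, L = list(z), 1.0, L0
    for _ in range(iters):
        z_new, L = backtracking_step(A, x, y, lam, L)
        t_new = (1.0 + math.sqrt(1.0 + (2.0 * t) ** 2)) / 2.0
        # momentum: extrapolate along the last displacement
        y = [zn + (t - 1.0) / t_new * (zn - zo) for zn, zo in zip(z_new, z)]
        z, t = z_new, t_new
    return z
```

Starting from a small `L0` lets backtracking discover a usable step size without computing the Lipschitz constant of $\nabla f$ in advance.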

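The accelerated method has the convergence rate

$$Q(z^k) - Q(z^*) \le \frac{c L_f}{(k + 1)^2} \|z^0 - z^*\|_2^2$$

but it does not guarantee $Q(z^k) \le Q(z^{k-1})$. A monotone variant accepts the new point only when it decreases $Q$:

$$s^{k+1} = \mathcal{P}(y^k)$$
$$z^{k+1} = \begin{cases} s^{k+1} & : Q(s^{k+1}) < Q(z^k) \\ z^k & : \text{otherwise} \end{cases}$$

A second variant takes both an accelerated step and a plain proximal step and keeps the better of the two:

$$v^{k+1} = \mathcal{P}(y^k)$$
$$m^{k+1} = \mathcal{P}(x^k)$$
$$z^{k+1} = \begin{cases} v^{k+1} & : f(v^{k+1}) \le f(m^{k+1}) \\ m^{k+1} & : \text{otherwise} \end{cases}$$
$$t_{k+1} = \frac{1 + \sqrt{1 + (2 t_k)^2}}{2}$$
$$y^{k+1} = z^{k+1} + \frac{t_k - 1}{t_{k+1}} (z^{k+1} - z^k)$$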

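More generally, the quadratic term can be replaced by a distance function $D(x, y)$:

$$\hat{Q}_y(x) \equiv f(y) + \nabla f(y)^T (x - y) + D(x, y) + g(x)$$

The choice $D(x, y) = \frac{L}{2} \|x - y\|_2^2$ recovers the proximal gradient method. With $D(x, y) = \frac{1}{2} \|x - y\|_H^2$, where $H$ approximates the Hessian of $f$ and $\|w\|_A^2 = w^T A w$:

$$z^{k+1} = \arg\min_z \hat{Q}_{z^k}(z) = \operatorname{prox}^H_g\big(z^k - H^{-1} \nabla f(z^k)\big)$$
$$\operatorname{prox}^H_h(v) \equiv \arg\min_{x \in \mathbb{R}^n} h(x) + \frac{1}{2} \|x - v\|_H^2$$

$D(x, y)$ can also be the Bregman divergence of a function $\psi$:

$$D(x, y) := B_\psi(x, y) \equiv \psi(x) - \psi(y) - \langle x - y, \nabla\psi(y) \rangle$$

For $\psi(x) = \|x\|_2^2$ this gives $D(x, y) = \|x - y\|_2^2$. Another choice is $\psi(p) = \sum_i \{p_i \log p_i - p_i\}$.

A penalty $g$ is separable when $g(z) = \sum_i^n g(z_i)$, as is the case for $\|z\|_1$.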
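The Bregman divergence $B_\psi(x, y) = \psi(x) - \psi(y) - \langle x - y, \nabla\psi(y)\rangle$ is easy to evaluate once $\psi$ and its gradient are available. The sketch below checks the two choices of $\psi$ mentioned above; the gradients are supplied by hand, and all function names are illustrative assumptions.

```python
import math


def bregman(psi, grad_psi, x, y):
    """B_psi(x, y) = psi(x) - psi(y) - <x - y, grad psi(y)>."""
    g = grad_psi(y)
    return psi(x) - psi(y) - sum((xi - yi) * gi for xi, yi, gi in zip(x, y, g))


def sq_norm(x):
    """psi(x) = ||x||_2^2."""
    return sum(xi * xi for xi in x)


def sq_norm_grad(x):
    return [2.0 * xi for xi in x]


def neg_entropy(p):
    """psi(p) = sum_i (p_i log p_i - p_i), for strictly positive p."""
    return sum(pi * math.log(pi) - pi for pi in p)


def neg_entropy_grad(p):
    # d/dp (p log p - p) = log p
    return [math.log(pi) for pi in p]
```

With `sq_norm` the divergence reduces to the squared Euclidean distance, and with `neg_entropy` it becomes $\sum_i x_i \log(x_i / y_i) - x_i + y_i$, the generalized Kullback-Leibler divergence.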

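Coordinate descent starts from $z^0 \in \mathbb{R}^n$ and updates one coordinate $i \in \{1, 2, \dots, n\}$ at a time:

$$z_i := \operatorname{prox}_{\frac{1}{L} g}\Big(z_i - \frac{1}{L} \nabla f(z)_i\Big)$$

For a separable $g$ such as the $\ell_1$ penalty, the selected coordinate $i_k$ can also be minimized exactly:

$$[z^{k+1}]_{i_k} := \arg\min_{z_{i_k}} f(z) + g(z) \quad \text{s.t.} \quad z_j = [z^k]_j \;\; \forall j \ne i_k$$

The coordinate $i_k$ can be chosen cyclically, at random, or greedily from the gradient entries $[\nabla f(z^k)]_{i_k}$. For block-separable penalties,

$$g(z) = \sum_{\Omega_i \in \Omega} g(z_{\Omega_i})$$

and the update is applied to one block $\Omega_i \in \Omega$ at a time.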
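For the LASSO objective $\|x - Az\|_2^2 + \lambda\|z\|_1$ the exact coordinate minimization has a closed form: with $\rho_j = A_j^T r^{(j)}$, where $r^{(j)}$ is the residual with coordinate $j$ removed, the minimizer is $z_j = \operatorname{sign}(\rho_j)(|\rho_j| - \lambda/2)_+ / \|A_j\|_2^2$. The sketch below applies this cyclically. It assumes dense list-of-lists input, and the function names are hypothetical.

```python
import math


def objective(A, x, z, lam):
    """||x - A z||_2^2 + lam * ||z||_1."""
    r = [xi - sum(a * zj for a, zj in zip(row, z)) for row, xi in zip(A, x)]
    return sum(ri * ri for ri in r) + lam * sum(abs(zj) for zj in z)


def lasso_cd(A, x, lam, sweeps=100):
    """Cyclic coordinate descent with exact one-dimensional minimization."""
    m, n = len(A), len(A[0])
    z = [0.0] * n
    r = list(x)  # residual x - A z, kept up to date incrementally
    col_sq = [sum(A[i][j] ** 2 for i in range(m)) for j in range(n)]
    for _ in range(sweeps):
        for j in range(n):
            if col_sq[j] == 0.0:
                continue  # an all-zero column never becomes active
            # correlation of column j with the residual that excludes z_j
            rho = sum(A[i][j] * r[i] for i in range(m)) + col_sq[j] * z[j]
            new = math.copysign(max(abs(rho) - lam / 2.0, 0.0), rho) / col_sq[j]
            delta = new - z[j]
            if delta != 0.0:
                for i in range(m):
                    r[i] -= A[i][j] * delta
                z[j] = new
    return z
```

Keeping the residual updated incrementally makes each coordinate update cost $O(m)$, and coordinates that stay at zero cost nothing beyond the correlation computation.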

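For empirical risk minimization over $N$ samples $(x_i, y_i)$,

$$\min_z \Big\{ f(z) \equiv \frac{1}{N} \sum_i^N \ell(M_z(x_i), y_i) \Big\}$$

the gradient of $f$ can be estimated from a subset of the samples instead of the full sum.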


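3

Starting from $\lambda_0 > \lambda$, the problem is solved along a sequence $\{\lambda_0, \dots, \lambda_k, \dots, \lambda_K = \lambda\}$.

(LASSO)

$$\min_z \Big\{ \frac{1}{2} \|y - Xz\|_2^2 + \lambda \|z\|_1 \Big\}$$

with $y \in \mathbb{R}^N$ and $X \in \mathbb{R}^{N \times n}$. The solution path is the map $S : \lambda \mapsto z(\lambda)$.

The path is traced from $z(\lambda_k)$ to $z(\lambda_{k+1})$, starting at $\lambda_0 = \|X^T y\|_\infty$ where $z(\lambda_0) = 0$. With $\Omega$ the support of $z(\lambda_k)$, the solution at a nearby $\lambda'$ is

$$[z(\lambda')]_\Omega = (X_\Omega^T X_\Omega)^{-1} \big(X_\Omega^T y - \lambda' \operatorname{sign}(z(\lambda_k))_\Omega\big), \qquad [z(\lambda')]_{\Omega^c} = 0$$

The support $\Omega$ changes at the next $\lambda'$ where either

$$\exists\, i \in \Omega^c : \; |X_i^T (y - X z(\lambda'))| = \lambda' \quad (i \text{ enters } \Omega)$$

or

$$\exists\, i \in \Omega : \; z(\lambda')_i = 0 \quad (i \text{ leaves } \Omega)$$

The breakpoints $\lambda_k$, $k = 0, \dots, K$, are found in this way.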

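Path-following algorithm:

    z^0 = 0, Ω = {}, k = 0
    repeat
        i = argmax_i |X_i^T (y - X z^k)|
        Ω := Ω ∪ {i}
        θ := sign(z^k)
        [θ]_i := sign(X_i^T (y - X z^k))
        ẑ := 0
        [ẑ]_Ω := (X_Ω^T X_Ω)^{-1} (X_Ω^T y - λ θ_Ω)
        move from z^k toward ẑ to obtain z^{k+1}
        Ω := {j : [z^{k+1}]_j ≠ 0}
        k := k + 1
    until z^k is the solution for the target λ

Continuation with warm starts, for a target $\lambda$ and a starting value $\lambda_0$:

    z = 0 ∈ R^n
    for k = 0, 1, 2, ..., log_η(λ/λ_0)
        z^k = argmin_z ||x - A z||_2^2 + λ_k ||z||_1    (initialized at the previous solution)
        λ_{k+1} = η λ_k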
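The continuation idea (solve for a geometrically decreasing sequence $\lambda_{k+1} = \eta\lambda_k$, warm-starting each solve from the previous solution) can be sketched as below. As assumptions for this illustration: the inner solver is a few sweeps of coordinate descent on the $\frac{1}{2}\|x - Az\|_2^2 + \lambda\|z\|_1$ form, so that $\lambda_0 = \|A^T x\|_\infty$ is the smallest value with $z = 0$ as a solution; any other LASSO solver could be substituted; the names are hypothetical.

```python
import math


def lasso_cd(A, x, lam, z0, sweeps=50):
    """Warm-started coordinate descent for 0.5*||x - A z||^2 + lam*||z||_1."""
    m, n = len(A), len(A[0])
    z = list(z0)
    r = [x[i] - sum(A[i][j] * z[j] for j in range(n)) for i in range(m)]
    col_sq = [sum(A[i][j] ** 2 for i in range(m)) for j in range(n)]
    for _ in range(sweeps):
        for j in range(n):
            if col_sq[j] == 0.0:
                continue
            rho = sum(A[i][j] * r[i] for i in range(m)) + col_sq[j] * z[j]
            new = math.copysign(max(abs(rho) - lam, 0.0), rho) / col_sq[j]
            delta = new - z[j]
            if delta != 0.0:
                for i in range(m):
                    r[i] -= A[i][j] * delta
                z[j] = new
    return z


def continuation(A, x, lam_target, eta=0.5):
    """Return [(lambda_k, z^k)] for lambda_k decreasing from lambda_0 to lam_target."""
    m, n = len(A), len(A[0])
    lam = max(abs(sum(A[i][j] * x[i] for i in range(m))) for j in range(n))
    z = [0.0] * n  # z(lambda_0) = 0
    path = [(lam, list(z))]
    while lam > lam_target:
        lam = max(eta * lam, lam_target)  # lambda_{k+1} = eta * lambda_k
        z = lasso_cd(A, x, lam, z)        # warm start from the previous solution
        path.append((lam, list(z)))
    return path
```

Because consecutive solutions differ little, each warm-started solve typically needs few sweeps, and intermediate solutions tend to stay sparse.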

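For the $\ell_0$-constrained problem, $\lambda$ bounds the number of nonzeros directly:

$$\min_z \Big\{ \frac{1}{2} \|x - Az\|_2^2 \Big\} \quad \text{s.t.} \quad \|z\|_0 < \lambda$$

Greedy pursuit (OMP) adds one column of $A$ per iteration:

    z = 0 ∈ R^n, Ω = {}
    for k = 1, 2, 3, ..., λ
        i = argmax_i |A_i^T (x - A z)|
        Ω := Ω ∪ {i}
        z_Ω := (A_Ω^T A_Ω)^{-1} (A_Ω^T x)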
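The greedy loop above (pick the column most correlated with the residual, add it to the support, then re-fit by least squares on the support) can be sketched in pure Python. The small Gaussian-elimination helper `solve` and the parameter name `k` (number of columns to select) are assumptions made for this illustration; already selected columns are skipped so that the normal equations stay non-singular when the selected columns are linearly independent.

```python
def solve(M, b):
    """Solve M s = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [bi] for row, bi in zip(M, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            factor = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= factor * aug[c][k]
    sol = [0.0] * n
    for c in reversed(range(n)):
        s = aug[c][n] - sum(aug[c][k] * sol[k] for k in range(c + 1, n))
        sol[c] = s / aug[c][c]
    return sol


def omp(A, x, k):
    """Select k columns greedily; returns (z, support)."""
    m, n = len(A), len(A[0])
    support, z, r = [], [0.0] * n, list(x)
    for _ in range(k):
        corr = [abs(sum(A[i][j] * r[i] for i in range(m))) for j in range(n)]
        for j in support:
            corr[j] = -1.0  # never re-select a column
        support.append(max(range(n), key=lambda j: corr[j]))
        # least squares on the support: (A_S^T A_S) c = A_S^T x
        gram = [[sum(A[i][a] * A[i][b] for i in range(m)) for b in support] for a in support]
        rhs = [sum(A[i][a] * x[i] for i in range(m)) for a in support]
        coef = solve(gram, rhs)
        z = [0.0] * n
        for a, c in zip(support, coef):
            z[a] = c
        r = [x[i] - sum(A[i][j] * z[j] for j in support) for i in range(m)]
    return z, support
```

Each iteration costs one pass over all columns plus a small linear solve, which is why the method is attractive when the target sparsity is small.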

4

z 0 =0 [ K (z k+1 z k )] i =0 k K ( ) i (z k ) i ρ τ k 1

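The threshold $\tau$ is decayed geometrically, $\tau_k = \rho\,\tau_{k-1}$ with $\rho \in [0, 1]$, and additionally capped by two data-dependent quantities. Writing $\mathcal{P}$ for the proximal gradient step and $\Omega(z^k)$ for the support of $z^k$:

$$\tau_1 = \min_{i \in \Omega(z^k)} |\mathcal{P}(z^k)_i|$$
$$\tau_2 = \frac{1}{2} \big( \|\delta_{\Omega(z^k)}\|_\infty + \|\delta\|_\infty \big), \qquad \delta = \mathcal{P}(z^k) - z^k$$
$$\tau_k = \min(\{\rho\,\tau_{k-1}, \tau_1, \tau_2\})$$

The initial threshold is

$$\tau_0 \equiv \xi \max_i |[\mathcal{P}(z^0) - z^0]_i|, \qquad \xi \in [0, 1]$$

with $\xi = 1$ as the default.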

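The thresholding operator $\mathcal{T}_\tau$ zeroes entries smaller than $\tau$ in magnitude:

$$[\mathcal{T}_\tau(y)]_i = \begin{cases} y_i & : |y_i| \ge \tau \\ 0 & : |y_i| < \tau \end{cases}$$

Backtracking with thresholding, where $\mathcal{P}_L$ is the proximal gradient step:

    Backtracking(x, L, τ, η):
        repeat
            x⁺ := P_L(x)
            y := T_τ(x⁺)
            L := ηL
        until min(Q(y), Q(x⁺)) ≤ Q̂_{x,L}(x⁺)
        return y, (1/η) L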
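The thresholding operator and the geometric part of the threshold schedule are sketched below. Note the simplification: only the decay $\tau_k = \rho\,\tau_{k-1}$ is shown, not the additional data-dependent caps on $\tau$, and the function names are hypothetical. The sketch also illustrates a simple consequence of the definition: thresholding moves each coordinate by less than $\tau$, so the squared distance between a vector and its thresholded version is at most $n\tau^2$.

```python
def threshold(y, tau):
    """[T_tau(y)]_i = y_i if |y_i| >= tau, else 0."""
    return [yi if abs(yi) >= tau else 0.0 for yi in y]


def tau_schedule(tau0, rho, steps):
    """Geometric decay tau_k = rho * tau_{k-1}, starting from tau0."""
    taus = [tau0]
    for _ in range(steps):
        taus.append(rho * taus[-1])
    return taus


def thresholding_error(y, tau):
    """Squared distance ||T_tau(y) - y||_2^2, bounded by len(y) * tau^2."""
    return sum((a - b) ** 2 for a, b in zip(threshold(y, tau), y))
```

As $\tau_k$ shrinks, the thresholded step approaches the plain proximal step, while early iterations with a large $\tau$ keep the iterates sparse.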

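Let $z^0 = 0$ and let $z^k$, $k = 1, 2, 3, \dots$, be the iterates obtained with thresholds $\tau_k$. Writing $\mathcal{P}$ for the proximal gradient step and $\mathcal{T}$ for the thresholding operator, for

$$k \ge \log_\rho \frac{2\epsilon}{L n \tau_0}$$

we have

$$Q(\mathcal{T}(\mathcal{P}(z^k))) - Q(\mathcal{P}(z^k)) \le \epsilon$$

Since $\tau_k \to 0$,

$$Q(\mathcal{T}(\mathcal{P}(z^k))) - Q(\mathcal{P}(z^k)) \le \frac{L}{2} \|\mathcal{T}(\mathcal{P}(z^k)) - \mathcal{P}(z^k)\|^2 \le \frac{L}{2} \sum_i^n (\tau_k)^2 \le \frac{L}{2} n \rho^k \tau_0$$

and, with $L \ge L_f$,

$$\frac{L}{2} n \rho^k \tau_0 \le \epsilon \quad \Longleftrightarrow \quad k \ge \log_\rho \frac{2\epsilon}{L n \tau_0}$$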

g ˆΩ z = { i i Ω(z) a i > 1 2 ( a Ω(z) + a ) } a =((z) f(z)) + 0 z 3 z 0 z1 z 2 f ( f,1) z 0 z 1 z 2 z 3 τ

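The gradient is masked to the selected set $\hat{\Omega}_z$:

$$[\mathcal{M}_z(g)]_i = \begin{cases} g_i & : i \in \hat{\Omega}_z \\ 0 & : i \notin \hat{\Omega}_z \end{cases}$$

so the proximal step only touches the coordinates in $\hat{\Omega}_x$:

$$y_{\hat{\Omega}_x} := \operatorname{prox}_{\frac{1}{L} g}\Big(x_{\hat{\Omega}_x} - \frac{1}{L} \nabla f(x)_{\hat{\Omega}_x}\Big)$$

    SelectiveStep(x, L, τ, η):
        repeat
            select Ω̂_x
            y := prox_{(1/L) g}(x - (1/L) M_τ(∇f(x)))
            L := ηL
        until Q(y) ≤ Q(x)
        return y, (1/η) L

With $z^0 = 0$ and $z^k$ obtained from $z^{k-1}$ by this step,

$$Q(z^k) - Q(z^*) \le \frac{2 n L R^2}{k + 4}, \qquad R \ge \|z^0 - z^*\|$$

which follows from the per-iteration decrease

$$Q(z^k) - Q(z^{k+1}) \ge \frac{1}{2L} \big\|[\nabla f(z^k)]_{\Omega(z^k)}\big\|^2 \ge \frac{1}{2nL} \|\nabla f(z^k)\|^2 \ge \frac{1}{2nLR^2} \big(Q(z^k) - Q(z^*)\big)^2$$

The same selection applies to coordinate descent: starting from $z = 0 \in \mathbb{R}^n$ with threshold $\tau$, for $i \in \hat{\Omega}_z$:

$$z_i := \operatorname{prox}_{\frac{1}{L} g}\Big(z_i - \frac{1}{L} \nabla f(z)_i\Big)$$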

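[Figure: iterates $z^k$ of LARS, FSS, FISTA, and ASH:FISTA; $A \in \mathbb{R}^{150 \times 200}$, $\|z\|_0 = 10$.]

[Figure: $A \in \mathbb{R}^{700 \times 1000}$; $\ell_0$ of the iterates.]

[Figure: $\tau$ and $\ell_0$.]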

5

l 2 A R 2000 20000 z 0 = 200 ρ 2000 20000 1% 200

ρ z 0 = 50 z 0 = 50 z 0 = 10 z 0 = 10 A l 0 l 0 l 0 l 0 2000 500 500 2000 ρ z 0 =0

FSS MPL PGH FISTA ASH:FISTA CD ASH:CD FSS MPL PGH FISTA ASH:FISTA CD ASH:CD λ λ 2 k λ λ λ λ 2 A R 500 2000 z 0 = 50 Q(z ) 10 4 10 8

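Each layer $t$ maps its input $x_t$ to $x_{t+1} = D_t(x_t)$. Given training data $\{x_0, x_1, \dots, x_N\}$ with $x_i \in \mathbb{R}^p$, the dictionary is learned by

$$D = \arg\min_{D \in \mathbb{R}^{p \times n},\, z_0, z_1, \dots, z_N} \sum_i^N \big\{ \|x_i - D z_i\|_2^2 + g(z_i) \big\}$$

where $D \in \mathbb{R}^{p \times n}$ with $p < n$, $\{z_i\}$ are the sparse codes, and $g$ is an $\ell_1$ or $\ell_0$ penalty. Given $D$, a signal $x$ is encoded as

$$\arg\min_z \|x - Dz\|_2^2 + g(z)$$

In the experiments $D \in \mathbb{R}^{500 \times 2000}$. The group penalty is

$$g(z) = \sum_{g \in G} \|z_g\|_p$$

and max pooling over $\{x_0, x_1, \dots, x_p\}$ is defined element-wise:

$$[x]_i = \max\{[x_0]_i, [x_1]_i, \dots, [x_p]_i\}$$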

[Figure: samples from the AM-FED, GENKI, and CK+ datasets: original data and after pre-processing.]

[Figure: pipeline with sparse coding (LASSO/OMP), split, max pooling, and flip stages. Compared methods: MNNSC-LASSO, NNSC-LASSO, SC-LASSO, MNNSC-OMP, NNSC-OMP, SC-OMP, Gabor.]

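A signal $x \in \mathbb{R}^n$ is modeled as $x = Dz + \epsilon$ with $D \in \mathbb{R}^{n \times t}$, $z \in \mathbb{R}^t$, $\|z\|_0 \le k$, $k \ll t$, and noise $\epsilon$. The measurement $y \in \mathbb{R}^m$ of $x$ is $y = \Phi x$ with $\Phi \in \mathbb{R}^{m \times n}$. Given $y$, $\Phi$, and $D$, the code is recovered by

$$\hat{z} = \arg\min_z \|y - \Phi D z\|_2^2 + \lambda g(z)$$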

g l 1 l 0 x ˆx = Dẑ (ΦD) 2k z z 0 k Φ D (ΦD) D m k n t y x Φ l 1 l 1 l 0 l 0

l 0 y

24.5 9.25% 28.1 16.3% 31 24.5% 3 T (x) = 1 α (αx) y = T (Φx)

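With a nonlinearity $T(\cdot) : \mathbb{R}^m \to \mathbb{R}^m$, the recovery problem becomes

$$\hat{z} = \arg\min_z \|y - T(\Phi D z)\|_2^2 + \lambda g(z)$$

[Figure: ISTA, FISTA, mAP, ASH:FISTA, and ASH:mAP compared on this problem with the $\ell_1$ penalty.]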

$$z = \arg\min_z \frac{1}{N} \sum_i^N \log(1 + \exp(-y_i x_i^T z)) + g(z)$$

with samples $(x_i, y_i) \in (\mathbb{R}^n, \mathbb{R})$ and $g$ the $\ell_1$ penalty.

l 1 10 20 l 1 11.59 12 18 24 l 0

f(w) = i (y i, (w, x i )) w f w f 90% l 0 10%

l 0 10% l 0

ASH:Adam ASH:Adam T 50

dense net sparse net sparse net (ASH) l 0

6 f g f

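For $A \in \mathbb{R}^{p \times n}$ and $f(z^k) = \|x - A z^k\|_2^2$, the gradient is

$$\nabla f(z^k) = -2 \big( A^T x - (A^T A) z^k \big)$$

$A^T x$ is computed once. When $z^k$ is sparse with support $\Omega$ of size $k$, the product $(A^T A) z^k$ only needs the columns in $\Omega$:

- with $A^T A$ precomputed, $(A^T A)_\Omega z_\Omega$ costs $nk$ operations instead of $n^2$;
- without forming $A^T A$, $A^T (A_\Omega z_\Omega)$ costs $pn + pk$ operations instead of $2pn$.

Both savings grow as $k \ll n$. The same holds when only the coordinates in $\hat{\Omega}_z$ are updated. For a single coordinate $s$:

$$[z^{k+1}]_s = \operatorname{prox}_g\big([z^k]_s + 2 A_s^T (x - A z^k)\big)$$

and the residual is updated using only the change $([z^{k+1}]_s - [z^k]_s)$.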
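The saving from sparsity in the gradient computation can be sketched as follows: with $A^T A$ and $A^T x$ precomputed, $\nabla f(z) = -2(A^T x - (A^T A) z)$ only needs the columns of $A^T A$ indexed by the nonzeros of $z$. The helper names are hypothetical, and the dense list-of-lists representation is only meant to make the operation count visible, not to be efficient.

```python
def precompute(A, x):
    """Return (A^T A, A^T x); done once before iterating."""
    m, n = len(A), len(A[0])
    gram = [[sum(A[i][a] * A[i][b] for i in range(m)) for b in range(n)] for a in range(n)]
    atx = [sum(A[i][j] * x[i] for i in range(m)) for j in range(n)]
    return gram, atx


def grad_sparse(gram, atx, z):
    """-2 (A^T x - (A^T A) z), touching only columns where z is nonzero."""
    support = [j for j, zj in enumerate(z) if zj != 0.0]
    return [-2.0 * (atx[a] - sum(gram[a][j] * z[j] for j in support))
            for a in range(len(z))]


def grad_direct(A, x, z):
    """Reference computation 2 A^T (A z - x) without any precomputation."""
    m, n = len(A), len(A[0])
    r = [sum(A[i][j] * z[j] for j in range(n)) - x[i] for i in range(m)]
    return [2.0 * sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
```

For a vector with `k` nonzeros, `grad_sparse` performs about `n * k` multiplications per call, compared with `n * n` for a dense product with the precomputed matrix.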

l 0 l 0 l 0 80%

7