15 Sparse Modeling and Model Selection

We first recall two classical properties of an estimator $\hat\beta$ of a parameter vector $\beta$. The estimator is consistent if, for every $\delta>0$, $\lim_{n\to\infty} P(\|\hat\beta-\beta\|>\delta)=0$, and it is asymptotically normal if $\sqrt{n}\,(\hat\beta-\beta)$ converges in distribution to $N(0,\Sigma)$ for some covariance matrix $\Sigma$ as $n\to\infty$. These notions reappear in the asymptotic results for the lasso below.
Let $p$ denote the number of covariates and consider $n$ observations $\{(x_i, y_i),\ i=1,\dots,n\}$, with response vector $y=(y_1,\dots,y_n)^\top$ and design matrix $X=(x_1,\dots,x_n)^\top \in \mathbb{R}^{n\times p}$. Throughout, the columns of $X$ are standardized, $\sum_{i=1}^n x_{ij}=0$ and $\frac{1}{n}\sum_{i=1}^n x_{ij}^2=1$ for $j=1,\dots,p$, and $y$ is centered so that no intercept is needed. The linear model is
$$ y = X\beta + \varepsilon, \qquad \beta=(\beta_1,\dots,\beta_p)^\top,\quad \varepsilon \sim N_n(0,\sigma^2 I_n), $$
with $\sigma^2$ unknown. The lasso estimator is
$$ \hat\beta = \arg\min_\beta \left( \|y-X\beta\|_2^2 + \lambda \|\beta\|_1 \right), \qquad \lambda \ge 0, $$
where for $a=(a_1,\dots,a_m)^\top$ the $\ell_1$ norm is $\|a\|_1=\sum_{j=1}^m |a_j|$. When $\lambda=0$ the lasso reduces to ordinary least squares, $\hat\beta=(X^\top X)^{-1}X^\top y$, which is unbiased, $E[\hat\beta]=\beta$, but requires $X^\top X$ to be invertible. The ridge estimator $\hat\beta=(X^\top X+\lambda I_p)^{-1}X^\top y$ with $\lambda>0$ avoids this requirement: adding $\lambda I_p$ shifts every eigenvalue of $X^\top X$ up by $\lambda$, so $X^\top X+\lambda I_p$ is invertible even when $p>n$.
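As a small numerical sketch (the data and variable names are illustrative, not from the text), the following NumPy code computes both estimators on a standardized design; the only difference is the $\lambda I_p$ term.

    # Illustrative sketch: OLS vs. ridge on a standardized design.
    import numpy as np

    rng = np.random.default_rng(0)
    n, p = 50, 10
    X = rng.standard_normal((n, p))
    X = (X - X.mean(axis=0)) / X.std(axis=0)      # standardize columns
    beta = np.zeros(p)
    beta[:3] = [2.0, -1.5, 1.0]                   # sparse true coefficients
    y = X @ beta + rng.standard_normal(n)
    y = y - y.mean()                              # center the response

    # OLS: needs X^T X invertible (n >= p and full column rank)
    beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

    # Ridge: X^T X + lam*I is invertible for any lam > 0, even if p > n
    lam = 1.0
    beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)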
To see why the $\ell_1$ penalty produces exact zeros, consider $\min_\beta\left(\|y-X\beta\|_2^2+\lambda\|\beta\|_1\right)$. The penalty $\lambda\|\beta\|_1=\lambda\sum_j|\beta_j|$ is convex but not differentiable at $\beta_j=0$, and this kink is what pins coefficients exactly at zero. The situation is most transparent under an orthogonal design, $X^\top X = nI_p$ (possible only when $n\ge p$). Writing $\tilde\beta = X^\top y/n$ for the OLS estimate, the objective decouples coordinatewise:
$$ \|y-X\beta\|_2^2+\lambda\|\beta\|_1 = \sum_{j=1}^p \left\{ n\left(\beta_j^2 - 2\tilde\beta_j \beta_j\right) + \lambda|\beta_j| \right\} + \text{const}, $$
and minimizing each term separately gives the soft-thresholding solution
$$ \hat\beta_j = \mathrm{sign}(\tilde\beta_j)\left(|\tilde\beta_j| - \frac{\lambda}{2n}\right)_+, \qquad (A)_+ = \max(A,0),\ A>0 \Rightarrow (A)_+=A. $$
Thus every coefficient with $|\tilde\beta_j|\le \lambda/(2n)$ is set exactly to zero, and the rest are shrunk toward zero by the threshold $t=\lambda/(2n)$. Compare this with hard thresholding, $\hat\beta_j = \tilde\beta_j\,1\{|\tilde\beta_j|>s\}$ for a cutoff $s$. Best subset selection of size $k$ minimizes the residual sum of squares $\mathrm{RSS}=\|y-X_A\hat\beta_A\|_2^2$ over all subsets $A$ with $|A|=k$, and forward selection moves at each step from a size-$k$ subset to the size-$(k+1)$ subset minimizing RSS; under $X^\top X=nI_p$ both reduce to keeping the largest $|\tilde\beta_j|$, i.e., to hard thresholding. Soft thresholding shrinks as well as selects, and, unlike this orthogonal-design calculation, the lasso itself remains well defined when $n<p$.
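The closed form above is easy to implement directly. A minimal sketch (illustrative code, valid under the orthogonal-design assumption $X^\top X = nI_p$ used in the derivation):

    # Illustrative sketch: lasso under an orthogonal design X^T X = n I_p
    # reduces to soft thresholding of the OLS estimate at t = lam/(2n).
    import numpy as np

    def soft_threshold(z, t):
        # componentwise sign(z) * max(|z| - t, 0)
        return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

    def lasso_orthogonal(X, y, lam):
        n = X.shape[0]
        beta_tilde = X.T @ y / n      # OLS estimate when X^T X = n I_p
        return soft_threshold(beta_tilde, lam / (2 * n))

    def hard_threshold(z, s):
        # best-subset analogue: keep entries with |z_j| > s unchanged
        return z * (np.abs(z) > s)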
For a general design matrix $X$, the minimizer $\hat\beta$ is characterized by the subgradient (KKT) condition
$$ X^\top (y - X\hat\beta) = \frac{\lambda}{2}\, s, \qquad s_j = \mathrm{sign}(\hat\beta_j)\ \text{if}\ \hat\beta_j \ne 0, \quad s_j \in [-1,1]\ \text{if}\ \hat\beta_j = 0. $$
Let $A=\{j : \hat\beta_j \ne 0\}$ denote the active set, $X_A$ the corresponding columns of $X$, and $s_A$ the signs on $A$. Solving the condition on $A$ gives
$$ \hat\beta_A = (X_A^\top X_A)^{-1}\left(X_A^\top y - \frac{\lambda}{2}s_A\right), \qquad \hat\beta_{A^C}=0. $$
While $A$ and $s_A$ stay fixed, $\hat\beta_A$ is an affine function of $\lambda$, with direction $\partial\hat\beta_A/\partial\lambda = -(X_A^\top X_A)^{-1}s_A/2$; hence the solution path $\lambda \mapsto \hat\beta(\lambda)$ is piecewise linear. The path starts at $\lambda_{\max} = 2\max_j |x_j^\top y|$: for $\lambda\ge\lambda_{\max}$ we have $\hat\beta=0$, since $s=2X^\top y/\lambda$ then satisfies $|s_j|\le 1$ for all $j$. Just below $\lambda_{\max}$, the variable $j_1=\arg\max_j|x_j^\top y|$ enters the active set, $A=\{j_1\}$, with sign $s_{j_1}=\mathrm{sign}(x_{j_1}^\top y)$. As $\lambda$ decreases further, the path follows the affine formula until either some $\hat\beta_j$, $j\in A$, hits zero (and $j$ leaves $A$) or the subgradient $s_j$ of some $j\notin A$ reaches $\pm 1$ (and $j$ enters $A$); recomputing the direction at each breakpoint $\lambda_{\max}=\lambda_0>\lambda_1>\lambda_2>\cdots$ traces the whole path, which is essentially the LARS algorithm of Efron et al. Finally, for later use write the true coefficient vector as $\beta=(\beta_{S_0},\beta_{S_0^C})$, where $S_0=\{j : \beta_j \ne 0\}\subseteq\{1,\dots,p\}$ is the true support, $|S_0|=s$, and $\beta_{S_0^C}=0$.
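Since the KKT condition fully characterizes the solution, it also gives a practical correctness check for any candidate $\hat\beta$, e.g., one returned by an off-the-shelf solver. A minimal sketch (illustrative code; the tolerances are arbitrary choices):

    # Illustrative sketch: lambda_max and a KKT check for a candidate solution.
    import numpy as np

    def lambda_max(X, y):
        # smallest lambda for which beta_hat = 0 is optimal
        return 2 * np.max(np.abs(X.T @ y))

    def kkt_check(X, y, beta_hat, lam, tol=1e-8):
        g = X.T @ (y - X @ beta_hat)          # must equal (lam/2) * s
        active = np.abs(beta_hat) > tol
        ok_active = np.allclose(
            g[active], (lam / 2) * np.sign(beta_hat[active]), atol=1e-6)
        ok_inactive = np.all(np.abs(g[~active]) <= lam / 2 + 1e-6)
        return ok_active and ok_inactive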
If the support $S_0$ were known, least squares restricted to $S_0$ (with $\hat\beta_{S_0^C}=0$) would satisfy
$$ \sqrt{n}\,(\hat\beta_{S_0}-\beta_{S_0}) \xrightarrow{d} N\!\left(0,\ \sigma^2\Sigma_{00}^{-1}\right), \qquad \Sigma_{00}=\lim_{n\to\infty} \frac{X_{S_0}^\top X_{S_0}}{n}. $$
In the high-dimensional setting, where $p$ grows with $n$ and possibly $p>n$, a natural performance measure is the prediction risk
$$ R = \frac{1}{n}\|X\hat\beta - X\beta\|_2^2. $$
For OLS (which requires $n>p$), $E[R]=\frac{1}{n}E\|X(X^\top X)^{-1}X^\top\varepsilon\|_2^2=\frac{p}{n}\sigma^2$, whereas an oracle that knew $S_0$ and ran least squares on it would attain $E[R]=\frac{s}{n}\sigma^2$, far smaller when $s\ll p$. A direct route to sparsity is the $\ell_0$-penalized criterion
$$ \min_\beta\left( \|y-X\beta\|_2^2 + \lambda\|\beta\|_0 \right), \qquad \|\beta\|_0=\sum_{j=1}^p 1\{\beta_j\ne 0\}. $$
With $\lambda=2\sigma^2\log p$ (the risk inflation criterion) or $\lambda=2\sigma^2(\log p+\log\log p)$, one obtains bounds of the form
$$ E[R] \le \frac{4s\sigma^2\log p}{n} + \frac{C\,s\sigma^2\log\log p}{n} + O(n^{-1}), $$
so that $E[R]\to 0$ whenever $s\sigma^2(\log p)/n\to 0$: only a $\log p$ factor is paid for not knowing $S_0$. The catch is computational: the $\ell_0$ problem requires searching over all $2^p$ subsets. The lasso replaces $\|\beta\|_0$ by its convex surrogate $\|\beta\|_1$ and achieves a comparable risk bound, provided the design $X$ is well conditioned over the cone $\{\beta : \|\beta_{S_0^C}\|_1 \le 3\|\beta_{S_0}\|_1\}$, as formalized by the compatibility constant $\phi$ below.
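The two risk formulas $E[R]=p\sigma^2/n$ and $E[R]=s\sigma^2/n$ are easy to verify by simulation. A minimal Monte Carlo sketch (parameter values are illustrative):

    # Illustrative Monte Carlo sketch: E[R] = (p/n) sigma^2 for OLS versus
    # E[R] = (s/n) sigma^2 for the oracle that knows S_0 = {0, 1, 2}.
    import numpy as np

    rng = np.random.default_rng(1)
    n, p, s, sigma = 100, 20, 3, 1.0
    X = rng.standard_normal((n, p))
    beta = np.zeros(p)
    beta[:s] = 1.0

    risk_ols, risk_orc = [], []
    for _ in range(2000):
        y = X @ beta + sigma * rng.standard_normal(n)
        b_ols = np.linalg.solve(X.T @ X, X.T @ y)
        b_orc = np.zeros(p)
        b_orc[:s] = np.linalg.solve(X[:, :s].T @ X[:, :s], X[:, :s].T @ y)
        risk_ols.append(np.sum((X @ (b_ols - beta)) ** 2) / n)
        risk_orc.append(np.sum((X @ (b_orc - beta)) ** 2) / n)

    print(np.mean(risk_ols), p / n * sigma ** 2)   # both close to 0.20
    print(np.mean(risk_orc), s / n * sigma ** 2)   # both close to 0.03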
The compatibility (restricted eigenvalue) condition requires a constant $\phi>0$ such that
$$ \|\beta_{S_0}\|_1^2 \le \frac{s}{\phi^2}\cdot\frac{\beta^\top X^\top X \beta}{n} \qquad \text{for all } \beta \text{ with } \|\beta_{S_0^C}\|_1 \le 3\|\beta_{S_0}\|_1. $$
Under this condition, for any $0<\delta<1$, the choice $\lambda = 4\sigma\sqrt{2n\log(2p/\delta)}$ guarantees, with probability at least $1-\delta$,
$$ R \le \frac{C s\sigma^2 \log(2p/\delta)}{\phi^2 n} + O(n^{-1}) $$
for a universal constant $C$: the lasso attains the $\ell_0$-type rate $s\log p/n$ up to the factor $1/\phi^2$, despite being a convex program. Whether the lasso also recovers the support, i.e., $\mathrm{sign}(\hat\beta)=\mathrm{sign}(\beta)$, requires stronger conditions on $X$. For a vector $a=(a_1,\dots,a_m)^\top$ write $\|a\|_\infty=\max_i|a_i|$, and for a matrix $A=(A_{ij})$ write $\|A\|_\infty=\max_i\sum_j|A_{ij}|$. Suppose:
(i) the irrepresentable (mutual incoherence) condition holds: $\left\|(X_{S_0^C}^\top X_{S_0})(X_{S_0}^\top X_{S_0})^{-1}\right\|_\infty \le 1-\gamma$ for some $\gamma\in(0,1]$;
(ii) $\Lambda_{\min}\!\left(X_{S_0}^\top X_{S_0}/n\right) \ge C_{\min} > 0$;
(iii) the nonzero coefficients are not too small: $\min_{j\in S_0}|\beta_j| > g(\lambda)/(2n)$, where $g(\lambda)=\lambda\left[\left\|(X_{S_0}^\top X_{S_0}/n)^{-1}\right\|_\infty + 4\sigma/\sqrt{C_{\min}}\right]$;
(iv) $\lambda > (4\sigma/\gamma)\sqrt{2n\log p}$.
Then
$$ P\left[\mathrm{sign}(\hat\beta)=\mathrm{sign}(\beta)\right] \ge 1 - 4\exp\!\left(-c\,\lambda^2/n\right) $$
for a constant $c>0$, so with $\lambda$ as in (iv) the lasso recovers the exact support and signs of $\beta$ with probability tending to one, even when $p>n$.
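The irrepresentable condition in (i) involves only the design, so it can be evaluated numerically for a given $X$ and hypothesized support. A minimal sketch (illustrative code; `irrepresentable_gap` is a hypothetical helper name, not a library function):

    # Illustrative sketch: gamma = 1 - ||X_{S0^C}^T X_{S0} (X_{S0}^T X_{S0})^{-1}||_inf.
    # The irrepresentable condition asks for gamma > 0.
    import numpy as np

    def irrepresentable_gap(X, S0):
        mask = np.zeros(X.shape[1], dtype=bool)
        mask[np.asarray(S0)] = True
        XS, XC = X[:, mask], X[:, ~mask]
        M = XC.T @ XS @ np.linalg.inv(XS.T @ XS)
        inf_norm = np.max(np.sum(np.abs(M), axis=1))   # max absolute row sum
        return 1.0 - inf_norm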
In summary, even in the high-dimensional regime $n<p$, accurate prediction and exact support recovery are possible when the true coefficient vector is sparse ($s$ small) and the design matrix $X$ satisfies conditions of the above type.

References

R. Tibshirani, Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 58, no. 1, pp. 267-288, 1996.
B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani, Least angle regression (with discussion), The Annals of Statistics, vol. 32, pp. 407-499, 2004.
J. Friedman, T. Hastie, and R. Tibshirani, Sparse inverse covariance estimation with the graphical lasso, Biostatistics, vol. 9, no. 3, pp. 432-441, 2008.
S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, Distributed optimization and statistical learning via the alternating direction method of multipliers, Foundations and Trends in Machine Learning, vol. 3, no. 1, pp. 1-122, 2011.
K. Knight and W. Fu, Asymptotics for lasso-type estimators, The Annals of Statistics, vol. 28, no. 5, pp. 1356-1378, 2000.
P.J. Bickel, Y. Ritov, and A.B. Tsybakov, Simultaneous analysis of lasso and Dantzig selector, The Annals of Statistics, vol. 37, no. 4, pp. 1705-1732, 2009.
M.J. Wainwright, Sharp thresholds for high-dimensional and noisy sparsity recovery using $\ell_1$-constrained quadratic programming (lasso), IEEE Trans. Inf. Theory, vol. 55, no. 5, pp. 2183-2202, 2009.
P. Zhao and B. Yu, On model selection consistency of lasso, J. Mach. Learn. Res., vol. 7, pp. 2541-2563, 2006.
J. Friedman, T. Hastie, and R. Tibshirani, Regularization paths for generalized linear models via coordinate descent, Journal of Statistical Software, vol. 33, no. 1, pp. 1-22, 2010.
M.Y. Park and T. Hastie, L1-regularization path algorithm for generalized linear models, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 69, no. 4, pp. 659-677, 2007.
N. Meinshausen and P. Bühlmann, High-dimensional graphs and variable selection with the lasso, The Annals of Statistics, vol. 34, no. 3, pp. 1436-1462, 2006.
M. Yuan and Y. Lin, Model selection and estimation in the Gaussian graphical model, Biometrika, vol. 94, no. 1, pp. 19-35, 2007.
D.M. Witten, R. Tibshirani, and T. Hastie, A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis, Biostatistics, vol. 10, no. 3, pp. 515-534, 2009.
H. Zou, T. Hastie, and R. Tibshirani, Sparse principal component analysis, Journal of Computational and Graphical Statistics, vol. 15, no. 2, pp. 265-286, 2006.
M. Yuan and Y. Lin, Model selection and estimation in regression with grouped variables, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 68, no. 1, pp. 49-67, 2006.
R. Tibshirani, M. Saunders, S. Rosset, J. Zhu, and K. Knight, Sparsity and smoothness via the fused lasso, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 67, no. 1, pp. 91-108, 2005.
A.E. Hoerl and R.W. Kennard, Ridge regression: Biased estimation for nonorthogonal problems, Technometrics, vol. 12, no. 1, pp. 55-67, 1970.
T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning, 2nd ed., Springer, New York, 2009.
R. Tibshirani, Sparsity and the lasso, lecture notes, 2015. http://www.stat.cmu.edu/~larry/=sml/sparsity.pdf
R.J. Tibshirani, The lasso problem and uniqueness, Electronic Journal of Statistics, vol. 7, pp. 1456-1490, 2013.
J. Fan and R. Li, Variable selection via nonconcave penalized likelihood and its oracle properties, J. Am. Stat. Assoc., vol. 96, no. 456, pp. 1348-1360, 2001.
H. Zou, The adaptive lasso and its oracle properties, J. Am. Stat. Assoc., vol. 101, no. 476, pp. 1418-1429, 2006.
D.P. Foster and E.I. George, The risk inflation criterion for multiple regression, The Annals of Statistics, vol. 22, no. 4, pp. 1947-1975, 1994.
S. Xiong, Better subset regression, Biometrika, vol. 101, no. 1, pp. 71-84, 2014.
P. Bühlmann and S. van de Geer, Statistics for High-Dimensional Data: Methods, Theory and Applications, Springer, 2011.
S. Foucart and H. Rauhut, A Mathematical Introduction to Compressive Sensing, Springer, 2013.
L. Chen and J.Z. Huang, Sparse reduced-rank regression for simultaneous dimension reduction and variable selection, J. Am. Stat. Assoc., vol. 107, no. 500, pp. 1533-1545, 2012.
M. Vounou, T.E. Nichols, G. Montana, and the Alzheimer's Disease Neuroimaging Initiative, Discovering genetic associations with high-dimensional neuroimaging phenotypes: A sparse reduced-rank regression approach, NeuroImage, vol. 53, no. 3, pp. 1147-1159, 2010.
R. Lockhart, J. Taylor, R.J. Tibshirani, and R. Tibshirani, A significance test for the lasso, The Annals of Statistics, vol. 42, no. 2, pp. 413-468, 2014.
E.J. Candès, J. Romberg, and T. Tao, Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information, IEEE Trans. Inf. Theory, vol. 52, no. 2, pp. 489-509, 2006.