Journal of East China Normal University (Natural Science)   No. 2, Mar. 2016

Article ID: 1000-5641(2016)02-0020-10

Support vector machine in the primal space based on the ramp loss function

YUAN Yu-ping 1, AN Zeng-long 2
(1. College of Sciences, Heilongjiang Bayi Agricultural University, Daqing Heilongjiang 163319, China; 2. College of Economics & Management, Heilongjiang Bayi Agricultural University, Daqing Heilongjiang 163319, China)

Received: 2015-03. Supported by grants 20112305110002, XDB2015-23 and HNK11A-14-07. Authors: YUAN Yu-ping, E-mail: byndyyps@sina.com; AN Zeng-long, E-mail: anzl@2001@163.com.

CLC number: TP181; TP391   Document code: A   DOI: 10.3969/j.issn.1000-5641.2016.02.003

Abstract: Aiming at the problem that the standard support vector machine is sensitive to noise, a new method of support vector regression (SVR) based on an asymmetric quadratic insensitive (ramp) loss function is proposed. Using the concave-convex procedure and a smoothing technique, the non-convex optimization problem is transformed into a continuous, twice-differentiable convex optimization problem. Using an Armijo-Newton optimization algorithm with finite
iteration termination, the established optimization model is solved, and the convergence of the algorithm is analyzed. The algorithm not only keeps the sparseness of the support vectors, but also controls the influence of abnormal values in the training sample. The experimental results show that the proposed support vector regression model keeps good generalization ability and fits both simulated data and benchmark data well. Compared with the standard support vector machine (SVM) model, the proposed model not only reduces the effect of noise and outliers, but also has stronger robustness.

Key words: support vector regression; outliers; loss function; concave-convex procedure

0 Introduction

Support vector machines have been applied successfully to a wide range of classification and regression problems, but the standard model is sensitive to noise and outliers in the training data, which degrades its generalization ability [1-4]. In 2002, Lin et al. proposed the fuzzy support vector machine (FSVM) [5-6], which assigns each training sample a fuzzy membership so that noisy samples receive smaller weights; choosing suitable memberships, however, is itself a difficult problem. A recursive finite Newton algorithm for support vector regression in the primal space was studied in [7], and robust least squares support vector regression methods were investigated in [8-9]. In 2006, Xu, Crammer et al. observed that the unbounded hinge loss lets outliers dominate the solution, and replaced it with a bounded non-convex variant, the ramp loss [10]. In 2008, Wang et al. trained a robust support vector machine with a smooth ramp loss in the primal space, solving the resulting non-convex problem with the concave-convex procedure (CCCP) [11]. Zhao and Sun then developed a robust support vector regression in the primal space [12], and a stagewise least squares loss function for classification was proposed in [13]. Building on the ramp loss of [12] and the bounded loss of [13], this paper proposes a support vector regression machine in the primal space based on an asymmetric ramp loss: the non-convex loss is smoothed and decomposed into convex and concave parts, the problem is handled by CCCP, and each convex subproblem is solved by a finite Armijo-Newton iteration, so that the model stays sparse while the influence of noise and outliers is controlled.
1 Ramp loss support vector regression

1.1 The regression model

Given a training set T = {(x_i, y_i)}_{i=1}^N with x_i ∈ R^n and y_i ∈ R, the standard ε-insensitive support vector regression solves

$$\min_{w,b}\ \frac12\|w\|^2+C\sum_{i=1}^N(\xi_i+\xi_i^*),\quad \text{s.t.}\ y_i-(w\cdot x_i+b)\le\varepsilon+\xi_i,\ (w\cdot x_i+b)-y_i\le\varepsilon+\xi_i^*,\ \xi_i,\xi_i^*\ge 0,\ i=1,2,\cdots,N, \tag{1}$$

where C > 0 is the penalty parameter and ξ_i, ξ_i^*, i = 1, 2, ⋯, N, are slack variables. Eliminating the slack variables, (1) can be written as the unconstrained problem

$$\min_{w,b}\ L_\varepsilon(w,b)=\frac12\|w\|^2+C\sum_{i=1}^N H_\theta(z_i). \tag{2}$$

Here H_θ is the asymmetric ramp loss H_θ(z_i) = min(θ², H_A(z_i)), with

$$H_A(z_i)=\begin{cases}0,&-\varepsilon_1\le z_i\le\varepsilon_2,\\(z_i-\varepsilon_2)^2,&z_i>\varepsilon_2,\\(-z_i-\varepsilon_1)^2,&z_i<-\varepsilon_1,\end{cases}\qquad z_i=w\cdot x_i+b-y_i,\ \ \varepsilon_1<\varepsilon_2.$$

When ε₁ = ε₂ = ε, the ramp loss takes the symmetric form

$$H_\theta(z_i)=\min\{\theta^2,\max(0,|z_i|-\varepsilon)^2\},$$

whose values lie in [0, θ²]; the parameter θ sets the level at which the loss saturates, so each gross outlier contributes at most θ².

Omitting the bias term b and working in a reproducing kernel Hilbert space H, the regression problem becomes

$$\min_{f\in\mathcal H}\ L(f)=\frac12\|f\|_{\mathcal H}^2+C\sum_{i=1}^N H_\theta(z_i). \tag{3}$$

By the representer theorem [14], the solution of (3) has the form

$$f(x)=\sum_{i=1}^N\beta_i k(x,x_i). \tag{4}$$

Substituting (4) into (3) yields

$$\min_\beta\ L(\beta)=\frac12\sum_{i=1}^N\sum_{j=1}^N\beta_i\beta_j k(x_i,x_j)+C\sum_{i=1}^N H_\theta\Big(\sum_{j=1}^N\beta_j k(x_i,x_j)-y_i\Big). \tag{5}$$
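As a quick illustration of the loss above, the following NumPy sketch evaluates the asymmetric ramp loss H_θ; the function name and the sample parameter values are ours, not the paper's.

```python
import numpy as np

def ramp_loss(z, eps1, eps2, theta):
    """Asymmetric ramp loss H_theta(z) = min(theta^2, H_A(z)), where H_A is
    zero on [-eps1, eps2] and grows quadratically outside that zone."""
    z = np.asarray(z, dtype=float)
    h_a = np.where(z > eps2, (z - eps2) ** 2,
          np.where(z < -eps1, (-z - eps1) ** 2, 0.0))
    return np.minimum(theta ** 2, h_a)

# The loss saturates at theta^2 = 4: a gross outlier (z = 50)
# costs no more than a moderate one (z = 3).
print(ramp_loss([0.0, 3.0, 50.0, -50.0], eps1=0.1, eps2=0.1, theta=2.0))
# -> [0. 4. 4. 4.]
```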
Writing β = [β₁, β₂, ⋯, β_N]^T and letting K denote the kernel matrix with K_ij = k(x_i, x_j), i, j = 1, 2, ⋯, N, problem (5) becomes

$$\min_\beta\ L(\beta)=\frac12\beta^{\rm T}K\beta+C\sum_{i=1}^N H_\theta(z_i),\qquad z_i=K_i^{\rm T}\beta-y_i,\ i=1,2,\cdots,N. \tag{6}$$

1.2 Smoothing the ramp loss

The ramp loss H_θ(z) is neither convex nor differentiable. Following the Huber smoothing idea, it is approximated by the sum of a convex part H₁^huber and a concave part H₂^huber:

$$H_1^{\rm huber}(z)=\begin{cases}\theta[-2z-(2\varepsilon_1+\theta)],&z\le-\varepsilon_1-\theta,\\(-z-\varepsilon_1)^2,&-\varepsilon_1-\theta<z<-\varepsilon_1,\\0,&-\varepsilon_1\le z\le\varepsilon_2,\\(z-\varepsilon_2)^2,&\varepsilon_2<z<\varepsilon_2+\theta,\\\theta[2z-(2\varepsilon_2+\theta)],&z\ge\varepsilon_2+\theta,\end{cases}$$

$$H_2^{\rm huber}(z)=\begin{cases}\theta[2z+(2\varepsilon_1+2\theta+\Delta)],&z\le-\varepsilon_1-\theta-\Delta,\\-\frac{\theta}{\Delta}(-z-\varepsilon_1-\theta)^2,&-\varepsilon_1-\theta-\Delta<z<-\varepsilon_1-\theta,\\0,&-\varepsilon_1-\theta\le z\le\varepsilon_2+\theta,\\-\frac{\theta}{\Delta}(z-\varepsilon_2-\theta)^2,&\varepsilon_2+\theta<z<\varepsilon_2+\theta+\Delta,\\\theta[-2z+(2\varepsilon_2+2\theta+\Delta)],&z\ge\varepsilon_2+\theta+\Delta,\end{cases}$$

where Δ > 0 is a smoothing parameter. Adding the two parts gives the smoothed ramp loss H_{θ,Δ}^huber(z) = H₁^huber(z) + H₂^huber(z):

$$H_{\theta,\Delta}^{\rm huber}(z)=\begin{cases}\theta^2+\theta\Delta,&z\le-\varepsilon_1-\theta-\Delta,\\\theta[-2z-(2\varepsilon_1+\theta)]-\frac{\theta}{\Delta}(-z-\varepsilon_1-\theta)^2,&-\varepsilon_1-\theta-\Delta<z\le-\varepsilon_1-\theta,\\(-z-\varepsilon_1)^2,&-\varepsilon_1-\theta<z<-\varepsilon_1,\\0,&-\varepsilon_1\le z\le\varepsilon_2,\\(z-\varepsilon_2)^2,&\varepsilon_2<z\le\varepsilon_2+\theta,\\\theta[2z-(2\varepsilon_2+\theta)]-\frac{\theta}{\Delta}(z-\varepsilon_2-\theta)^2,&\varepsilon_2+\theta<z\le\varepsilon_2+\theta+\Delta,\\\theta^2+\theta\Delta,&z>\varepsilon_2+\theta+\Delta.\end{cases}$$

As Δ → 0, H_{θ,Δ}^huber(z) → H_θ(z). Replacing H_θ in (6) by the smoothed loss gives

$$\min_\beta\ L(\beta)=\frac12\beta^{\rm T}K\beta+C\sum_{i=1}^N\big(H_1^{\rm huber}(z_i)+H_2^{\rm huber}(z_i)\big), \tag{7}$$

where z_i = K_i^T β − y_i, i = 1, 2, ⋯, N. Problem (7) is smooth but still non-convex.
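Piecewise definitions like these are easy to mistranscribe, so a direct NumPy transcription may help as a check; `delta` stands for the smoothing parameter Δ, the names are ours, and the final print verifies that H₁ + H₂ vanishes inside the insensitive zone and saturates at θ² + θΔ.

```python
import numpy as np

def h1_huber(z, eps1, eps2, theta):
    """Convex Huber-smoothed part: quadratic near the insensitive zone,
    linear with slope +/- 2*theta beyond it."""
    z = np.asarray(z, dtype=float)
    return np.select(
        [z <= -eps1 - theta, z < -eps1, z <= eps2, z < eps2 + theta],
        [theta * (-2 * z - (2 * eps1 + theta)), (-z - eps1) ** 2,
         0.0, (z - eps2) ** 2],
        default=theta * (2 * z - (2 * eps2 + theta)))

def h2_huber(z, eps1, eps2, theta, delta):
    """Concave part; added to h1_huber it caps the loss at theta^2 + theta*delta."""
    z = np.asarray(z, dtype=float)
    return np.select(
        [z <= -eps1 - theta - delta, z <= -eps1 - theta,
         z <= eps2 + theta, z < eps2 + theta + delta],
        [theta * (2 * z + (2 * eps1 + 2 * theta + delta)),
         -(theta / delta) * (-z - eps1 - theta) ** 2,
         0.0,
         -(theta / delta) * (z - eps2 - theta) ** 2],
        default=theta * (-2 * z + (2 * eps2 + 2 * theta + delta)))

# Sanity check: the sum is 0 inside [-eps1, eps2] and flattens out at
# theta^2 + theta*delta = 1.5 for |z| large.
z = np.array([0.0, 1.0, 2.0, 10.0])
print(h1_huber(z, 0.1, 0.1, 1.0) + h2_huber(z, 0.1, 0.1, 1.0, 0.5))
# -> [0.   0.81 1.5  1.5 ]
```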
Fig. 1 Smooth non-convex loss function

1.3 The concave-convex procedure

The concave-convex procedure (CCCP) [15] handles exactly this kind of objective. Suppose E(θ) can be decomposed into a convex part E_vex(θ) and a concave part E_cav(θ), E(θ) = E_vex(θ) + E_cav(θ). CCCP minimizes E(θ) as follows: (i) choose an initial point θ⁰; (ii) iterate θ^{i+1} = arg min_θ (E_vex(θ) + E′_cav(θ^i)θ) until the iterates θ^i converge; (iii) output θ* = θ^i.

To apply CCCP to (7), split the objective as

$$L_{\rm vex}(\beta)=\frac12\beta^{\rm T}K\beta+C\sum_{i=1}^N H_1^{\rm huber}(z_i),\qquad L_{\rm cav}(\beta)=C\sum_{i=1}^N H_2^{\rm huber}(z_i),$$

and iterate

$$\beta^{n+1}=\arg\min_\beta\{L_{\rm vex}(\beta)+L'_{\rm cav}(\beta^n)\beta\}, \tag{8}$$

where β^n is the current CCCP iterate and L′_cav(β^n) denotes the derivative of L_cav(β) at β = β^n:

$$L'_{\rm cav}(\beta^n)=C\sum_{i=1}^N\frac{\partial H_2^{\rm huber}(z_i^n)}{\partial z_i}\frac{\partial z_i}{\partial\beta}=C\sum_{i=1}^N\eta_i^n K_i^{\rm T}, \tag{9}$$

with

$$\eta_i^n=\frac{\partial H_2^{\rm huber}(z_i^n)}{\partial z_i}=\begin{cases}2\theta,&z_i^n\le-\varepsilon_1-\theta-\Delta,\\\frac{2\theta}{\Delta}(-z_i^n-\varepsilon_1-\theta),&-\varepsilon_1-\theta-\Delta<z_i^n<-\varepsilon_1-\theta,\\0,&-\varepsilon_1-\theta\le z_i^n\le\varepsilon_2+\theta,\\-\frac{2\theta}{\Delta}(z_i^n-\varepsilon_2-\theta),&\varepsilon_2+\theta<z_i^n<\varepsilon_2+\theta+\Delta,\\-2\theta,&z_i^n\ge\varepsilon_2+\theta+\Delta.\end{cases}$$

Substituting (9) into (8) gives the convex subproblem

$$\min_\beta\ L(\beta)=\frac12\beta^{\rm T}K\beta+C\sum_{i=1}^N\big(H_1^{\rm huber}(z_i)+\eta_i^n K_i^{\rm T}\beta\big). \tag{10}$$
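A minimal sketch of the outer CCCP loop of (8)-(10), under our own naming: `eta` implements (9), and `solve_inner` stands for any solver of the convex subproblem (10), for example the Armijo-Newton sketch given in Sec. 2; the iteration cap and stopping tolerance are illustrative choices, not the paper's.

```python
import numpy as np

def eta(z, eps1, eps2, theta, delta):
    """Derivative of H2_huber, eq. (9): the CCCP linearization weights eta_i^n."""
    z = np.asarray(z, dtype=float)
    return np.select(
        [z <= -eps1 - theta - delta, z < -eps1 - theta,
         z <= eps2 + theta, z < eps2 + theta + delta],
        [2 * theta,
         (2 * theta / delta) * (-z - eps1 - theta),
         0.0,
         -(2 * theta / delta) * (z - eps2 - theta)],
        default=-2 * theta)

def cccp(K, y, solve_inner, eps1, eps2, theta, delta, C, n_iter=20, tol=1e-6):
    """Outer CCCP loop for (7): fix eta at beta^n, solve the convex
    subproblem (10), and repeat until the iterates stop moving."""
    beta = np.zeros(len(y))
    for _ in range(n_iter):
        z = K @ beta - y                          # residuals z_i = K_i^T beta - y_i
        eta_n = eta(z, eps1, eps2, theta, delta)  # linearized concave part
        beta_new = solve_inner(K, y, eta_n)       # minimizes (10) for fixed eta_n
        if np.linalg.norm(beta_new - beta) < tol:
            return beta_new
        beta = beta_new
    return beta
```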
In this way CCCP reduces the non-convex problem (7) to a sequence of convex subproblems of the form (10); it remains to solve (10) efficiently.

2 The Armijo-Newton algorithm

2.1 Algorithm description

Following the finite Newton method of [16], the training samples are partitioned according to the residual z_i = K_i^T β − y_i (cf. Fig. 1):

(1) −ε₁ < z_i < ε₂: the sample incurs no loss; such samples are non-support vectors, denoted NSV.
(2) ε₂ < z_i < ε₂ + θ + Δ or −ε₁ − θ − Δ < z_i < −ε₁: support vectors, subdivided into four groups. Samples with ε₂ < z_i < ε₂ + θ and −ε₁ − θ < z_i < −ε₁ form SV₁ and SV₂ respectively; samples with ε₂ + θ < z_i < ε₂ + θ + Δ and −ε₁ − θ − Δ < z_i < −ε₁ − θ form SV₃ and SV₄ respectively.
(3) z_i > ε₂ + θ + Δ or z_i < −ε₁ − θ − Δ: samples in the saturated region of the loss, denoted ESV₁ and ESV₂ respectively; these are treated as outliers.

For the seven index sets SV₁, SV₂, SV₃, SV₄, ESV₁, ESV₂, NSV define N×N diagonal indicator matrices, e.g.

$$I_{SV_1}={\rm diag}\{1_{SV_1},0_{SV_2},0_{SV_3},0_{SV_4},0_{ESV_1},0_{ESV_2},0_{NSV}\},$$

where 1_{SV₁} places ones at the diagonal positions of the SV₁ samples and all other diagonal entries are zero; I_{SV₂}, I_{SV₃} and I_{SV₄} are defined analogously. With e the all-ones vector, the gradient of (10) and its Hessian H are

$$\nabla L(\beta)=K\beta+2CK\big[I_{SV_1}(z-\varepsilon_2e)-I_{SV_2}(-z-\varepsilon_1e)\big]+2\theta CK\Big[I_{SV_3}\Big(e-\frac{z-(\varepsilon_2+\theta)e}{\Delta}\Big)-I_{SV_4}\Big(e+\frac{z+(\varepsilon_1+\theta)e}{\Delta}\Big)\Big], \tag{11}$$

$$H=\nabla^2L(\beta)=K+2CK(I_{SV_1}+I_{SV_2})K. \tag{12}$$

Note that the saturated samples ESV₁ and ESV₂ contribute nothing to (11): their H₁-derivative ±2θ cancels exactly against η_i^n = ∓2θ. The Newton direction is

$$d=-H^{-1}\nabla L(\beta)=-\big(I_N+2C(I_{SV_1}+I_{SV_2})K\big)^{-1}\Big\{\beta+2C\big[I_{SV_1}(z-\varepsilon_2e)-I_{SV_2}(-z-\varepsilon_1e)\big]+2\theta C\Big[I_{SV_3}\Big(e-\frac{z-(\varepsilon_2+\theta)e}{\Delta}\Big)-I_{SV_4}\Big(e+\frac{z+(\varepsilon_1+\theta)e}{\Delta}\Big)\Big]\Big\}. \tag{13}$$
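The sketch below is one way to realize the inner Armijo-Newton solver of (10)-(13). It reuses the `h1_huber` and `eta` helpers sketched above, and for clarity it solves the Newton system densely rather than through the SV₁/SV₂ block elimination described next; names and default tolerances are ours.

```python
import numpy as np

def h1_prime(z, eps1, eps2, theta):
    """Elementwise derivative of the convex part H1_huber."""
    z = np.asarray(z, dtype=float)
    return np.select(
        [z <= -eps1 - theta, z < -eps1, z <= eps2, z < eps2 + theta],
        [-2.0 * theta, 2.0 * (z + eps1), 0.0, 2.0 * (z - eps2)],
        default=2.0 * theta)

def solve_inner(K, y, eta_n, eps1, eps2, theta, C,
                rho=1e-6, b1=0.25, max_newton=50):
    """Armijo-Newton solver for the convex subproblem (10)."""
    n = len(y)
    beta = np.zeros(n)

    def objective(b):
        z = K @ b - y
        return (0.5 * b @ K @ b
                + C * np.sum(h1_huber(z, eps1, eps2, theta) + eta_n * (K @ b)))

    for _ in range(max_newton):
        z = K @ beta - y
        grad = K @ (beta + C * (h1_prime(z, eps1, eps2, theta) + eta_n))
        if np.linalg.norm(grad) <= rho:          # Step 3 stopping rule
            break
        # SV1: eps2 < z < eps2+theta;  SV2: -eps1-theta < z < -eps1
        sv12 = ((z > eps2) & (z < eps2 + theta)) | ((z > -eps1 - theta) & (z < -eps1))
        H = K + 2.0 * C * K @ (sv12[:, None] * K)    # eq. (12)
        H += 1e-10 * np.eye(n)                       # small jitter for safety
        d = np.linalg.solve(H, -grad)                # Newton direction, eq. (13)
        lam, f0 = 1.0, objective(beta)
        while f0 - objective(beta + lam * d) < -b1 * lam * (d @ grad):
            lam *= 0.5                               # Armijo backtracking, Step 4
            if lam < 1e-8:
                break
        beta = beta + lam * d
    return beta
```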
With this direction, the iterate is updated as

$$\beta^{n+1}=\beta^n+\lambda d^n=\beta^n-\lambda(H^n)^{-1}\nabla L(\beta^n), \tag{14}$$

where d^n is the Newton direction (13) at β^n and the step size λ is chosen so that the objective of (10) decreases sufficiently. When N is large, inverting I_N + 2C(I_{SV₁} + I_{SV₂})K directly is expensive; however, only the rows indexed by SV₁ and SV₂ differ from the identity,

$$I_N+2C(I_{SV_1}+I_{SV_2})K=\begin{pmatrix}I_{SV_1}+2CK_{SV_1,SV_1}&2CK_{SV_1,SV_2}&2CK_{SV_1,SV_3}&2CK_{SV_1,SV_4}&2CK_{SV_1,ESV_1}&2CK_{SV_1,ESV_2}&2CK_{SV_1,NSV}\\2CK_{SV_2,SV_1}&I_{SV_2}+2CK_{SV_2,SV_2}&2CK_{SV_2,SV_3}&2CK_{SV_2,SV_4}&2CK_{SV_2,ESV_1}&2CK_{SV_2,ESV_2}&2CK_{SV_2,NSV}\\0&0&I_{SV_3}&0&0&0&0\\0&0&0&I_{SV_4}&0&0&0\\0&0&0&0&I_{ESV_1}&0&0\\0&0&0&0&0&I_{ESV_2}&0\\0&0&0&0&0&0&I_{NSV}\end{pmatrix},$$

so the linear system in (13) reduces to one of the size of SV₁ ∪ SV₂. Moreover, the components of β^{n+1} computed by (14) that are indexed by the saturated and zero-loss samples vanish: β^{n+1}_{ESV₁} = 0, β^{n+1}_{ESV₂} = 0, β^{n+1}_{NSV} = 0, so the regression function depends only on the support vectors:

$$f^{n+1}(x)=\sum_{\beta_i^{n+1}\ne 0}\beta_i^{n+1}k(x_i,x). \tag{15}$$

Thus the solution stays sparse, and samples in the saturated region of the loss (the outliers in ESV₁ and ESV₂) are automatically excluded from the model. The Armijo-Newton algorithm for (10) is summarized as follows:

Step 1. Input the training set χ = {(x_i, y_i)}_{i=1}^N, the tolerance ρ, and the parameters ε₁, ε₂, θ, Δ, C; set k = 0 and compute an initial regression function f⁰(x) on χ;
Step 2. Compute z⁰ = f⁰(x) − y and partition the samples into SV₁, SV₂, SV₃, SV₄, ESV₁, ESV₂, NSV;
Step 3. Compute ∇L(β); if ‖∇L(β)‖ ≤ ρ, stop and output the current solution; otherwise continue;
Step 4. Take the step size λ_k = max{1, 1/2, 1/4, ⋯} satisfying the Armijo condition L(β^k) − L(β^k + λ_k d_k) ≥ −b₁λ_k d_k^T ∇L(β^k), where b₁ ∈ (0, 1/2);
Step 5. Update β^{k+1} by (14) and f^{k+1}(x) by (15); go to Step 2.

2.2 Convergence analysis

Theorem 1 Let {β^n} be the sequence generated by solving the subproblems (10) with the Armijo-Newton algorithm within CCCP. Then the objective values of (7) at β^n converge.

Proof Recall the splitting of (7): min_β L(β) = ½β^T Kβ + C Σ_{i=1}^N (H₁^huber(z_i) + H₂^huber(z_i)) with L_vex(β) = ½β^T Kβ + C Σ_{i=1}^N H₁^huber(z_i) and L_cav(β) = C Σ_{i=1}^N H₂^huber(z_i). Since β^{n+1} minimizes the subproblem (10) constructed at β^n,

$$L_{\rm vex}(\beta^{n+1})+C\sum_{i=1}^N\eta_i^nK_i^{\rm T}\beta^{n+1}\le L_{\rm vex}(\beta^n)+C\sum_{i=1}^N\eta_i^nK_i^{\rm T}\beta^n. \tag{16}$$
2, : ramp 27, L cav (β n+1 ) L cav (β n ) L cav (β n )(β n+1 β n ) = CΣ N ηn i K iβ n+1 CΣ N ηn i K iβ n, (17) L cav (β n ) = C N ηn i K i. (16) (17), : L vex (β n+1 ) + L cav (β n+1 ) L vex (β n ) + L cav (β n ), (18). L(β) 0,,. 3, UCI.,. (RBF): K(x, y) = exp( x y 2 /σ 2 ). (19) 6 ε 1, ε 2, θ, C, σ,, (C, σ), {2 5, 2 4,, 2 4, 2 5 } {2 5, 2 4,, 2 4, 2 5 }. ramp, ; θ, 0.1 10. θ,,,. θ,,,. (RMSE). Matlab 7.0, Windows XP, 2 GB, 2.99 GHz. 3.1 T = {(x 1, y 1 ), (x 2, y 2 ),, (x 300, y 300 )},, x i, i = 1, 2,, 300, [ 4, 4], y i = sin(3x i )/(3x i ) + γ i, i = 1, 2,, 300, γ i N(0, 0.1 2 ), 200,., (LS-SVR), 2 3. 2 (LS-SVR) Fig. 2 Te algoritm of least square support vector regression (LS-SVR) 3 Fig. 3 Algoritm in tis paper, y i = sin(3x i )/(3x i ).,
28 ( ) 2016,. 1.,,,,. 1 Tab. 1 Te experimental results on artificial data sets RMSE NSV /s 0.056 5 200 0.220 6 0.062 5 102 0.024 8 0.126 9 200 0.343 0 0.043 1 133 0.046 1 3.2 UCI,, UCI (ttp://arcive.ics.uci.edu/ml/) StatLib (ttp://lib.stat.cmu. edu/datasets/). 2. 2 Tab. 2 Statistical information on data set AutoMPG 392 7 300 92 Boston ousing 506 13 300 206 Bodyfat 252 14 200 52, NP-RSVR-NCLF(N-R-N) [10] LS-SVR, 3. 3 NP-RSVR-NCLF LS-SVR UCI Tab. 3 Comparison of te experimental results of standard UCI data set wit NP-RSVR-NCLF algoritm, LS-SVR algoritm and te algoritm in tis paper C σ ε 1 ε 2 RMSE NSV /s LS-SVR 2 8 2 4 / / 3.073 300 0.334 AutoMPG N-R-N 2 8 2 0 2 10 2.115 142 / 2 8 2 1 10 2 5 2.074 12 0.124 Boston- LS-SVR 2 2 2 3 / / 4.262 300 0.414 N-R-N LS-SVR 2 7 2 1 2.1 20 4.034 196 / 2 6 2 1 10 3 0.02 3.657 79 0.156 Bodyfat LS-SVR 2 3 2 5 / / 0.009 200 0.250 N-R-N 2 4 2 4 10 3 0.009 5.73E-04 37 / 2 4 2 3 10 3 0.03 1.52E-04 31 0.154 3 10, LS-SVR, NP-RSVR-NCLF.,, LS-SVR,,,,.
2, : ramp 29 4, CCCP, Armijo-Newton.,,,,.,. [ ] [ 1 ] TSUJINISHI D, ABE S. Fuzzy least squares support vector macines for multiclass problems [J]. Neural Networks, 2003, 16 (5/6): 785-792. [ 2 ] XIU F J, ZHANG Y, JIAN C L. Fuzzy SVM wit a new fuzzy membersip function [J]. Neural Computing and Application, 2006 (15): 268-276. [ 3 ] LIU Y H, CHEN Y T. Face recongnition using total margin-based adaptive fuzzy support vector macines [J]. IEEE Trans on Neural Networks, 2007, 18(1): 178-192. [ 4 ] YU S, YANG X W, HAO Z F, et al An adaptive support vector macine learning algoritm for large classification problem [J]. Lecture Notes in Computer Science, 2006, 3971: 981-990. [ 5 ] LIN C F, WANG S D. Fuzzy support vector macines [J]. IEEE Transactions on Neural Networks, 2002, 3(2): 464-471. [ 6 ] JIN B, ZHANG Y Q Classifying very large data sets wit minimum enclosing ban based support vector macine [C]//Proceedings of te 2006 IEEE International Conference on Fuzzy Systems. Vancouver BC, 2006: 364-368. [ 7 ] BO L, WANG L, JIAO L. Recursive finite Newton algoritm for support vector regression in te primal [J]. Neural Computation, 2007, 19(4): 1082-1096. [ 8 ] CHEN X B, YANG J, LIANG J, et al. Recursive robust least squares support vector regression based on maximum correntropy criterion [J]. Neurocomputing, 2012, 97: 63-73. [ 9 ],,. [J]., 2007, 4l(11): 1315-1320. [10] HUANG H, LIU Y. Fuzzy support vector macines for pattern recognition and data mining [J]. International Journal of Fuzzy Systems, 2002, 4(3): 3-12. [11] WANG L, JIA H, LI J. Training robust support vector macine wit smoot ramp loss in te primal space [J] Neurocomputing, 2008, 71(13/14/15): 3020-3025. [12] ZHAO Y, SUN J. Robust support vector regression in te primal [J]. Neural Networks, 2008, 21(10): 1548-1555. [13] YANG S H, HU B G. A stagewise least square loss function for classfication [C]//Proceedings of te SIAM International Conference on Data Mining. 2008, 120-131. [14] KIMELDORF G S, WAHBA G. A correspondence between Bayesian estimation on stocastic processes and smooting by splines [J]. Annals of Matematical Statistics, 1970, 41(2): 495-502. [15] YUILLE A L, RANGARAJAN A. Te concave-convex procedure (CCCP) [J]. Neural Computation, 2003, 15(4): 915-936. [16] FUNG G, MANGASARIAN O L. Finite Newton metod for Lagrangian support vector macine classification [J]. Neurocomputing, 2003, 55(1/2): 39-55. ( : )