Transfer Learning. keywords: transfer learning, inductive transfer, domain adaptation, multitask learning, semi-supervised learning

Toshihiro Kamishima
National Institute of Advanced Industrial Science and Technology (AIST)
mail@kamishima.net, http://www.kamishima.net/

1.
Transfer learning is defined in [TT 05] as "the problem of retaining and applying the knowledge learned in one or more tasks to efficiently develop an effective hypothesis for a new task." The idea is illustrated with an example involving CPUs, monitors, and the Web. Transfer learning is related to semi-supervised learning [Chapelle 06] [ 05, 97]. It was the subject of the NIPS 1995 workshop "Learning to Learn" [LtL 95]. Ten years later it was revisited at the NIPS 2005 workshop "Inductive Transfer: 10 Years Later" [TT 05]. The same or closely related problems are studied under several names: inductive transfer, domain adaptation, multitask learning, knowledge transfer, learning to learn, and lifelong learning. Covariate shift [Shimodaira 00, 06] and sample selection bias [Heckman 79, 09] are closely related as well. For surveys, see Pan & Yang [Pan 08b] and Daumé's blog [Daumé].

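Vol. 25, No. 4 (July 2010)

Section 2 describes the problem settings and Section 3 reviews methods. For background on machine learning, see [Bishop 08, 03, 06, 09].

2.
The domain that supplies knowledge is called the source domain (S), and the domain of the new task is called the target domain (T). For $D \in \{S, T\}$, $\mathcal{X}^{(D)}$ denotes the input space, with $M$ features. $\mathcal{Y}^{(D)}$ denotes the output space, e.g., $\{0,1\}$ or $\{-1,+1\}$ for binary classification. An example is a pair $(x^{(D)}, y^{(D)}) \in (\mathcal{X}^{(D)}, \mathcal{Y}^{(D)})$ with $x^{(D)} \in \mathcal{X}^{(D)}$, and $N^{(D)}$ (or simply $N$) is the number of examples.

Transfer learning is distinguished from multitask learning, following the terminology of Daumé and of Pan & Yang. Consider three tasks A, B, and C. In transfer learning, knowledge from B and C is used to improve A, so A is the target and (B, C) are the sources (Figure 1(a)). In multitask learning, A, B, and C are learned together, and each task benefits from the others (Figure 1(b)). A related view is learning a 101st task after 100 tasks have already been learned. Meta learning [Vilalta 02] and the No Free Lunch theorems [Wolpert 97] are also relevant.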

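Table 1: Settings by availability of labeled data in the source (S) and target (T) domains ("+": labeled data available; "−": unlabeled data only).
(1) S+T+
(2) S+T−
(3) S−T+
(4) S−T−

This four-way classification follows Daumé. In setting (1), for example, labeled data $\{(x^{(S)}_i, y^{(S)}_i)\}$ and $\{(x^{(T)}_i, y^{(T)}_i)\}$ are available in both domains. Pan & Yang call settings (1) to (4) inductive transfer learning, transductive transfer learning, self-taught learning, and unsupervised transfer learning, respectively. The assumptions differ by setting:
(1) S+T+: the joint distributions may differ, $\Pr^{(S)}[X,Y] \neq \Pr^{(T)}[X,Y]$. Even the input and output spaces may differ, $\mathcal{X}^{(S)} \neq \mathcal{X}^{(T)}$ and $\mathcal{Y}^{(S)} \neq \mathcal{Y}^{(T)}$.
(2) S+T−: Daumé and Pan & Yang assume $\Pr^{(S)}[Y \mid X] = \Pr^{(T)}[Y \mid X]$ while $\Pr^{(S)}[X] \neq \Pr^{(T)}[X]$, i.e., covariate shift.
(3) S−T+: Daumé assumes $\Pr^{(S)}[X] = \Pr^{(T)}[X]$. Raina et al. [Raina 07] consider the case $\Pr^{(S)}[X] = \Pr^{(T)}[X]$ with $\mathcal{Y}^{(S)} \neq \mathcal{Y}^{(T)}$. Their self-taught learning also allows $\Pr^{(S)}[X] \neq \Pr^{(T)}[X]$: the unlabeled source data need not share the class labels or the distribution of the target data.
(4) S−T−: Daumé considers $\Pr^{(S)}[X] = \Pr^{(T)}[X]$; cases with $\mathcal{X}^{(S)} \neq \mathcal{X}^{(T)}$ are also discussed [Wang 08].

Transfer does not always help. When using the source data degrades performance on the target task, the result is called negative transfer [Rosenstein 05]; see also [Caruana 97].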
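The correspondence between label availability and the four settings can be made concrete as a small lookup. This is only an illustrative sketch. The helper names `has_labels` and `classify_setting` are hypothetical, and so is the convention that an unlabeled example carries the label `None`. The setting names follow the correspondence to Pan & Yang's terminology given above.

```python
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical lookup: (source has labels?, target has labels?) -> (case, Pan & Yang's name)
SETTINGS: Dict[Tuple[bool, bool], Tuple[str, str]] = {
    (True, True): ("(1) S+T+", "inductive transfer learning"),
    (True, False): ("(2) S+T-", "transductive transfer learning"),
    (False, True): ("(3) S-T+", "self-taught learning"),
    (False, False): ("(4) S-T-", "unsupervised transfer learning"),
}


def has_labels(labels: Sequence[Optional[int]]) -> bool:
    """A domain counts as labeled if at least one example has a label (None = unlabeled)."""
    return any(y is not None for y in labels)


def classify_setting(source_labels: Sequence[Optional[int]],
                     target_labels: Sequence[Optional[int]]) -> Tuple[str, str]:
    """Return the case number and Pan & Yang's name for the given label availability."""
    return SETTINGS[(has_labels(source_labels), has_labels(target_labels))]


if __name__ == "__main__":
    # Labeled source data and unlabeled target data: setting (2).
    print(classify_setting([1, 0, 1], [None, None]))
```

The rule that a single labeled example makes a domain "labeled" is a simplification chosen for this sketch. In practice the distinction is usually about whether enough labels exist to train on.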

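Following Pan & Yang and Daumé, methods can be grouped into feature-based and instance-based approaches. They can also be grouped into separated and integrated approaches (Figure 3(a) and 3(b)).

3.
Surveys include [Pan 08b, Daumé, 09], and recent workshops include [TT 05, TSL 09, TM 09].

3.1
Multitask learning was already studied in the 1990s [Caruana 96, Munro 97]. Thrun [Thrun 96] asked whether learning the n-th task is easier than learning the first, and studied explanation-based approaches among others. Caruana [Caruana 97] identified the following mechanisms by which multitask learning helps: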

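(1) data amplification, (2) attribute selection, (3) eavesdropping, and (4) representation bias.

A remarkably simple feature-based method is that of Daumé [Daumé 07]. Each input vector is copied into a vector three times as long, in which the block belonging to the other domain is filled with 0:
$$(x^{(T)}, y^{(T)}) \mapsto (\langle x^{(T)}, \mathbf{0}, x^{(T)} \rangle, y^{(T)}), \qquad (x^{(S)}, y^{(S)}) \mapsto (\langle x^{(S)}, x^{(S)}, \mathbf{0} \rangle, y^{(S)}).$$
A standard learner is then trained on the union of the augmented data. The first block is shared by both domains, and the second and third blocks are specific to the source and the target, respectively.

In the hierarchical Bayesian approach (cf. [Caruana 97]), each of the K tasks is modeled as $\Pr[x, y \mid \Theta]\,\Pr[\Theta \mid \Psi]$. The task parameters $\Theta$ are drawn from a prior whose hyperparameter $\Psi$ is shared by all tasks, so data from every task inform $\Psi$ [Raina 06] [Daumé 06]. Models of the Gibbs form $\exp(\lambda \cdot f)$, with feature functions $f$ and weights $\lambda$, are also used.

Topic-bridged PLSA [Xue 08] builds on probabilistic latent semantic analysis (pLSA) [Hofmann 99] and uses must-link/cannot-link constraints [Wagstaff 01]. Related methods draw on the information bottleneck [Tishby 99] and on co-clustering [Ando 08, Dai 07a].

Multi-task feature learning [Argyriou 07] writes the predictor of task $k$ ($k = 1, \dots, K$) as
$$f_k(x) = \sum_{m=1}^{M} a_{mk}\, (u_m^\top x),$$
where the $M$ feature vectors $u_m$ are shared by all tasks and the coefficients $a_{mk}$ are task specific. The $u_m$ form the columns of $U$, and the $a_{mk}$ form $A$, with $a_k$ its $k$-th column. $U$ and $A$ are learned by minimizing
$$\sum_{k=1}^{K} \sum_{i=1}^{N} L\big(y_{ki},\, a_k^\top U^\top x_{ki}\big) + \gamma \|A\|_{2,1}^2.$$
The (2,1)-norm first takes the L2 norm of each row of $A$ (over tasks) and then the L1 norm of the resulting values. Because of the L1 part, entire rows tend to become 0. As a result, only a few features are used, and those features are shared across tasks. A spectral approach is taken by [Ling 08].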
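Daumé's feature augmentation is simple enough to state in a few lines. The sketch below is a minimal illustration, not reference code. It maps a feature vector to the (shared, source-specific, target-specific) layout used above. Representing vectors as Python lists and tagging domains as "S"/"T" are assumptions of this example, and the names `augment` and `linear_score` are hypothetical.

```python
from typing import List, Sequence


def augment(x: Sequence[float], domain: str) -> List[float]:
    """Feature augmentation of [Daumé 07]: source -> (x, x, 0), target -> (x, 0, x)."""
    x = [float(v) for v in x]
    zeros = [0.0] * len(x)
    if domain == "S":
        return x + x + zeros      # shared block, source-specific block, empty target block
    if domain == "T":
        return x + zeros + x      # shared block, empty source block, target-specific block
    raise ValueError("domain must be 'S' or 'T'")


def linear_score(w: Sequence[float], z: Sequence[float]) -> float:
    """Inner product w . z of a linear model applied to an augmented vector."""
    return sum(wi * zi for wi, zi in zip(w, z))


if __name__ == "__main__":
    # w = (w_shared, w_source, w_target); each block has the original dimension 2.
    w = [1.0, 0.5, -1.0, 0.0, 2.0, 0.0]
    x = [1.0, 2.0]
    print(linear_score(w, augment(x, "S")))  # uses w_shared + w_source
    print(linear_score(w, augment(x, "T")))  # uses w_shared + w_target
```

A linear model with weights $w = (w_c, w_s, w_t)$ therefore scores a source example by $(w_c + w_s)^\top x$ and a target example by $(w_c + w_t)^\top x$. The learner itself decides, through the weights on each block, how much of the model is shared and how much is domain specific.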

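The following sketch makes the roles of $U$, $A$, and the (2,1)-norm in multi-task feature learning concrete. It evaluates $f_k(x)$ and the regularized objective for given $U$ and $A$. It does not perform the optimization of [Argyriou 07]. Several choices are assumptions of this illustration:
- the squared loss used for $L$;
- the list-of-lists matrix layout, in which `U[d][m]` holds the d-th coordinate of $u_m$ and `A[m][k]` holds $a_{mk}$;
- the function names.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def norm_21(A: Matrix) -> float:
    """(2,1)-norm: L2 norm of each row (over tasks), then the sum (L1 norm) over rows."""
    return sum(math.sqrt(sum(a * a for a in row)) for row in A)


def predict(U: Matrix, A: Matrix, x: Sequence[float], k: int) -> float:
    """f_k(x) = sum_m a_mk * (u_m . x), where column m of U is the shared feature u_m."""
    return sum(
        A[m][k] * sum(U[d][m] * x[d] for d in range(len(x)))
        for m in range(len(A))
    )


def objective(U: Matrix, A: Matrix,
              data: List[List[Tuple[List[float], float]]], gamma: float) -> float:
    """Squared loss summed over tasks k and their examples, plus gamma * ||A||_{2,1}^2."""
    loss = sum(
        (y - predict(U, A, x, k)) ** 2
        for k, examples in enumerate(data)
        for x, y in examples
    )
    return loss + gamma * norm_21(A) ** 2


if __name__ == "__main__":
    U = [[1.0, 0.0], [0.0, 1.0]]   # two shared features (identity, for illustration)
    A = [[3.0, 4.0], [0.0, 0.0]]   # second row is zero: u_2 is used by neither task
    data = [[([1.0, 2.0], 3.0)], [([1.0, 2.0], 4.0)]]
    print(norm_21(A), objective(U, A, data, gamma=0.1))
```

A row $a_m$ that is entirely zero means that feature $u_m$ is not used by any task. The (2,1)-norm penalty favors such rows, which is how the method selects a small set of features shared across tasks.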
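Ling et al. [Ling 08] build on normalized cut [Shi 00]. Let $W$ be the similarity matrix over all examples and $W^{(S)}$ the one over the source examples. Their degree matrices are $D = \mathrm{diag}(W\mathbf{1})$ and $D^{(S)} = \mathrm{diag}(W^{(S)}\mathbf{1})$. The objective is
$$\frac{x^\top (D - W)\, x}{x^\top D\, x} + \beta \|U^\top x\|^2 + \lambda \frac{x^\top (D^{(S)} - W^{(S)})\, x}{x^\top D^{(S)} x}.$$
The first term is the Rayleigh-quotient form of normalized cut on $W$, in which the 0/1 cluster indicator is relaxed to a real-valued vector $x$. The second and third terms, weighted by $\beta$ and $\lambda$, bring in information from the source domain.

Self-taught learning [Raina 07] first learns $m$ bases $b_1, \dots, b_m$ from the unlabeled source data by sparse coding:
$$\min_{a, b} \sum_i \Big\| x^{(S)}_i - \sum_{j=1}^{m} a^j_i b_j \Big\|_2^2 + \beta \|a_i\|_1 \quad \text{s.t.} \quad \|b_j\|_2 \le 1 \ \ \forall j,$$
where $a^j_i$ is the activation of basis $b_j$ for the $i$-th source example. Each labeled target example is then represented by its activations with respect to the learned bases:
$$\hat{c}(x^{(T)}) = \arg\min_{c} \Big\| x^{(T)} - \sum_{j=1}^{m} c_j b_j \Big\|_2^2 + \gamma \|c\|_1.$$
A classifier is then trained on the pairs $(\hat{c}(x^{(T)}), y^{(T)})$, whose features are the $m$ activations.

Other feature-based methods include [Ando 05], [Argyriou 08], [Satpal 07], [Wang 08], [Do 06, Pan 08a], and the kernel-based inductive transfer of [Rückert 08]. Fisher's criterion also figures in some of these.

3.4
Among instance-based methods, TrAdaBoost [Dai 07b] adapts AdaBoost [Freund 96, 99]. In each round $t = 1, \dots, T$, a weak hypothesis $h_t(x_i) \in \{0,1\}$ is learned from the weighted source and target data. Its weighted error $\epsilon_t$ is measured on the target data and must satisfy $\epsilon_t < 1/2$. Let $\beta_t = \epsilon_t / (1 - \epsilon_t)$. The weight of each misclassified target example is multiplied by $1/\beta_t$, which raises it as in AdaBoost. The weight of each misclassified source example is multiplied by $\beta = 1/(1 + \sqrt{2 \ln N^{(S)} / T})$, which lowers it, so source examples that conflict with the target lose influence. The final hypothesis votes over the second half of the rounds:
$$h_f(x) = \begin{cases} 1 & \text{if } \prod_{t=\lceil T/2 \rceil}^{T} \beta_t^{-h_t(x)} \ge \prod_{t=\lceil T/2 \rceil}^{T} \beta_t^{-1/2}, \\ 0 & \text{otherwise.} \end{cases}$$
The same idea has been combined with RankBoost [Freund 03] [ 09]. Bagging [Breiman 96] is the basis of TrBagg [Kamishima 09].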
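The second step of self-taught learning computes the sparse activations $\hat{c}(x^{(T)})$ for fixed bases, which is an L1-regularized least-squares problem. The sketch below solves it with iterative shrinkage-thresholding (ISTA). ISTA is a standard solver swapped in here for illustration and is not necessarily the solver used in [Raina 07]. Learning the bases themselves is not shown. The function names and the fixed iteration count are assumptions.

```python
from typing import List, Optional, Sequence


def soft_threshold(v: float, t: float) -> float:
    """Proximal operator of t * |c|: shrink v toward 0 by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def sparse_code(x: Sequence[float], bases: Sequence[Sequence[float]], gamma: float,
                n_iter: int = 500, step: Optional[float] = None) -> List[float]:
    """ISTA for argmin_c ||x - sum_j c_j b_j||_2^2 + gamma * ||c||_1 with fixed bases b_j."""
    m, n = len(bases), len(x)
    if step is None:
        # 1/L with L = 2 * sum_j ||b_j||^2, an upper bound on the gradient's
        # Lipschitz constant 2 * lambda_max(B^T B); this step size ensures ISTA converges.
        step = 1.0 / (2.0 * sum(sum(v * v for v in b) for b in bases))
    c = [0.0] * m
    for _ in range(n_iter):
        resid = [x[i] - sum(c[j] * bases[j][i] for j in range(m)) for i in range(n)]
        # gradient of the squared error with respect to c_j is -2 * (b_j . resid)
        grad = [-2.0 * sum(bases[j][i] * resid[i] for i in range(n)) for j in range(m)]
        c = [soft_threshold(c[j] - step * grad[j], step * gamma) for j in range(m)]
    return c


if __name__ == "__main__":
    bases = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]   # hypothetical bases with ||b_j|| <= 1
    print(sparse_code([3.0, 0.5], bases, gamma=2.0))
```

The vector returned for each labeled target example serves as its new feature representation, and an ordinary classifier is trained on it with the labels $y^{(T)}$. With orthonormal bases the exact solution is $c_j = \mathrm{sign}(z_j)\max(|z_j| - \gamma/2, 0)$ with $z_j = b_j^\top x$, which gives a quick sanity check.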

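The TrAdaBoost updates described above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the implementation of [Dai 07b]:
- the weak learner is a weighted decision stump;
- initial weights are uniform;
- $\epsilon_t$ is clipped into $(0, 1/2)$ so that $\beta_t$ stays well defined, a safeguard the description above leaves open.

The helper names are hypothetical. The final vote compares the two products in log space, which is equivalent and avoids overflow.

```python
import math
from typing import Callable, List, Sequence, Tuple

Example = Tuple[List[float], int]             # (features, label in {0, 1})
Hypothesis = Callable[[Sequence[float]], int]


def fit_stump(data: List[Example], p: List[float]) -> Hypothesis:
    """Weighted decision stump: pick (feature, threshold, polarity) with least weighted error."""
    best = None
    for d in range(len(data[0][0])):
        for thr in sorted({x[d] for x, _ in data}):
            for pol in (0, 1):
                err = sum(pi for (x, y), pi in zip(data, p)
                          if (pol if x[d] >= thr else 1 - pol) != y)
                if best is None or err < best[0]:
                    best = (err, d, thr, pol)
    _, d, thr, pol = best
    return lambda x: pol if x[d] >= thr else 1 - pol


def tradaboost(source: List[Example], target: List[Example], T: int = 10) -> Hypothesis:
    """Simplified TrAdaBoost; needs at least one source and one target example."""
    n_s = len(source)
    data = source + target                      # first N^(S) entries are source examples
    w = [1.0] * len(data)                       # assumption: uniform initial weights
    beta = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n_s) / T))
    hyps, betas = [], []
    for _ in range(T):
        total = sum(w)
        h = fit_stump(data, [wi / total for wi in w])
        miss = [abs(h(x) - y) for x, y in data]          # |h_t(x_i) - y_i|, 0 or 1
        w_tgt = w[n_s:]
        eps = sum(wi * mi for wi, mi in zip(w_tgt, miss[n_s:])) / sum(w_tgt)
        eps = min(max(eps, 1e-10), 0.499)                # assumption: keep 0 < eps_t < 1/2
        beta_t = eps / (1.0 - eps)
        for i in range(len(data)):
            if i < n_s:
                w[i] *= beta ** miss[i]                  # misclassified source: weight shrinks
            else:
                w[i] *= beta_t ** -miss[i]               # misclassified target: times 1/beta_t
        hyps.append(h)
        betas.append(beta_t)

    start = math.ceil(T / 2) - 1                         # 0-based index of round ceil(T/2)

    def h_final(x: Sequence[float]) -> int:
        # prod beta_t^{-h_t(x)} >= prod beta_t^{-1/2}, compared as sums of logarithms
        lhs = sum(-ht(x) * math.log(b) for ht, b in zip(hyps[start:], betas[start:]))
        rhs = sum(-0.5 * math.log(b) for b in betas[start:])
        return 1 if lhs >= rhs else 0
    return h_final


if __name__ == "__main__":
    source = [([0.1], 0), ([0.2], 0), ([0.8], 1), ([0.9], 1)]   # hypothetical data
    target = [([0.3], 0), ([0.7], 1)]
    h = tradaboost(source, target, T=10)
    print([h([v]) for v in (0.05, 0.6, 0.95)])
```

A source example that keeps disagreeing with the learned hypotheses has its weight multiplied by $\beta < 1$ in every such round, so its influence decays geometrically. Hard target examples gain weight as in AdaBoost.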
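Other related methods are [Eaton 08] and [Gao 08].

Covariate shift [Shimodaira 00] assumes $\Pr^{(S)}[X] \neq \Pr^{(T)}[X]$ but $\Pr^{(S)}[Y \mid X] = \Pr^{(T)}[Y \mid X]$. When the conditionals agree, the expected loss under the target distribution equals the expected source loss weighted by $\Pr^{(T)}[x]/\Pr^{(S)}[x]$. The parameter $\theta$ is therefore estimated by weighting the loss of each source example by this density ratio:
$$\hat{\theta} = \arg\min_{\theta} \sum_{i=1}^{N^{(S)}} \frac{\Pr^{(T)}[x^{(S)}_i]}{\Pr^{(S)}[x^{(S)}_i]}\, \mathrm{loss}\big(y^{(S)}_i, x^{(S)}_i; \theta\big).$$
For this approach, see [ 06] [Sugiyama 07b, Huang 07, 07a].

Sample selection bias [Heckman 79, 09] is treated from a machine learning viewpoint by Zadrozny [Zadrozny 04]. Each example $(x, y)$ comes with a selection variable $s \in \{0, 1\}$: it enters the training sample if $s = 1$ and is discarded if $s = 0$. If $s$ depends on $x$ but, given $x$, not on $y$, then $\Pr[y \mid s, x] = \Pr[y \mid x]$. Learners whose output depends only on $\Pr[y \mid x]$ are therefore unaffected by the bias, whereas learners that also depend on $\Pr[x]$ are affected. [Zadrozny 04] also discusses where SVMs fall in this respect. Bridged refinement [Xing 07] is another approach to the mismatch between distributions.

3.5
Migratory-Logit [Liao 05] extends logistic regression with weight vector $w$ by adding an auxiliary variable $\mu_i$ for each source example:
$$\max_{w, \mu} \sum_{i=1}^{N^{(T)}} \ln \sigma\big(y^{(T)}_i w^\top x^{(T)}_i\big) + \sum_{i=1}^{N^{(S)}} \ln \sigma\big(y^{(S)}_i w^\top x^{(S)}_i + y^{(S)}_i \mu_i\big)$$
$$\text{subject to} \quad \frac{1}{N^{(S)}} \sum_{i=1}^{N^{(S)}} y^{(S)}_i \mu_i \le C, \quad C \ge 0, \quad y^{(S)}_i \mu_i \ge 0,$$
where $y^{(D)}_i \in \{-1, +1\}$ and $\sigma$ is the logistic sigmoid. A target example is fitted through $y\, w^\top x$ alone. A source example also receives the additional margin $y^{(S)}_i \mu_i$, which lets the model accommodate source examples that do not match the target. $C$ bounds the average of these margins over the $N^{(S)}$ source examples. Wu and Dietterich [Wu 04] improve SVM accuracy by also training on auxiliary data sources.

3.6
Curriculum learning [Bengio 09] is also related.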
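The importance-weighted estimator above can be illustrated with weighted least-squares regression. The sketch makes three assumptions purely for illustration:
- both input densities are known one-dimensional Gaussians;
- the loss is squared error;
- the model is a line $y = a + bx$.

In practice the density ratio must be estimated from data, for example by kernel mean matching [Huang 07]. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def gaussian_pdf(x: float, mean: float, std: float) -> float:
    """Density of N(mean, std^2) at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def importance_weights(xs: Sequence[float],
                       src: Tuple[float, float] = (0.0, 1.0),
                       tgt: Tuple[float, float] = (1.0, 1.0)) -> List[float]:
    """w_i = Pr^(T)[x_i] / Pr^(S)[x_i], assuming both densities are known Gaussians."""
    return [gaussian_pdf(x, *tgt) / gaussian_pdf(x, *src) for x in xs]


def weighted_least_squares(xs: Sequence[float], ys: Sequence[float],
                           ws: Sequence[float]) -> Tuple[float, float]:
    """Minimize sum_i w_i * (y_i - a - b * x_i)^2 in closed form; returns (a, b)."""
    total = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / total
    my = sum(w * y for w, y in zip(ws, ys)) / total
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    b = sxy / sxx
    return my - b * mx, b


if __name__ == "__main__":
    xs = [-1.0, 0.0, 0.5, 1.0, 2.0]     # source inputs, concentrated around 0
    ys = [0.2, 0.9, 1.1, 2.1, 2.8]      # hypothetical labels
    ws = importance_weights(xs)          # emphasizes inputs that are likely under the target
    print(weighted_least_squares(xs, ys, ws))
```

The weight $\Pr^{(T)}[x]/\Pr^{(S)}[x]$ is large where the target density is high relative to the source density. The fit therefore concentrates on the region of input space that matters for the target, which is the purpose of the correction under covariate shift.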

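The sketch below shows how the auxiliary variables $\mu_i$ enter the Migratory-Logit objective. It evaluates the objective and checks the constraints for given $w$, $\mu$, and $C$. It does not solve the constrained maximization of [Liao 05]. The function names and the numerically stable form of $\ln\sigma$ are choices of this illustration.

```python
import math
from typing import List, Sequence, Tuple

Example = Tuple[List[float], int]   # (x, y) with y in {-1, +1}


def log_sigmoid(z: float) -> float:
    """Numerically stable ln(sigma(z)) with sigma(z) = 1 / (1 + exp(-z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def migratory_logit_objective(w: Sequence[float], mu: Sequence[float],
                              source: List[Example], target: List[Example]) -> float:
    """Target log-likelihood plus source log-likelihood with per-example shifts y_i * mu_i."""
    obj = sum(log_sigmoid(y * dot(w, x)) for x, y in target)
    obj += sum(log_sigmoid(y * dot(w, x) + y * m) for (x, y), m in zip(source, mu))
    return obj


def feasible(mu: Sequence[float], source: List[Example], C: float) -> bool:
    """Constraints: (1/N^(S)) sum_i y_i mu_i <= C, C >= 0, and y_i mu_i >= 0 for all i."""
    shifts = [y * m for (_, y), m in zip(source, mu)]
    return C >= 0 and all(s >= 0 for s in shifts) and sum(shifts) / len(source) <= C


if __name__ == "__main__":
    source = [([1.0], -1)]   # hypothetical source example misfit by w = [1.0]
    for mu in ([0.0], [-2.0]):
        print(mu, feasible(mu, source, C=2.0),
              migratory_logit_objective([1.0], mu, source, []))
```

Because $y_i\mu_i \ge 0$, the shift can only raise the margin of a source example. A source example that conflicts with the target can thus be accommodated, at the cost of using part of the budget $C$.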
Approaches to negative transfer [Rosenstein 05] include [Shi 08]. Theoretical analyses include [Blitzer 08, Crammer 08] and [Ben-David 07, Dai 07b].

4.
For recent developments, see the workshops [TSL 09, TM 09].

References
[Ando 05] Ando, R. K. and Zhang, T.: A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data, Journal of Machine Learning Research, Vol. 6, pp. 1817–1853 (2005)
[Ando 08] Ando, S. and Suzuki, E.: Unsupervised Cross-domain Learning by Interaction Information Co-clustering, in Proc. of The 8th IEEE Int'l Conf. on Data Mining, pp. 13–22 (2008)
[Argyriou 07] Argyriou, A., Evgeniou, T., and Pontil, M.: Multi-Task Feature Learning, in Advances in Neural Information Processing Systems 19, pp. 41–48 (2007)
[Argyriou 08] Argyriou, A., Maurer, A., and Pontil, M.: An Algorithm for Transfer Learning in a Heterogeneous Environment, in Proc. of The ECML/PKDD2008, Part I, pp. 71–85 (2008), [LNAI 5211]
[ 03] 6 (2003)
[ 09] Domain Adaptation, 2009, pp. 69–72 (2009)
[Ben-David 07] Ben-David, S., Blitzer, J., Crammer, K., and Pereira, F.: Analysis of Representations for Domain Adaptation, in Advances in Neural Information Processing Systems 19, pp. 137–144 (2007)
[Bengio 09] Bengio, Y., Louradour, J., Collobert, R., and Weston, J.: Curriculum Learning, in Proc. of The 26th Int'l Conf. on Machine Learning, pp. 41–48 (2009)
[Bishop 08] Bishop, C. M.: (2007, 2008)
[Blitzer 08] Blitzer, J., Crammer, K., Kulesza, A., Pereira, F., and Wortman, J.: Learning Bounds for Domain Adaptation, in Advances in Neural Information Processing Systems 20, pp. 129–136 (2008)
[Breiman 96] Breiman, L.: Bagging Predictors, Machine Learning, Vol. 24, pp. 123–140 (1996)
[Caruana 96] Caruana, R., Baluja, S., and Mitchell, T.: Using The Future to Sort Out The Present: Rankprop and Multitask Learning for Medical Risk Evaluation, in Advances in Neural Information Processing Systems 8, pp. 959–965 (1996)
[Caruana 97] Caruana, R.: Multitask Learning, Machine Learning, Vol. 28, pp. 41–75 (1997)
[Chapelle 06] Chapelle, O., Schölkopf, B., and Zien, A. eds.: Semisupervised Learning, MIT Press (2006)
[Crammer 08] Crammer, K., Kearns, M., and Wortman, J.: Learning from Multiple Sources, Journal of Machine Learning Research, Vol. 9, pp. 1757–1774 (2008)
[Dai 07a] Dai, W., Xue, G.-R., Yang, Q., and Yu, Y.: Co-clustering based Classification for Out-of-domain Documents, in Proc. of The 13th Int'l Conf. on Knowledge Discovery and Data Mining, pp. 210–219 (2007)
[Dai 07b] Dai, W., Yang, Q., Xue, G.-R., and Yu, Y.: Boosting for Transfer Learning, in Proc. of The 24th Int'l Conf. on Machine Learning, pp. 193–200 (2007)
[Daumé] Daumé, H., III: natural language processing blog, http://nlpers.blogspot.com/search/label/domain%20adaptation
[Daumé 06] Daumé, H., III and Marcu, D.: Domain Adaptation for Statistical Classifiers, Journal of Artificial Intelligence Research, Vol. 26, pp. 101–126 (2006)
[Daumé 07] Daumé, H., III: Frustratingly Easy Domain Adaptation, in Proc. of the 45th Annual Meeting of the Association of Computational Linguistics, pp. 256–263 (2007)
[Do 06] Do, C. B. and Ng, A. Y.: Transfer Learning for Text Classification, in Advances in Neural Information Processing Systems 18, pp. 299–306 (2006)
[Eaton 08] Eaton, E., desJardins, M., and Lane, T.: Modeling Transfer Relationships Between Learning Tasks for Improved Inductive Transfer, in Proc. of The ECML/PKDD2008, Part I, pp. 317–332 (2008), [LNAI 5211]
[Freund 96] Freund, Y. and Schapire, R. E.: Experiments with a New Boosting Algorithm, in Proc. of The 13th Int'l Conf. on Machine Learning, pp. 148–156 (1996)
[ 99] Y., R., Vol. 14, No. 5, pp. 771–780 (1999)
[Freund 03] Freund, Y., Iyer, R., Schapire, R. E., and Singer, Y.: An Efficient Boosting Algorithm for Combining Preferences, Journal of Machine Learning Research, Vol. 4, pp. 933–969 (2003)
[Gao 08] Gao, J., Fan, W., Jiang, J., and Han, J.: Knowledge Transfer via Multiple Model Local Structure Mapping, in Proc. of The 14th Int'l Conf. on Knowledge Discovery and Data Mining, pp. 283–291 (2008)
[Heckman 79] Heckman, J.: Sample Selection Bias as a Specification Error, Econometrica, Vol. 47, pp. 153–161 (1979)
[Hofmann 99] Hofmann, T.: Probabilistic Latent Semantic Analysis, in Uncertainty in Artificial Intelligence 15, pp. 289–296 (1999)
[ 09] (2009)
[Huang 07] Huang, J., Smola, A. J., Gretton, A., Borgwardt, K. M., and Schölkopf, B.: Correcting Sample Selection Bias by Unlabeled Data, in Advances in Neural Information Processing Systems 19, pp. 601–608 (2007)
[Kamishima 09] Kamishima, T., Hamasaki, M., and Akaho, S.: TrBagg: A Simple Transfer Learning Method and Its Application to Personalization in Collaborative Tagging, in Proc. of The 9th IEEE Int'l Conf. on Data Mining, pp. 219–228 (2009)
[ 09] R, 5 (2009)
[Liao 05] Liao, X., Xue, Y., and Carin, L.: Logistic Regression with an Auxiliary Data Source, in Proc. of The 22nd Int'l Conf. on Machine Learning, pp. 505–512 (2005)
[Ling 08] Ling, X., Dai, W., Xue, G.-R., Yang, Q., and Yu, Y.: Spectral Domain-Transfer Learning, in Proc. of The 14th Int'l Conf. on Knowledge Discovery and Data Mining, pp. 488–496 (2008)
[LtL 95] Learning to Learn: Knowledge Consolidation and Transfer in Inductive Systems, http://socrates.acadiau.ca/courses/comp/dsilver/nips95_LTL/transfer.workshop.1995.html (1995)
[ 06] (2006)
[Munro 97] Munro, P. W. and Parmanto, B.: Competition Among Networks Improves Committee Performance, in Advances in Neural Information Processing Systems 9, pp. 592–598 (1997)
[ 97] Vol. 38, No. 7, pp. 557–588 (1997)
[Pan 08a] Pan, S. J., Kwok, J. T., and Yang, Q.: Transfer Learning via Dimensionality Reduction, in Proc. of the 23rd National Conf. on Artificial Intelligence, pp. 677–682 (2008)
[Pan 08b] Pan, S. J. and Yang, Q.: A Survey on Transfer Learning, Technical Report HKUST-CS08-08, Dept. of Computer Science and Engineering, Hong Kong Univ. of Science and Technology (2008)
[Raina 06] Raina, R., Ng, A. Y., and Koller, D.: Constructing Informative Priors using Transfer Learning, in Proc. of The 23rd Int'l Conf. on Machine Learning, pp. 713–720 (2006)
[Raina 07] Raina, R., Battle, A., Lee, H., Packer, B., and Ng, A. Y.: Self-taught Learning: Transfer Learning from Unlabeled Data, in Proc. of The 24th Int'l Conf. on Machine Learning, pp. 759–766 (2007)
[Rosenstein 05] Rosenstein, M. T., Marx, Z., Kaelbling, L. P., and Dietterich, T. G.: To Transfer or Not To Transfer, in NIPS-2005 Workshop on Inductive Transfer: 10 Years Later (2005)
[Rückert 08] Rückert, U. and Kramer, S.: Kernel-Based Inductive Transfer, in Proc. of The ECML/PKDD2008, Part II, pp. 220–233 (2008), [LNAI 5212]
[Satpal 07] Satpal, S. and Sarawagi, S.: Domain Adaptation of Conditional Probability Models via Feature Subsetting, in Proc. of the 11th European Conf. on Principles of Data Mining and Knowledge Discovery, pp. 224–235 (2007), [LNAI 4702]
[Shi 00] Shi, J. and Malik, J.: Normalized Cuts and Image Segmentation, IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 22, No. 8, pp. 888–905 (2000)
[Shi 08] Shi, X., Fan, W., and Ren, J.: Actively Transfer Domain Knowledge, in Proc. of The ECML/PKDD2008, Part II, pp. 342–357 (2008), [LNAI 5212]
[Shimodaira 00] Shimodaira, H.: Improving Predictive Inference under Covariate Shift by Weighting the Log-Likelihood Function, J. of Statistical Planning and Inference, Vol. 90, pp. 227–244 (2000)
[ 06] Vol. 13, No. 3, pp. 111–118 (2006)
[ 07a] Vol. 18, No. 10, pp. 1–6 (2007)
[Sugiyama 07b] Sugiyama, M., Krauledat, M., and Müller, K. R.: Covariate Shift Adaptation by Importance Weighted Cross Validation, Journal of Machine Learning Research, Vol. 8, pp. 985–1005 (2007)
[ 09] in WebDB Forum 2009 (2009)
[Thrun 96] Thrun, S.: Is Learning The n-th Thing Any Easier Than Learning The First?, in Advances in Neural Information Processing Systems 8, pp. 640–646 (1996)
[Tishby 99] Tishby, N., Pereira, F. C., and Bialek, W.: The Information Bottleneck Method, in Proc. of The 37th Annual Allerton Conference on Communications, Control and Computing (1999)
[TM 09] ICDM 2009 Workshop: Int'l Workshop on Transfer Mining, http://www.cse.ust.hk/~sinnopan/cfp/icdm09wtm.html (2009)
[TSL 09] NIPS 2009 Workshop: Transfer Learning for Structured Data, http://www.cse.ust.hk/~sinnopan/nips09tlsd/ (2009)
[TT 05] NIPS 2005 Workshop Inductive Transfer: 10 Years Later, http://iitrl.acadiau.ca/itws05/ (2005)
[Vilalta 02] Vilalta, R. and Drissi, Y.: A Perspective View and Survey of Meta-Learning, Artificial Intelligence Review, Vol. 18, pp. 77–95 (2002)
[Wagstaff 01] Wagstaff, K., Cardie, C., Rogers, S., and Schroedl, S.: Constrained K-means Clustering with Background Knowledge, in Proc. of The 18th Int'l Conf. on Machine Learning, pp. 577–584 (2001)
[Wang 08] Wang, Z., Song, Y., and Zhang, C.: Transferred Dimensionality Reduction, in Proc. of The ECML/PKDD2008, Part II, pp. 550–565 (2008), [LNAI 5212]
[ 05] (2005)
[Wolpert 97] Wolpert, D. H. and Macready, W. G.: No Free Lunch Theorems for Optimization, IEEE Transactions on Evolutionary Computation, Vol. 1, pp. 67–82 (1997)
[Wu 04] Wu, P. and Dietterich, T. G.: Improving SVM Accuracy by Training on Auxiliary Data Sources, in Proc. of The 21st Int'l Conf. on Machine Learning, pp. 871–878 (2004)
[Xing 07] Xing, D., Dai, W., Xue, G.-R., and Yu, Y.: Bridged Refinement for Transfer Learning, in Proc. of the 11th European Conf. on Principles of Data Mining and Knowledge Discovery, pp. 324–335 (2007), [LNAI 4702]
[Xue 08] Xue, G.-R., Dai, W., Yang, Q., and Yu, Y.: Topic-bridged PLSA for Cross-Domain Text Classification, in Proc. of The 31st Annual ACM SIGIR Conf. on Research and Development in Information Retrieval, pp. 627–634 (2008)
[Zadrozny 04] Zadrozny, B.: Learning and Evaluating Classifiers under Sample Selection Bias, in Proc. of The 21st Int'l Conf. on Machine Learning, pp. 903–910 (2004)

1968, 1992, 1994, 2001. Member of AAAI and ACM.