DEIM Forum 2018 F3-5

657-8501 1-1
E-mail: yuta@cs25.scitec.kobe-u.ac.jp, eguchi@port.kobe-u.ac.jp

1. Introduction

Interbank transaction networks change over time, and events such as the 2008 financial crisis can alter their structure substantially. In this paper we extend the model of [3], which applies Generative Stochastic Networks (GSN) [2], a generative extension of the Autoencoder (AE) [1], to the reconstruction and prediction of interbank transaction networks. We propose lender-focused and borrower-focused variants of the model together with a size-normalized cost function, and we further use the learned deep representations to predict the interest rates of transactions. Section 2 reviews the necessary background, Section 3 describes the GSN-based model, Section 4 presents the proposed method, Section 5 reports experiments on interbank transaction data, and Section 6 concludes.

2. Preliminaries

2.1 Deep learning

This section reviews the deep-learning components used in this paper [4]: the Autoencoder, the Denoising Autoencoder, and GSN.

2.1.1 Autoencoder

The Autoencoder (AE) [1] is a neural network that learns a latent representation of its input. An AE consists of three layers and is trained so that its output reconstructs the input X; its structure is shown in Fig. 1.
Fig. 1: Structure of Autoencoder.
Fig. 2: Markov chain in Denoising Autoencoder.

The encoder maps the input X to the latent representation Y, and the decoder maps Y back to a reconstruction X':

    Y = f_θ(X) = φ(W X + b),      X' = f_θ'(Y) = φ'(W' Y + b'),

where W and W' are weight matrices, b and b' are bias vectors, θ = (W, b) and θ' = (W', b') denote the parameters, and φ, φ' are activation functions such as the Rectified Linear Unit (ReLU). The parameters are learned by minimizing a reconstruction loss L = L(X, X'), for example with the optimization methods discussed in [5]. Stacking additional hidden layers yields a Deep Autoencoder [1]; the activation of the innermost layer then serves as a low-dimensional deep representation of the input.

2.1.2 Denoising Autoencoder

The Denoising Autoencoder (DAE) [6] corrupts the input X into X̃, for example with salt-and-pepper noise, and is trained to reconstruct the original X from X̃ by minimizing L = L(X, X'). Because the trained reconstruction maps corrupted inputs back toward the data, a DAE defines the Markov chain shown in Fig. 2:

    X̃_{t+1} ~ P_{θ1}(X̃ | X_t),      X_{t+1} ~ P_{θ2}(X | X̃_{t+1}),      (1)

where P_{θ1} is the corruption process applied to X_t and P_{θ2} is the reconstruction distribution of the DAE given X̃_{t+1}. Starting from X_0, the chain (1) generates X̃_1, X_1, X̃_2, X_2, ..., and its stationary distribution is an estimator of the data-generating distribution P(X), with P_{θ1}(X̃ | X) acting as the corruption distribution [2]. Training on the samples X̃_t obtained by walking this chain away from the data and back is called Walkback training, and a DAE trained with Walkback is a better generative model than an ordinary DAE [2].

2.1.3 Generative Stochastic Networks

Generative Stochastic Networks (GSN) [2] generalize the DAE by introducing latent variables H_t into the Markov chain (Fig. 3):

    H_{t+1} ~ P_{θ1}(H | H_t, X_t),      X_{t+1} ~ P_{θ2}(X | H_{t+1}).      (2)

Fig. 3: Markov chain in Generative Stochastic Networks.
Fig. 4: Structure of GSN with multiple layers.

A GSN with two hidden layers (Fig. 4) has parameters W_1, W_2, b_0, b_1, b_2, where W_1 connects layers 0 and 1, W_2 connects layers 1 and 2, and b_i is the bias of layer i. Unrolling the Walkback computation graph of Fig. 4 for T = 3 steps, the activations are updated as

    X_t^0 = φ(W_1^T H_t^1 + b_0),                         (3)
    H_t^1 = φ(W_1 X_t^0 + W_2^T H_t^2 + b_1),             (4)
    H_t^2 = φ(W_2 H_t^1 + b_2),                           (5)

with noise injected at each step, so that Eqs. (3)-(5) realize the transition operators of Eq. (2). The GSN is trained in the same way as a DAE with Walkback; gradients are propagated through the stochastic units as in [7]. Hyperparameters such as the number of layers are selected on a validation set.
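To make the alternating updates concrete, the following is a minimal NumPy sketch of one Walkback run through Eqs. (3)-(5) for a GSN with two hidden layers. The function and variable names, the row-vector batch convention, and the Gaussian pre-activation noise are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def gsn_walkback(X, W1, W2, b0, b1, b2, T=3, sigma=0.1, rng=None):
    """One Walkback run of a GSN with two hidden layers (Eqs. (3)-(5)).

    X        : (batch, N)  corrupted input, used as X_0^0
    W1       : (N, N1)     weights between layer 0 and layer 1
    W2       : (N1, N2)    weights between layer 1 and layer 2
    b0,b1,b2 : biases of layers 0, 1 and 2
    Returns the reconstructions X_t^0 for t = 1..T.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    H1 = np.zeros((X.shape[0], W1.shape[1]))
    H2 = np.zeros((X.shape[0], W2.shape[1]))
    X0, recons = X, []
    for _ in range(T):
        # Eq. (4): hidden layer 1 receives input from layer 0 below and layer 2 above;
        # Gaussian noise on the pre-activation is an assumption of this sketch.
        H1 = relu(X0 @ W1 + H2 @ W2.T + b1 + sigma * rng.standard_normal(H1.shape))
        # Eq. (5): hidden layer 2 receives input from hidden layer 1.
        H2 = relu(H1 @ W2 + b2 + sigma * rng.standard_normal(H2.shape))
        # Eq. (3): the visible layer is reconstructed from hidden layer 1 (tied weights W1^T).
        X0 = relu(H1 @ W1.T + b0)
        recons.append(X0)
    return recons
```

With tied weights, the same matrix W1 appears both in the upward pass of Eq. (4) and, transposed, in the reconstruction of the visible layer in Eq. (3).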
The GSN is trained by the following procedure.
(1) Prepare an input X and its corrupted version X̃.
(2) Feed X̃ into the GSN as the initial visible activation, X_0^0 = X̃.
(3) Initialize the activations of the other layers to 0.
(4) Update the activations of each layer.
(5) Repeat steps (3)-(4), following Eqs. (3)-(5), for the Walkback length T.
(6) Compute the loss between X and each reconstruction X_t^0 (t ≤ T), i.e. (1/T) Σ_{t=1}^T L(X, X_t^0), and update the parameters.
(7) Repeat steps (3)-(6).

2.2 Regression and model selection

2.2.1 Linear regression

Regression models the relation between two variables X and Y as Y = f(X), where Y is the objective (dependent) variable and X the explanatory variable. For X = [x_0, x_1, ..., x_{n-1}], a weight vector W = [w_0, w_1, ..., w_{n-1}] and a bias b, the linear model is

    Y = W X^T + b = w_0 x_0 + w_1 x_1 + ... + w_{n-1} x_{n-1} + b.

Given N samples with targets Y_i and predictions Ŷ_i, the parameters are found by minimizing Σ_{i=0}^{N-1} L(Y_i, Ŷ_i), where L is typically the squared error. To prevent overfitting, L2 regularization terms are added with coefficients γ_i; writing the Frobenius norm of a matrix A = (a_{i,j}) as ||A||_F = sqrt(Σ a_{i,j}^2), the cost becomes

    COST = γ_0 Σ_{i=0}^{N-1} L(Y_i, Ŷ_i) + γ_1 ||W||_F^2 + γ_2 ||b||_F^2.      (6)

2.2.2 Cross validation

In K-fold cross validation the data are divided into K subsets k_i (i ≤ K); each subset is used once as the validation set while the remaining subsets form the training set, and the procedure is repeated K times.

3. Model based on GSN

3.1 Structure

This section describes the GSN-based model of [3] on which our proposal builds; its structure is shown in Fig. 5. The model is a GSN with K = 2 hidden layers, Walkback length T = 5, and salt-and-pepper corruption of the input. Let N be the number of banks and S = {s_{i,j}} (i, j ≤ N) the transaction matrix, where s_{i,j} is the amount transacted from bank i to bank j; the i-th row of the visible activation X_t^0 corresponds to bank i. The three layers have N, N_1 and N_2 units (N > N_1 > N_2), connected by an N × N_1 weight matrix W_{0,1} and an N_1 × N_2 weight matrix W_{1,2} with biases b_0, b_1, b_2; the visible layer is reconstructed with the tied weights W_{0,1}^T (an N_1 × N matrix). The cost C is

    C = (1/T) Σ_{t=1}^T ||(X_t^0 − X) ⊙ B||_F^2 + γ Σ_{k=0}^{K-1} ||W_{k,k+1}||_F^2,      (7)

where the weighting matrix B is defined by b_{i,j} = 1 if s_{i,j} = 0 and b_{i,j} = β (> 1) if s_{i,j} > 0, and rows i with no transactions at all (s_i = Σ_j s_{i,j} = 0) are set to b_{i,j} = 0 so that such banks do not contribute to the cost. Here ⊙ denotes the Hadamard (element-wise) product: for matrices A and B with entries a_{i,j} and b_{i,j}, (A ⊙ B)_{i,j} = a_{i,j} b_{i,j}.

Fig. 5: Structure of the model based on GSN.

3.2 Previous model

In the previous model [3], the network is trained with the transactions of month (t − 1) as input and those of month t as the reconstruction target, so that the trained model predicts the transaction network of the following month.
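The weighting matrix B and the cost of Eq. (7) can be written compactly. The sketch below is a minimal NumPy reading of them, with β = 40 and γ = 10^{-3} taken from the experimental settings in Sec. 5.3; all function names are illustrative.

```python
import numpy as np

def make_weight_matrix(S, beta=40.0):
    """Weighting matrix B of Eq. (7) built from the observed transaction matrix S:
    beta (> 1) where a transaction exists, 1 where it does not, and 0 for the
    rows of banks with no transactions at all."""
    B = np.where(S > 0, beta, 1.0)
    B[S.sum(axis=1) == 0, :] = 0.0
    return B

def model_cost(X, recons, B, weights, gamma=1e-3):
    """Cost C of Eq. (7): masked reconstruction error averaged over the T Walkback
    steps plus an L2 penalty on the weight matrices W_{k,k+1}."""
    T = len(recons)
    rec = sum(np.sum(((Xt - X) * B) ** 2) for Xt in recons) / T
    reg = gamma * sum(np.sum(W ** 2) for W in weights)
    return rec + reg
```

Here recons stands for the reconstructions X_t^0 returned by a Walkback run such as gsn_walkback above.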
4. Proposed method

4.1 Size-normalized cost and input handling

4.1.1 Motivation

In the cost (7), the reconstruction term and the regularization terms are sums over matrices of different sizes, so their balance depends on those sizes. For example, with an N × N matrix A_1 and an N × N_1 matrix B_1 the cost has the form COST_1 = α ||A_1||_F^2 + β ||B_1||_F^2, whereas with a 2N × 2N matrix A_2 and a 2N × N_1 matrix B_2 it becomes COST_2 = α ||A_2||_F^2 + β ||B_2||_F^2; the second involves far more elements in its first term, so the same coefficients α and β weight the two terms differently. When the input size changes, the hyperparameters would therefore have to be re-tuned.

Salt-and-pepper corruption sets each element of the input to its minimum or maximum value with probability p; it can be implemented with element-wise masks a_{i,j} ∈ {0, 1} applied to the input matrix A.

We also write the layer computation in matrix form. Let X_i = [x_{i,0}, x_{i,1}, ..., x_{i,n-1}] be an n-dimensional input and let the next layer have n_1 units with an n × n_1 weight matrix W. The output Y_i = [y_{i,0}, y_{i,1}, ..., y_{i,n_1-1}] satisfies Y_i^T = W^T X_i^T + b^T with bias b = [b_0, b_1, ..., b_{n_1-1}]. Stacking the inputs X_i as the rows of X = [X_0, X_1, ..., X_{n-1}] and the outputs as Y = [Y_0, Y_1, ..., Y_{n-1}] gives Y = X W + Bias, where Bias = [b, b, ..., b] repeats the same bias vector in every row.

4.1.2 Size-normalized cost

The cost (7) is modified into

    C = (1/T) Σ_{t=1}^T M(||(X_t^0 − X) ⊙ B||_F^2) + γ Σ_{k=0}^{K-1} M(||W_{k,k+1}||_F^2),      (8)

where M(·) divides by the number of elements: for an N_1 × N_2 matrix A, M(||A||_F^2) = ||A||_F^2 / (N_1 N_2) = (1/(N_1 N_2)) Σ a_{i,j}^2. With this normalization the balance between the terms no longer depends on the matrix sizes.

The transaction amounts are scaled by min-max normalization; as the histogram in Fig. 6 shows, the normalized amounts concentrate near 0, and this skew affects how the salt-and-pepper corruption acts on the data.

Fig. 6: Histogram of the transaction amount.

In addition, the bias is allowed to differ across input rows: writing B_i = [b_0, b_1, ..., b_{n_1-1}] for the bias associated with input X_i, the bias matrix becomes Bias = [B_0, B_1, ..., B_{n-1}] and the layer still computes Y = X W + Bias, so that each bank X_i has its own bias, in contrast to the shared bias of the model in Section 3.

4.2 Interest-rate regression on deep representations

We further use the deep representations learned by the GSN-based models as features for predicting the interest rate of each transaction, combining the two representations obtained for the lending and the borrowing bank.
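A minimal sketch of the min-max scaling and of the size-normalized cost of Eq. (8), assuming M(·) is realized by dividing each squared Frobenius norm by its number of entries; the names are again illustrative.

```python
import numpy as np

def minmax_normalize(S):
    """Min-max normalization of the transaction-amount matrix to [0, 1]."""
    s_min, s_max = S.min(), S.max()
    return (S - s_min) / (s_max - s_min)

def mean_fro(A):
    """M(||A||_F^2): squared Frobenius norm divided by the number of elements."""
    return np.sum(A ** 2) / A.size

def model_cost_normalized(X, recons, B, weights, gamma=1e-3):
    """Size-normalized cost of Eq. (8): same structure as Eq. (7), but every
    Frobenius term is averaged over its own number of elements."""
    T = len(recons)
    rec = sum(mean_fro((Xt - X) * B) for Xt in recons) / T
    reg = gamma * sum(mean_fro(W) for W in weights)
    return rec + reg
```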
4.2.1 Regression model

Consider a transaction in which bank A lends to bank B. Let the deep representation of each bank be m-dimensional; for a lender i ∈ I and a borrower j ∈ J, the feature vector X_{i,j} concatenates the two representations, so its dimension is n = 2m. With an n × 1 weight vector W and a scalar bias (Bias = b), the predicted interest rate is

    Y_{i,j} = X_{i,j} W + Bias.

The hyperparameters are chosen by K-fold cross validation, and the parameters are trained on the training set by minimizing

    COST = α M(||Y − Ŷ||_F^2) + β M(||W||_F^2) + γ M(||Bias||_F^2),      (9)

where Y are the observed rates, Ŷ the predictions, and the second and third terms are regularizers.

5. Experiments

5.1 Data

We use interbank transaction data from July 1, 2009 to December 31, 2012. Each record lists, among other fields, the quoting bank (Quoter) and the responding bank (Aggressor); the data cover 153 banks and 162,075 transactions of the two types Sell and Buy. A transaction in which bank A lends to bank B is treated as a directed edge from A to B, and the transactions of each month are aggregated into a 153 × 153 matrix S whose entry S_{i,j} is the amount lent from bank i to bank j. The amounts are scaled by min-max normalization so that S_{i,j} ∈ [0, 1]. Fig. 7 shows the link density of each monthly network, d = e / (N × N), where e is the number of edges and N the number of banks; the density stays around 5%, with a visible change around September 2011. Because only a small fraction of the entries of S are nonzero, the weighting matrix B of Eq. (8) emphasizes observed transactions: b_{i,j} = 1 where s_{i,j} = 0 and b_{i,j} = β (> 1) where s_{i,j} > 0.

Fig. 7: Link density in data set.

The GSN-based model is trained in two ways: in the lender-focused model, the i-th row of the input lists the amounts that bank i lends to every other bank, while in the borrower-focused model the rows correspond to borrowing banks (i.e. the transposed matrix is used). For the interest rates we use the matrix R whose entry R_{i,j} is the rate of the transactions from bank i to bank j; Fig. 8 shows the average of the monthly interest rates and Fig. 9 their relative frequency distribution.

Fig. 8: Average of monthly interest rates.
Fig. 9: Relative frequency distribution of interest rates.
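Given bank-level deep representations and the rate matrix R described above, the regression of Sec. 4.2.1 can be read as the following sketch. The assumption that X_{i,j} concatenates the lender's and the borrower's m-dimensional representations (n = 2m) and the coefficients α = 500, β = 100, γ = 1 reported in Sec. 5.3 come from the text; everything else (names, data layout) is illustrative.

```python
import numpy as np

def build_regression_data(H_lender, H_borrower, R):
    """Feature/target construction for the interest-rate regression (Sec. 4.2.1).

    H_lender, H_borrower : (num_banks, m) deep representations of each bank
    R                    : (num_banks, num_banks) rate matrix, R[i, j] > 0 where a rate is observed
    Each sample X_{i,j} concatenates the lender's and borrower's representations (n = 2m).
    """
    pairs = np.argwhere(R > 0)
    X = np.array([np.concatenate([H_lender[i], H_borrower[j]]) for i, j in pairs])
    y = np.array([R[i, j] for i, j in pairs])
    return X, y

def regression_cost(X, y, w, bias, alpha=500.0, beta=100.0, gamma=1.0):
    """Regularized cost of Eq. (9); M(.) averages each squared Frobenius norm
    over its number of entries, realized here with np.mean."""
    y_hat = X @ w + bias
    return (alpha * np.mean((y - y_hat) ** 2)
            + beta * np.mean(w ** 2)
            + gamma * bias ** 2)
```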
5.2 Evaluation measures

5.2.1 Kendall's rank correlation coefficient

Given X = [x_0, x_1, ..., x_{n-1}] and Y = [y_0, y_1, ..., y_{n-1}], consider the nC2 = n(n − 1)/2 pairs of elements (x_i, y_i), (x_j, y_j). A pair is concordant and adds 1 to P_s if (x_i − x_j)(y_i − y_j) > 0, and discordant and adds 1 to P_r if (x_i − x_j)(y_i − y_j) < 0. Pairs tied only in X are counted in U_X, pairs tied only in Y in U_Y, and pairs tied in both are counted in neither. The coefficient is

    τ = (P_s − P_r) / ( sqrt(P_s + P_r + U_X) · sqrt(P_s + P_r + U_Y) ),      (10)

which lies in [−1, 1]; larger τ indicates stronger agreement between the two rankings.

5.2.2 Error measures

For true values y_i and predicted values ŷ_i over N samples, we use the Mean Squared Error (MSE), the Root Mean Squared Percentage Error (RMSPE) and the Mean Absolute Percentage Error (MAPE):

    MSE(Y, Ŷ)   = (1/N) Σ_{i=0}^{N-1} (y_i − ŷ_i)^2,                          (11)
    RMSPE(Y, Ŷ) = 100 · sqrt( (1/N) Σ_{i=0}^{N-1} ((y_i − ŷ_i) / y_i)^2 ),    (12)
    MAPE(Y, Ŷ)  = (100/N) Σ_{i=0}^{N-1} |y_i − ŷ_i| / y_i.                    (13)

5.2.3 Reconstruction and link prediction

The network models are evaluated on two tasks: reconstruction of the transaction amounts and link prediction. For link prediction, predicted links are obtained by thresholding the reconstructed amounts (a 15% threshold is used) and compared with the links of the true network. Both tasks are scored for every month, and the averages are reported in Table 1.

5.2.4 Interest-rate prediction

For the regression of Eq. (9), the hyperparameters α, β, γ are tuned on a validation set, training is stopped when the improvement of the validation-set MSE falls below 0.01%, and the model trained on month t is used to predict the interest rates of month (t + 1), evaluated with MSE, RMSPE and MAPE. Four kinds of input features are compared:
(1) Input: the top 85% of the transactions at month t. Target: the corresponding interest rates at month (t + 1).
(2) Input: the deep representation of month t. Target: as in (1).
(3) Input: the reconstruction of the transactions of month t. Target: as in (1).
(4) Input: a random 0-1 matrix of size 153 × 153. Target: as in (1).
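The evaluation measures of Eqs. (10)-(13) can be computed directly as follows; this is a plain NumPy transcription in which the O(n^2) pair loop of kendall_tau favors clarity over speed, and the percentage errors assume y_i ≠ 0 as the equations do.

```python
import numpy as np
from itertools import combinations

def kendall_tau(x, y):
    """Kendall's rank correlation coefficient of Eq. (10)."""
    ps = pr = ux = uy = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx * dy > 0:
            ps += 1                      # concordant pair
        elif dx * dy < 0:
            pr += 1                      # discordant pair
        elif dx == 0 and dy != 0:
            ux += 1                      # tie only in x
        elif dy == 0 and dx != 0:
            uy += 1                      # tie only in y
    return (ps - pr) / (np.sqrt(ps + pr + ux) * np.sqrt(ps + pr + uy))

def mse(y, y_hat):
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return np.mean((y - y_hat) ** 2)                            # Eq. (11)

def rmspe(y, y_hat):
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return 100.0 * np.sqrt(np.mean(((y - y_hat) / y) ** 2))     # Eq. (12)

def mape(y, y_hat):
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return 100.0 * np.mean(np.abs(y - y_hat) / y)               # Eq. (13)
```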
.,,.,..,, 85% 306 ( 153 + 153 ), 100 ( 50 + 50 ).,. 10 Reconstruction performance in time-series plots. 12 MSE in time-series plots. 11 Link prediction performance in time-series plots. 13 RMSPE in time-series plots. 1 Average of performance. Reconstruction Link prediction lender-focused(pretraining) 0.608 ± 0.026 0.426 ± 0.047 lender-focused 0.580 ± 0.028 0.329 ± 0.045 borrower-focused(pretraining) 0.612 ± 0.032 0.422 ± 0.046 borrower-focused 0.562 ± 0.035 0.340 ± 0.045 previous model(pretraining) 0.433 ± 0.033 0.345 ± 0.043 previous model 0.396 ± 0.065 0.308 ± 0.066, (9) α, β, γ. α {1, 10, 100, 500, 1000, 5000, 10000}, β {1, 10, 100, 1000}, γ {1}., α + β + γ = 1 1., 2009 8, 2009 8 R. validation set MSE 0.01%., 0.01%,., MSE 5%., COST = MSE, α = 500, β = 100, γ = 1.. MSE, RMSPE, MAPE, 12, 13, 14., 2.,. 85%. 14 MAPE in time-series plots. 2 Average of each evaluation. MSE RMSPE MAPE 85% of original 0.1159 77.2 52.97 Deep representation 0.1120 70.2 51.78 Reconstructed 0.1170 72.2 53.02 Random 0.1782 161.0 112.37 5. 4,., 2010 8, 2012 3, 4..,. 15 Frobenius norm of transaction amount for every n-month.
The differences are large from February to August 2010 and again in December 2012, indicating that the composition of the transactions changed substantially in those periods. Fig. 16 shows the change rate of the average transaction amount. A large change occurs around March 2012: while the month-over-month change rate stays near 1.0 from August through December 2011, the average amount changes by roughly a factor of three between December 2011 and January 2012. Based on these observations the data are divided into the following four periods, and Table 3 reports the average prediction errors in each of them.

Fig. 16: Change rate of average in transaction amount.

(1) First term:  2009/07 - 2010/06
(2) Second term: 2010/07 - 2011/12
(3) Third term:  2012/01 - 2012/07
(4) Fourth term: 2012/08 - 2012/12

Table 3: Average of performance.
                MSE     RMSPE    MAPE
  First term    0.019    23.5     19.26
  Second term   0.138    42.3     32.43
  Third term    0.235   119.9     90.58
  Fourth term   0.049   203.7    138.66

In the first term the MAPE is below 20%, i.e. the rates are predicted within roughly ±20% on average. In the fourth term the MSE becomes small again, yet the RMSPE and MAPE remain very large; since the cost (9) is based on squared errors, small absolute errors on the small rates of this period still correspond to large relative errors.

6. Conclusion

We extended the GSN-based model of [3] for interbank transaction networks, proposing lender-focused and borrower-focused variants with a size-normalized cost, and used the learned deep representations to predict interest rates. On transaction data covering 153 banks from July 2009 to December 2012, the proposed models outperformed the previous model in both reconstruction and link prediction, and the deep representations yielded lower interest-rate prediction errors than the original, reconstructed or random inputs. The errors grow in periods when the market changes abruptly, which remains a challenge for future work.

Acknowledgment
This work was supported in part by JSPS KAKENHI Grant Number 15H02703 (Grant-in-Aid for Scientific Research (B)).

References
[1] Geoffrey Hinton and Ruslan Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, Vol. 313, No. 5786, pp. 504-507, 2006.
[2] Yoshua Bengio, Li Yao, Guillaume Alain, and Pascal Vincent. Generalized denoising auto-encoders as generative models. CoRR, Vol. abs/1305.6663, 2013.
[3] Proceedings of the 18th JSAI Special Interest Group on Financial Informatics (SIG-FIN), pp. 120-127, 2017 (in Japanese).
[4] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. The MIT Press, 2016.
[5] Quoc V. Le, Jiquan Ngiam, Adam Coates, Abhik Lahiri, Bobby Prochnow, and Andrew Y. Ng. On optimization methods for deep learning. In Proceedings of the 28th International Conference on Machine Learning (ICML '11), pp. 265-272, Omnipress, 2011.
[6] Pascal Vincent, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol. Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th International Conference on Machine Learning (ICML '08), pp. 1096-1103, ACM, 2008.
[7] Yoshua Bengio, Nicholas Léonard, and Aaron C. Courville. Estimating or propagating gradients through stochastic neurons for conditional computation. CoRR, Vol. abs/1308.3432, 2013.