DEIM Forum 2018 F3-5

Kobe University, 657-8501
E-mail: yuta@cs25.scitec.kobe-u.ac.jp, eguchi@port.kobe-u.ac.jp

1. Introduction

This paper builds a model of an interbank transaction network on top of the autoencoder (AE) [1] and Generative Stochastic Networks (GSN) [2], extending the previous GSN-based model of [3]. In the network, a transaction is a directed link in which a lending bank A passes funds to a borrowing bank B. The model learns a deep representation of the monthly transaction matrix and uses it to reconstruct the observed network, to predict hidden links, and to predict the interest rates of transactions. Section 2 reviews the building blocks (autoencoders, denoising autoencoders, GSN, and regression), Section 3 describes the reconstruction model, Section 4 adapts it to the transaction data, Section 5 reports experiments on call money market data, and Section 6 concludes.

2. Preliminaries

2.1 From Autoencoders to Generative Stochastic Networks [4]

2.1.1 Autoencoder (AE)

The autoencoder [1] is a three-layer network that maps an input X to a latent representation and back to a reconstruction of X (Fig. 1). Given an input X, the encoder produces

  Y = f_\theta(X) = \phi(WX + b)

and the decoder produces the reconstruction

  X' = f_{\theta'}(Y) = \phi'(W'Y + b')

where W and W' are weight matrices, b and b' are bias vectors, \theta = (W, b) and \theta' = (W', b'), and \phi, \phi' are activation functions such as the rectified linear unit (ReLU).
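As a concrete illustration of this encode-decode map, the following is a minimal numpy sketch; the layer sizes, the random initialization, and the use of ReLU for both \phi and \phi' are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

n, n1 = 100, 50                        # input and latent sizes (illustrative)
W = rng.normal(0.0, 0.1, (n1, n))      # encoder weights W
b = np.zeros(n1)                       # encoder bias b
W_dec = rng.normal(0.0, 0.1, (n, n1))  # decoder weights W'
b_dec = np.zeros(n)                    # decoder bias b'

def phi(z):
    return np.maximum(z, 0.0)          # ReLU activation

def encode(X):
    return phi(W @ X + b)              # Y = phi(W X + b)

def decode(Y):
    return phi(W_dec @ Y + b_dec)      # X' = phi'(W' Y + b')

X = rng.random(n)
X_rec = decode(encode(X))
loss = np.mean((X - X_rec) ** 2)       # reconstruction loss L(X, X')
```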

Fig. 1: Structure of the autoencoder. Fig. 2: Markov chain in the denoising autoencoder.

The parameters are trained to minimize a reconstruction loss L(X, X'); see [5] for optimization methods for deep networks. Stacking several such layers yields the deep autoencoder [1], whose successive layers learn increasingly abstract representations of the input.

2.1.2 Denoising Autoencoder (DAE)

The denoising autoencoder [6] corrupts the input X into \tilde{X}, for example with salt-and-pepper noise, and is trained to reconstruct X from \tilde{X}, again minimizing L(X, X'). Alternating corruption and reconstruction defines a Markov chain

  \tilde{X}_{t+1} \sim P_{\theta_1}(\tilde{X} \mid X_t), \qquad X_{t+1} \sim P_{\theta_2}(X \mid \tilde{X}_{t+1})   (1)

where P_{\theta_1} is the corruption process applied to X_t and P_{\theta_2} is the reconstruction distribution defined by the trained DAE. Starting from X_0, the chain (1) generates \tilde{X}_1, X_1, \tilde{X}_2, X_2, ..., and its stationary distribution estimates the data distribution P(X) [2]. Training the DAE on states \tilde{X}_t reached by running this chain away from a training point and back, the walkback procedure, improves the DAE as a generative model [2].

2.1.3 Generative Stochastic Networks (GSN)

Generative Stochastic Networks [2] generalize the DAE chain by introducing a latent state H_t:

  H_{t+1} \sim P_{\theta_1}(H \mid H_t, X_t), \qquad X_{t+1} \sim P_{\theta_2}(X \mid H_{t+1})   (2)

Fig. 3: Markov chain in Generative Stochastic Networks.

A two-layer GSN (Fig. 4) has parameters W_1, W_2, b_0, b_1, b_2, where W_1 connects layers 0 and 1, W_2 connects layers 1 and 2, and b_i is the bias of layer i. Unrolled with walkback length T = 3 as in Fig. 4, the updates are

  X_t^0 = \phi(W_1^T H_t^1 + b_0)   (3)
  H_t^1 = \phi(W_1 \tilde{X}_{t-1}^0 + W_2^T H_{t-1}^2 + b_1)   (4)
  H_t^2 = \phi(W_2 H_t^1 + b_2)   (5)

where \tilde{X}_{t-1}^0 is the corrupted visible layer of the previous step. The GSN is trained in the same way as a DAE, with gradients propagated through its stochastic units [7].
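The unrolled computation (3)-(5) can be sketched as follows; a minimal numpy sketch assuming ReLU units, salt-and-pepper corruption with flip probability p, and illustrative layer sizes, with the gradient step of walkback training omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

n, n1, n2 = 100, 50, 25   # visible and hidden layer sizes (illustrative)
W1 = rng.normal(0.0, 0.1, (n1, n))
W2 = rng.normal(0.0, 0.1, (n2, n1))
b0, b1, b2 = np.zeros(n), np.zeros(n1), np.zeros(n2)

def phi(z):
    return np.maximum(z, 0.0)  # ReLU

def corrupt(x, p, rng):
    # salt-and-pepper: with probability p replace the entry by 0 or 1
    mask = rng.random(x.shape) < p
    return np.where(mask, (rng.random(x.shape) < 0.5).astype(float), x)

def walkback(X, T=5, p=0.3):
    """Unroll the chain (3)-(5) for T steps, corrupting the current visible
    state and collecting each reconstruction X_t^0."""
    H2 = np.zeros(n2)
    X_cur, recons = X, []
    for _ in range(T):
        X_tilde = corrupt(X_cur, p, rng)         # corruption step
        H1 = phi(W1 @ X_tilde + W2.T @ H2 + b1)  # eq. (4)
        H2 = phi(W2 @ H1 + b2)                   # eq. (5)
        X_cur = phi(W1.T @ H1 + b0)              # eq. (3): reconstruction X_t^0
        recons.append(X_cur)
    return recons

X = rng.random(n)
loss = np.mean([np.mean((X - Xr) ** 2) for Xr in walkback(X)])  # (1/T) sum_t L(X, X_t^0)
```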

Fig. 4: Structure of a GSN with multiple layers.

The GSN is trained as follows:
(1) Take a training example X.
(2) Feed X to the GSN as the initial visible state, X_0^0 = X.
(3) Corrupt the visible layer.
(4) Propagate through the hidden layers by the updates (3)-(5).
(5) Repeat steps (3)-(4) for the walkback length T.
(6) Compute the loss between X and each reconstruction X_t^0 (t <= T), (1/T) \sum_{t=1}^{T} L(X, X_t^0).
(7) Update the parameters and repeat from (3) until training converges.

2.2 Regression

2.2.1 Linear regression with regularization

Regression fits a function Y = f(X) from explanatory variables X to a response Y. With X = [x_0, x_1, ..., x_{n-1}], weights W = [w_0, w_1, ..., w_{n-1}], and bias b, the linear model is

  Y = WX^T + b = w_0 x_0 + w_1 x_1 + ... + w_{n-1} x_{n-1} + b

Over N samples the parameters minimize \sum_{i=0}^{N-1} L(Y_i, Y'_i) for a loss L, here the squared error, with L2 regularization added. Writing ||A||_F for the Frobenius norm of a matrix A = (a_{i,j}), so that ||A||_F^2 = \sum_{i,j} a_{i,j}^2, the regularized cost with weights \gamma_i is

  COST = \gamma_0 \sum_{i=0}^{N-1} L(Y_i, Y'_i) + \gamma_1 ||W||_F^2 + \gamma_2 ||b||_F^2   (6)

2.2.2 Cross-validation

In K-fold cross-validation the data are split into K parts; each part k_i (i \in K) serves once as the validation set while the remaining parts form the training set, and the K results are averaged [3].

3. Proposed Model

3.1 Reconstruction model based on GSN

The proposed model (Fig. 5) is a GSN with K = 2 hidden layers, walkback length T = 5, and salt-and-pepper corruption. For N banks, the monthly transaction matrix is S = {s_{i,j}} (i, j \in N), where s_{i,j} is the amount bank i lends to bank j; the model input X_t^0 collects the corresponding bank vectors X_{i,t}^0 (i \in N) and X'^0_t denotes their reconstructions. The visible and hidden layer sizes are N, N_1, N_2 with N > N_1 > N_2; the N x N_1 matrix W_{0,1} with biases b_0, b_1 connects layers 0 and 1, W_{1,2} with bias b_2 connects layers 1 and 2, and the decoder reuses the transposed N_1 x N matrix W_{0,1}^T. The cost is

  C = (1/T) \sum_{t=1}^{T} ||(X_t^0 - X'^0_t) \odot B||_F^2 + \gamma \sum_{k=0}^{K-1} ||W_{k,k+1}||_F^2   (7)

where \odot is the element-wise (Hadamard) product, (A \odot B)_{i,j} = a_{i,j} b_{i,j}, and the matrix B weights the reconstruction error: b_{i,j} = 1 where s_{i,j} = 0, b_{i,j} = \beta (> 1) where s_{i,j} > 0, and row i is zeroed (b_{i,j} = 0 for all j) when bank i has no transactions at all, s_i = \sum_j s_{i,j} = 0. The weight \beta > 1 makes the model concentrate on the sparse existing links.

3.2 Previous model

The previous model [3] predicts the network at time t from the network observed at time (t - 1), whereas the model above reconstructs the network at time t from a partial observation of the same month.

Fig. 5: Structure of the model based on GSN.
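A sketch of the weighting matrix B and a cost of the form (7) in numpy, using the training-procedure convention of comparing the input with each walkback reconstruction; the function names are ours and the reconstructions are assumed to be precomputed, so this is illustrative rather than the authors' code.

```python
import numpy as np

def weight_matrix(S, beta):
    """B of eq. (7): weight beta (> 1) on existing links, 1 on absent links,
    and rows of banks with no transactions at all zeroed out."""
    B = np.where(S > 0, float(beta), 1.0)
    B[S.sum(axis=1) == 0, :] = 0.0
    return B

def gsn_cost(X, X_recs, B, Ws, gamma):
    """Weighted reconstruction error averaged over the T walkback
    reconstructions plus Frobenius regularization of the weight matrices."""
    rec = np.mean([np.sum(((X - Xr) * B) ** 2) for Xr in X_recs])
    return rec + gamma * sum(np.sum(W ** 2) for W in Ws)
```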

4. Adapting the Model to the Transaction Data

4.1 Normalizing the cost and the input

4.1.1 Dependence on matrix size

The terms of a cost such as (7) scale differently with the size of the matrices involved. For A_1 of size N x N and B_1 of size N x N_1, consider COST_1 = \alpha ||A_1||_F^2 + \beta ||B_1||_F^2; with twice as many banks, A_2 has size 2N x 2N and B_2 has size 2N x N_1, and in COST_2 = \alpha ||A_2||_F^2 + \beta ||B_2||_F^2 the first term has grown roughly four-fold but the second only two-fold. The balance between the terms therefore shifts with the data size, so hyperparameters tuned on one data set do not carry over to another.

Salt-and-pepper noise corrupts each entry a_{i,j} of the input with probability p, setting it to 0 or 1 at random.

In matrix form, each sample X_i = [x_{i,0}, x_{i,1}, ..., x_{i,n-1}] is mapped to Y_i = [y_{i,0}, ..., y_{i,n_1-1}] by an n x n_1 weight matrix W; with bias b = [b_0, ..., b_{n_1-1}], Y_i^T = W^T X_i^T + b^T. Stacking n samples as Y = [Y_0, ..., Y_{n-1}] and X = [X_0, ..., X_{n-1}] gives Y = XW + Bias, where Bias = [b, b, ..., b] repeats the bias row n times.

4.1.2 Size-normalized cost

To remove the dependence on matrix size, each squared Frobenius norm is replaced by the mean of the squared entries: for A of size N_1 x N_2,

  M(||A||_F^2) = ||A||_F^2 / (N_1 N_2) = (1/(N_1 N_2)) \sum_{i,j} a_{i,j}^2

and the cost (7) becomes

  C = (1/T) \sum_{t=1}^{T} M(||(X_0^0 - X'^0_t) \odot B||_F^2) + \gamma \sum_{k=0}^{K-1} M(||W_{k,k+1}||_F^2)   (8)

The transaction amounts are scaled into [0, 1] by min-max normalization; after normalization most amounts concentrate near 0 (Fig. 6), which also has to be taken into account when applying salt-and-pepper corruption.

Fig. 6: Histogram of the transaction amount.

With the per-sample bias row B = [b_0, b_1, ..., b_{n_1-1}] and Bias = [B_0, B_1, ..., B_{n-1}], the regression layer again takes the form Y = XW + Bias.
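A numpy sketch of the size normalization M(\cdot) of Section 4.1.2 and the min-max scaling; illustrative, with our own function names.

```python
import numpy as np

def M(A):
    """Size normalization of eq. (8): the squared Frobenius norm divided by
    the number of entries, i.e. the mean squared entry of A."""
    A = np.asarray(A, dtype=float)
    return float(np.mean(A ** 2))

def minmax(S):
    """Min-max normalization of the transaction amounts into [0, 1]."""
    return (S - S.min()) / (S.max() - S.min())

# With M(.), the reconstruction and regularization terms of eq. (7) become
# comparable across data sets of different size, as in eq. (8):
#   C = (1/T) * sum_t M((X - X_rec_t) * B) + gamma * sum_k M(W_k)
```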

4.2 Two views of the network

On top of the learned representation, two model variants are considered: a lender-focused variant, in which each bank is represented by the row of amounts it lends, and a borrower-focused variant, in which each bank is represented by the column of amounts it borrows (see Section 5.1). The representation is further used to predict the interest rates of transactions.

4.2.1 Interest-rate regression

For a transaction in which bank i lends to bank j, let I and J be the m-dimensional deep representations of i and j; their concatenation X_{i,j} = [I J], of length n = 2m, is the input of a ridge regression with an n x 1 weight vector W and a scalar bias, Bias = b:

  Y'_{i,j} = X_{i,j} W + Bias

predicting the interest rate of the transaction. The parameters are selected by K-fold cross-validation, minimizing over the training set

  COST = \alpha M(||Y - Y'||_F^2) + \beta M(||W||_F^2) + \gamma M(||Bias||_F^2)   (9)

where Y are the observed rates, Y' the predictions, and \alpha, \beta, \gamma the weights of the three terms.

5. Experiments

5.1 Data set

The data are call money market transactions from 2009/07/01 to 2012/12/31. Each record contains the quoting side (Quoter), the initiating side (Aggressor), the amount, and the interest rate, and appears as a Sell and a Buy record; a transaction is a loan from lender A to borrower B. The data cover 153 banks and 162,075 transactions, giving a monthly 153 x 153 transaction matrix S in which s_{i,j} is the total amount bank i lends to bank j, min-max normalized so that s_{i,j} \in [0, 1].

Fig. 7: Link density in the data set.

The link density d = e/(N \cdot N), for e links among N banks, is on the order of 5% and changes noticeably around September 2011 (Fig. 7). The weighting matrix B of (8) is built from S as before: b_{i,j} = 1 where s_{i,j} = 0 and b_{i,j} = \beta (> 1) where s_{i,j} > 0.

The GSN is given the transaction matrix in the two ways of Section 4.2: lender-focused, where each bank is represented by the row of amounts it lends, and borrower-focused, where each bank is represented by the column of amounts it borrows.

The interest rates form a matrix R with entries R_{i,j}. Fig. 8 shows the average of monthly interest rates and Fig. 9 the relative frequency distribution of the rates.

Fig. 8: Average of monthly interest rates. Fig. 9: Relative frequency distribution of interest rates.
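A numpy sketch of the link density of Section 5.1 and the ridge cost (9); `rate_features` is a hypothetical helper for building the concatenated input X_{i,j}, not a name from the paper.

```python
import numpy as np

def link_density(S):
    """Link density d = e / (N * N) of Sec. 5.1: fraction of nonzero entries."""
    N = S.shape[0]
    return np.count_nonzero(S) / (N * N)

def rate_features(I, J):
    """Concatenate lender and borrower representations into X_{i,j} = [I J]."""
    return np.concatenate([I, J])

def ridge_cost(Y, Y_pred, W, bias, alpha, beta, gamma):
    """Eq. (9), using the size normalization M(.) of eq. (8)."""
    M = lambda A: float(np.mean(np.asarray(A, dtype=float) ** 2))
    return alpha * M(np.asarray(Y) - np.asarray(Y_pred)) + beta * M(W) + gamma * M(bias)
```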

5.2 Evaluation measures

5.2.1 Kendall's \tau

For X = [x_0, ..., x_{n-1}] and Y = [y_0, ..., y_{n-1}], consider all nC2 = n(n-1)/2 pairs (x_i, y_i), (x_j, y_j). A pair with (x_i - x_j)(y_i - y_j) > 0 adds 1 to the concordant count P_s; a pair with (x_i - x_j)(y_i - y_j) < 0 adds 1 to the discordant count P_r. Pairs tied only in X are counted by U_X, pairs tied only in Y by U_Y, and pairs tied in both are excluded. Then

  \tau = (P_s - P_r) / ( \sqrt{P_s + P_r + U_X} \sqrt{P_s + P_r + U_Y} )   (10)

with \tau \in [-1, 1]; larger values indicate stronger rank agreement.

5.2.2 Error measures

With observed values y_i, predictions y'_i, and N samples, the mean squared error (MSE), root mean squared percentage error (RMSPE), and mean absolute percentage error (MAPE) are

  MSE(Y, Y') = (1/N) \sum_{i=0}^{N-1} (y_i - y'_i)^2   (11)
  RMSPE(Y, Y') = 100 \sqrt{ (1/N) \sum_{i=0}^{N-1} ( (y_i - y'_i)/y_i )^2 }   (12)
  MAPE(Y, Y') = (100/N) \sum_{i=0}^{N-1} | (y_i - y'_i)/y_i |   (13)

5.2.3 Experimental setup

For reconstruction and link prediction, 15% of the links in each monthly matrix are hidden and the remaining 85% are given to the model. Reconstruction performance is the rank agreement (Kendall's \tau) on the observed entries, and link-prediction performance is the agreement on the hidden 15% (Table 1).

For interest-rate prediction, the rates of month (t + 1) are predicted from features of month t, and the MSE, RMSPE, and MAPE of the month-(t + 1) predictions are compared across four feature sets:
(1) 85% of original: the observed 85% of the month-t transaction matrix; target: the interest rates of month (t + 1).
(2) Deep representation: the representation extracted by the GSN from the month-t input; same target.
(3) Reconstructed: the matrix reconstructed by the GSN from the month-t input; same target.
(4) Random: a random 0-1 matrix of size 153 x 153, as a baseline; same target.

5.2.4 Hyperparameters

The weights \alpha, \beta, \gamma of the cost (9) are tuned on a validation set; training stops when the validation-set MSE changes by less than 0.01%, and the setting with the smallest validation MSE is selected (the grid is given in Section 5.3).
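The measures (10)-(13) translate directly into code; a small numpy sketch, with a naive O(n^2) pair count for \tau that follows the definition above.

```python
import numpy as np
from itertools import combinations

def kendall_tau(x, y):
    """Kendall's tau of eq. (10): count concordant (P_s), discordant (P_r),
    and singly-tied (U_X, U_Y) pairs; pairs tied in both are excluded."""
    Ps = Pr = Ux = Uy = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx * dy > 0:
            Ps += 1
        elif dx * dy < 0:
            Pr += 1
        elif dx == 0 and dy != 0:
            Ux += 1
        elif dy == 0 and dx != 0:
            Uy += 1
    return (Ps - Pr) / np.sqrt((Ps + Pr + Ux) * (Ps + Pr + Uy))

def mse(y, yp):                                       # eq. (11)
    return np.mean((np.asarray(y) - np.asarray(yp)) ** 2)

def rmspe(y, yp):                                     # eq. (12), in percent
    y, yp = np.asarray(y), np.asarray(yp)
    return 100 * np.sqrt(np.mean(((y - yp) / y) ** 2))

def mape(y, yp):                                      # eq. (13), in percent
    y, yp = np.asarray(y), np.asarray(yp)
    return 100 * np.mean(np.abs((y - yp) / y))
```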

5.3 Results

The reconstruction model is trained both with its parameters (W_{0,1}, W_{1,2}, b_0, b_1, b_2) pretrained (with pretraining) and without pretraining, on the monthly matrices from July 2009 onward, with K = 2 hidden layers of sizes (100, 50), walkback length T = 5, \beta = 40, \gamma = 10^{-3}, 100 training iterations, corruption probability 0.3, and \sigma^2 = 0.01. Figs. 10 and 11 plot the reconstruction and link-prediction performance over time for the two variants and the previous model [3]; Table 1 gives the averages. Both tasks benefit from pretraining.

Fig. 10: Reconstruction performance in time-series plots. Fig. 11: Link prediction performance in time-series plots.

Table 1: Average performance (Kendall's \tau).
  Model                            Reconstruction   Link prediction
  lender-focused (pretraining)     0.608 ± 0.026    0.426 ± 0.047
  lender-focused                   0.580 ± 0.028    0.329 ± 0.045
  borrower-focused (pretraining)   0.612 ± 0.032    0.422 ± 0.046
  borrower-focused                 0.562 ± 0.035    0.340 ± 0.045
  previous model (pretraining)     0.433 ± 0.033    0.345 ± 0.043
  previous model                   0.396 ± 0.065    0.308 ± 0.066

For the interest-rate regression, the features are built per transaction pair: the 85%-of-original features concatenate the lender's row and the borrower's column of the observed matrix (306 = 153 + 153 dimensions), while the deep-representation features concatenate the two 50-dimensional representations of lender and borrower (100 = 50 + 50 dimensions). The weights of (9) are grid-searched over \alpha \in {1, 10, 100, 500, 1000, 5000, 10000}, \beta \in {1, 10, 100, 1000}, \gamma \in {1}, tuned on the August 2009 data (rates R of 2009/08), with training stopped when the validation-set MSE changes by less than 0.01%; the selected setting is \alpha = 500, \beta = 100, \gamma = 1.

The MSE, RMSPE, and MAPE of the month-(t + 1) predictions are plotted over time in Figs. 12, 13, and 14, and their averages are given in Table 2. The deep representation, although only 100-dimensional, matches or slightly exceeds the 306-dimensional 85%-of-original features, and all three informative feature sets clearly beat the random baseline.

Fig. 12: MSE in time-series plots. Fig. 13: RMSPE in time-series plots. Fig. 14: MAPE in time-series plots.

Table 2: Average of each evaluation.
  Features             MSE      RMSPE   MAPE
  85% of original      0.1159    77.2    52.97
  Deep representation  0.1120    70.2    51.78
  Reconstructed        0.1170    72.2    53.02
  Random               0.1782   161.0   112.37

5.4 Changes in the transaction network

To locate periods in which the network changes sharply, let D_t be the transaction-amount matrix of month t and measure the change from n months earlier by ||D_t - D_{t-n}||_F for n = 1, 2, 3, 4 (Fig. 15).

Fig. 15: Frobenius norm of transaction amount for every n-month.
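A one-function numpy sketch of this change measure, assuming the monthly matrices are stacked into an array D of shape (T, N, N); the stacking convention is ours, not from the paper.

```python
import numpy as np

def matrix_changes(D, n=1):
    """||D_t - D_{t-n}||_F for every month t (cf. Fig. 15), where D stacks
    the monthly transaction matrices into an array of shape (T, N, N)."""
    return [float(np.linalg.norm(D[t] - D[t - n])) for t in range(n, len(D))]
```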

The change is large in 2010 and again toward December 2012, and the 2010 peak appears in each (t - n) curve shifted by n months. Fig. 16 plots the change rate of the average transaction amount: the market shifts markedly between December 2011 and January 2012.

Fig. 16: Change rate of average in transaction amount.

Based on these change points, the observation period is divided into four terms:
(1) First term: 2009/07 - 2010/06
(2) Second term: 2010/07 - 2011/12
(3) Third term: 2012/01 - 2012/07
(4) Fourth term: 2012/08 - 2012/12

Table 3: Average performance per term.
  Term         MSE      RMSPE   MAPE
  First term   0.019     23.5    19.26
  Second term  0.138     42.3    32.43
  Third term   0.235    119.9    90.58
  Fourth term  0.049    203.7   138.66

In the first term the MAPE of about 19 means the interest rates are predicted within roughly ±20%. In the fourth term the MSE is small while the percentage errors RMSPE and MAPE are large, which suggests rebalancing the weights of the cost (9).

6. Conclusion

This paper proposed a GSN-based model that learns a deep representation of an interbank transaction network and applied it to reconstructing the network, predicting hidden links, and predicting interest rates, in two variants (lender-focused and borrower-focused). On call money market data, pretraining improved both reconstruction and link prediction over the previous model [3], and the learned 100-dimensional representation predicted interest rates at least as well as the much higher-dimensional observed matrix. Adapting the model to the periods in which the market changes sharply is left for future work.

Acknowledgment: This work was supported by Grant-in-Aid for Scientific Research (B) 15H02703.

References
[1] Geoffrey Hinton and Ruslan Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, Vol. 313, No. 5786, pp. 504-507, 2006.
[2] Yoshua Bengio, Li Yao, Guillaume Alain, and Pascal Vincent. Generalized denoising auto-encoders as generative models. CoRR, abs/1305.6663, 2013.
[3] Proceedings of the 18th SIG-FIN Workshop (JSAI Special Interest Group on Financial Informatics), pp. 120-127, 2017 (in Japanese).
[4] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. The MIT Press, 2016.
[5] Quoc V. Le, Jiquan Ngiam, Adam Coates, Abhik Lahiri, Bobby Prochnow, and Andrew Y. Ng. On optimization methods for deep learning. In Proceedings of the 28th International Conference on Machine Learning (ICML '11), pp. 265-272, Omnipress, 2011.
[6] Pascal Vincent, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol. Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th International Conference on Machine Learning (ICML '08), pp. 1096-1103, ACM, 2008.
[7] Yoshua Bengio, Nicholas Léonard, and Aaron C. Courville. Estimating or propagating gradients through stochastic neurons for conditional computation. CoRR, abs/1308.3432, 2013.