Transcript:

E-mail: nakayama@ci.i.u-tokyo.ac.jp

Abstract
The paper proposes learning the convolution filters of a convolutional neural network (CNN) analytically with the Fisher weight map (FWM), and evaluates the resulting features on standard image classification benchmarks.

1. Introduction
The introduction reviews deep learning [12, 17] and convolutional neural networks (CNNs) [25, 30, 19], [5, 21, 23], their biologically motivated origins [18], large-scale GPU training of CNNs [5, 21], and supporting techniques such as unsupervised pre-training [11], regularization by preventing co-adaptation [16], and practical training recommendations [2]. Against this background, the paper introduces convolution layers whose weights are obtained with the Fisher weight map (FWM).

2. Fisher weight map (FWM)
The Fisher weight map [29] is a supervised weighting scheme whose relation to the eigen weight map parallels the relation of fisherfaces [1] to eigenfaces [33].

Figure 1: Overall architecture. Layer 0 (raw image) → patch descriptor → convolutions using the Fisher weight map → pooling & rectification (Layers 1-3) → Layer 4: full connection (logistic regression) → output.

Related representations include the discriminative spatial pyramid [15] and spatial pyramid matching [22]; here, the weight maps used for convolution are obtained either discriminatively (FWM) or by a PCA-like criterion (eigen weight map, EWM) [29].

3. Network architecture
Layer k, denoted L_k, consists of m_k feature maps of size P_k × P_k; L_0 is the raw RGB image, so m_0 = 3 (cf. [17]).

Figure 2: Convolution using a Fisher weight map or eigen weight map. The m_k maps of layer k are combined through weight maps w_1, ..., w_{m_{k+1}} to produce the m_{k+1} maps z_1, ..., z_{m_{k+1}} of layer k+1.

3.1 Convolution using weight maps
Let f^{(k)}(x, y) \in R^{m_k} be the feature vector at position (x, y) of layer L_k. Concatenating the vectors f^{(k)} within each n × n window yields a local descriptor x^{(k)}_{(x,y)} \in R^{m_k n^2} at each of the (P_k - n + 1) × (P_k - n + 1) window positions. Collecting these descriptors gives

X = \big( x^{(k)}_{(1,1)}, \; x^{(k)}_{(2,1)}, \; \ldots, \; x^{(k)}_{(P_k-n+1,\,P_k-n+1)} \big).   (1)

A weight map w maps X to the response z = X^T w, which forms one feature map of layer k+1. The weight maps are obtained with either the eigen weight map (EWM) or the Fisher weight map (FWM), and the leading m_{k+1} of them produce the m_{k+1} maps of the next layer.
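Under one plausible reading of Sec. 3.1 (local descriptors as the columns of X, and z = X^T w reshaped into the next layer's map), Eq. (1) and the weight-map convolution can be written compactly in NumPy. The sketch below is illustrative only: the function names and the (maps, height, width) array layout are assumptions, not details taken from the paper.

```python
# Illustrative NumPy sketch of Eq. (1) and z = X^T w; names and layout are assumptions.
import numpy as np

def local_descriptor_matrix(F, n):
    """Collect the n x n local descriptors of layer k into the matrix X of Eq. (1).

    F : array of shape (m_k, P_k, P_k) holding the m_k feature maps of layer k.
    Returns X with shape (m_k * n**2, (P_k - n + 1)**2); each column is the
    concatenated descriptor x^(k)_(x, y) of one n x n window.
    """
    m, P, _ = F.shape
    Q = P - n + 1                                  # spatial size of the next layer
    cols = [F[:, x:x + n, y:y + n].reshape(-1)
            for x in range(Q) for y in range(Q)]
    return np.stack(cols, axis=1)

def convolve_with_weight_maps(F, W, n):
    """Apply weight maps (columns of W, each of length m_k * n**2) to layer k.

    Returns the maps of layer k+1 with shape (m_{k+1}, Q, Q), where map j is
    z_j = X^T w_j reshaped to Q x Q.
    """
    X = local_descriptor_matrix(F, n)
    Q = F.shape[1] - n + 1
    Z = X.T @ W                                    # (Q*Q, m_{k+1}) responses
    return Z.T.reshape(W.shape[1], Q, Q)

# Example: 256 input maps of size 13 x 13, 3 x 3 windows, 512 weight maps.
F = np.random.randn(256, 13, 13)
W = np.random.randn(256 * 3 * 3, 512)
print(convolve_with_weight_maps(F, W, 3).shape)    # (512, 11, 11)
```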

Eigen weight map (EWM). The EWM chooses w so as to maximize the total variance of the responses z over the N training images:

J_E(w) = \frac{1}{N} \sum_{i=1}^{N} (z_i - \bar{z})^T (z_i - \bar{z}) = w^T \Big\{ \frac{1}{N} \sum_{i=1}^{N} (X_i - \bar{X})(X_i - \bar{X})^T \Big\} w = w^T \Sigma_X w,   (2)

where z_i is the response of image i and \bar{z} its mean. Maximizing J_E(w) leads to the eigenvalue problem

\Sigma_X w = \lambda w,   (3)

so the EWM is essentially PCA of the aggregated local descriptors.

Fisher weight map (FWM). The FWM replaces the variance criterion of the EWM with the Fisher criterion J_F(w), based on the within-class and between-class scatter of z:

\tilde{\Sigma}_W = \frac{1}{N} \sum_{j=1}^{C} \sum_{i=1}^{C_j} (z_i^{(j)} - \bar{z}^{(j)})(z_i^{(j)} - \bar{z}^{(j)})^T,   (4)

\tilde{\Sigma}_B = \frac{1}{N} \sum_{j=1}^{C} C_j (\bar{z}^{(j)} - \bar{z})(\bar{z}^{(j)} - \bar{z})^T,   (5)

where C is the number of classes, C_j the number of samples of class j, z_i^{(j)} the i-th sample of class j, and \bar{z}^{(j)} the mean over class j. Their traces can be written in terms of w:

tr \tilde{\Sigma}_W = \frac{1}{N} \sum_{j=1}^{C} \sum_{i=1}^{C_j} (z_i^{(j)} - \bar{z}^{(j)})^T (z_i^{(j)} - \bar{z}^{(j)}) = w^T \Big\{ \frac{1}{N} \sum_{j=1}^{C} \sum_{i=1}^{C_j} (X_i^{(j)} - \bar{X}^{(j)})(X_i^{(j)} - \bar{X}^{(j)})^T \Big\} w = w^T \Sigma_W w,   (6)

tr \tilde{\Sigma}_B = \frac{1}{N} \sum_{j=1}^{C} C_j (\bar{z}^{(j)} - \bar{z})^T (\bar{z}^{(j)} - \bar{z}) = w^T \Big\{ \frac{1}{N} \sum_{j=1}^{C} C_j (\bar{X}^{(j)} - \bar{X})(\bar{X}^{(j)} - \bar{X})^T \Big\} w = w^T \Sigma_B w.   (7)

The Fisher criterion is

J_F(w) = \frac{tr \tilde{\Sigma}_B}{tr \tilde{\Sigma}_W} = \frac{w^T \Sigma_B w}{w^T \Sigma_W w},   (8)

and its maximizers are the leading generalized eigenvectors of

\Sigma_B w = \lambda \Sigma_W w.   (9)

3.2 Rectification
Classical CNNs use saturating nonlinearities such as tanh [25, 19]; Rectified Linear Units (ReLU) [27, 21], R(x) = \max(0, x), have since become standard. A two-sided rectification related to the encoding of Coates et al. [8] is also used:

R_2(x) = \begin{pmatrix} \max(0, x) \\ \max(0, -x) \end{pmatrix}.   (10)

3.3 Pooling
Pooling sub-samples the feature maps [30] and is a key component of CNN-style architectures [6, 5, 34]. Average pooling, max pooling, and L2 pooling [4] are compared in the experiments.

3.4 Patch descriptor
The first layer extracts raw 5 × 5 pixel patches; with RGB channels this gives 5 × 5 × 3 = 75-dimensional descriptors.
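A minimal sketch of solving Eq. (9) for the Fisher weight maps is given below, assuming each training image i is summarized by its descriptor matrix X_i from Eq. (1) and an integer class label. The ridge term added to Sigma_W and all function names are assumptions made for illustration, not details from the paper.

```python
# Illustrative sketch of Eqs. (4)-(9): Fisher weight maps via a generalized eigenproblem.
import numpy as np
from scipy.linalg import eigh

def fisher_weight_maps(Xs, labels, num_maps, reg=1e-4):
    """Return the `num_maps` leading generalized eigenvectors of Sigma_B w = lambda Sigma_W w.

    Xs     : array (N, d, P): per-image descriptor matrices X_i of Eq. (1),
             with d = m_k * n**2 and P = number of window positions.
    labels : array (N,) of integer class labels.
    """
    Xs = np.asarray(Xs, dtype=np.float64)
    labels = np.asarray(labels)
    N, d, _ = Xs.shape
    Xbar = Xs.mean(axis=0)                         # overall mean matrix

    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in np.unique(labels):
        Xc = Xs[labels == c]
        Xc_bar = Xc.mean(axis=0)                   # class-mean matrix
        D = Xc - Xc_bar
        Sw += np.einsum('ndp,nqp->dq', D, D)       # inner matrix of Eq. (6)
        diff = Xc_bar - Xbar
        Sb += len(Xc) * diff @ diff.T              # inner matrix of Eq. (7)
    Sw /= N
    Sb /= N
    Sw += reg * (np.trace(Sw) / d) * np.eye(d)     # ridge so Sigma_W is positive definite (assumption)

    eigvals, eigvecs = eigh(Sb, Sw)                # solves Sb w = lambda Sw w
    order = np.argsort(eigvals)[::-1][:num_maps]   # largest Fisher ratios first
    return eigvecs[:, order]                       # (d, num_maps) weight maps

def r2(x):
    """Two-sided rectification of Eq. (10): stack max(0, x) and max(0, -x) along the map axis."""
    return np.concatenate([np.maximum(0.0, x), np.maximum(0.0, -x)], axis=0)
```

The returned columns can be used as the weight maps W in the convolve_with_weight_maps sketch above to produce the m_{k+1} maps of the next layer.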

Two unsupervised options are considered for the first-layer filters:
- Random filters: weights drawn uniformly from [-0.05, 0.05], as in [19] (used in the MNIST experiments [24]).
- K-means filters: following Coates et al. [6, 9] and the bag-of-visual-words approach [10], patches are whitened with zero component analysis (ZCA), a K-means dictionary of visual words is learned, and responses are computed with triangular encoding (see the sketch after Sec. 4.1).

3.5 Notation for architectures
- Rand(n, d): d random filters on n × n patches.
- K_m(n, d): d K-means visual words on n × n patches (bag-of-words style encoding).
- C(n, m): convolution producing m maps from n × n windows using weight maps; C_EWM and C_FWM distinguish the two weight-map types, and plain C(n, m) denotes the FWM variant.
- R, R_2: rectified linear units (one-sided and two-sided).
- AP(n, s), MP(n, s), L_2P(n, s): average, max, and L2 pooling over n × n windows with stride s.
- AP_p, MP_p, L_2P_p: average, max, and L2 pooling over a p × p grid of regions.

For example, Rand(5, 200)-R-AP(4, 4)-C(3, 100)-R-AP_2 denotes: (1) 200 random filters on 5 × 5 patches, (2) ReLU, (3) 4 × 4 average pooling with stride 4, (4) FWM convolution over 3 × 3 windows producing 100 maps, (5) ReLU, and (6) average pooling over a 2 × 2 grid.

Figure 3: Sample images from STL-10 [6], CIFAR-10/100 [20], and MNIST [24].

4. Experiments
All networks are trained and evaluated on ordinary PCs with Xeon 2.7 GHz CPUs, without using GPGPU.

4.1 Datasets
Experiments use three datasets: STL-10 [6], CIFAR-10/100 [20], and MNIST [24]. STL-10 consists of 96 × 96 images in 10 classes; training uses the 10 predefined folds of 100 labeled images per class, and results are averaged over the folds (as in Gens and Domingos [13]). CIFAR-10/100 are subsets of the Tiny Images dataset [32] with 10 and 100 classes of 32 × 32 images; CIFAR-10 provides 5000 labeled training images per class and CIFAR-100 provides 500. MNIST [24] contains 28 × 28 images of the digits 0-9 (10 classes), with about 6000 training and 1000 test images per class.
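The K_m(n, d) stage is described above as the Coates et al. pipeline [6, 9]: ZCA whitening, a d-word K-means dictionary, and triangular encoding. The sketch below is one plausible instantiation with NumPy and scikit-learn; the whitening epsilon, the K-means settings, and the function names are assumptions.

```python
# One plausible K_m(n, d) stage: ZCA whitening + K-means dictionary + triangular encoding.
import numpy as np
from sklearn.cluster import KMeans

def fit_km_stage(patches, d, eps=0.1, seed=0):
    """patches: (num_patches, n*n*3) raw patch vectors. Returns (mean, zca, centroids)."""
    mean = patches.mean(axis=0)
    cov = np.cov(patches - mean, rowvar=False)
    U, S, _ = np.linalg.svd(cov)
    zca = U @ np.diag(1.0 / np.sqrt(S + eps)) @ U.T       # ZCA whitening matrix
    white = (patches - mean) @ zca
    km = KMeans(n_clusters=d, n_init=3, random_state=seed).fit(white)
    return mean, zca, km.cluster_centers_

def triangular_encode(patches, mean, zca, centroids):
    """Triangular ('soft') encoding f_k = max(0, mu(z) - z_k), with z_k = ||x - c_k||."""
    white = (patches - mean) @ zca
    dists = np.linalg.norm(white[:, None, :] - centroids[None, :, :], axis=2)
    mu = dists.mean(axis=1, keepdims=True)                # mean distance per patch
    return np.maximum(0.0, mu - dists)                    # (num_patches, d) responses
```

Encoding every n × n patch of an image in this way (one response vector per position) yields the d first-layer feature maps that the later stages pool and convolve.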

Figure 4: Accuracy (%) on STL-10 for K_m(9, 256)-AP(4, 2)-C_X(3, 256)-Y-AP_2, varying the weight-map type X in {PCA, PCAW, EWM, EWMW, FWM} and the rectification Y; the baseline is the bag-of-words architecture K_m(9, 256)-AP_2.

Figure 5: Accuracy (%) on CIFAR-10 for K_m(5, 256)-AP(3, 2)-C_X(3, 512)-Y-AP_2, with X in {PCA, PCAW, EWM, EWMW, FWM} and Y in {R, R_2}; the baseline is K_m(5, 256)-AP_2.

4.2 Weight maps, rectification, and pooling
Figures 4 and 5 compare PCA, PCAW, EWM, EWMW, and FWM as the weight-map type, together with the rectification Y, against the single-layer bag-of-words baselines K_m(n, 256)-AP_2 on STL-10 and CIFAR-10. Tables 1 and 2 evaluate the pooling and rectification choices.

Table 1: Accuracy (%) on STL-10.
Architecture                                        Acc.
K_m(9,256)-AP_2                                     48.4
K_m(9,256)-MP_2                                     54.1
K_m(9,256)-L_2P_2                                   51.4
K_m(9,256)-AP(4,2)-C(3,256)-AP_2                    56.4
K_m(9,256)-MP(4,2)-C(3,256)-L_2P_2                  50.9
K_m(9,256)-MP(4,2)-C(3,256)-AP_2                    58.4
K_m(9,256)-MP(4,2)-C(3,256)-MP_2                    44.6
K_m(9,256)-MP(4,2)-C(3,256)-R-L_2P_2                59.6
K_m(9,256)-MP(4,2)-C(3,256)-R-AP_2                  60.0
K_m(9,256)-MP(4,2)-C(3,256)-R_2-L_2P_2              61.0
K_m(9,256)-MP(4,2)-C(3,256)-R_2-AP_2                61.2

Table 2: Accuracy (%) on CIFAR-10.
Architecture                                        Acc.
K_m(5,256)-AP_2                                     72.2
K_m(5,256)-MP_2                                     68.3
K_m(5,256)-L_2P_2                                   71.9
K_m(5,256)-MP(3,2)-C(3,512)-AP_2                    70.4
K_m(5,256)-AP(3,2)-C(3,512)-L_2P_2                  64.3
K_m(5,256)-AP(3,2)-C(3,512)-AP_2                    71.0
K_m(5,256)-AP(3,2)-C(3,512)-MP_2                    66.6
K_m(5,256)-AP(3,2)-C(3,512)-R-L_2P_2                73.8
K_m(5,256)-AP(3,2)-C(3,512)-R-AP_2                  74.5
K_m(5,256)-AP(3,2)-C(3,512)-R_2-L_2P_2              76.3
K_m(5,256)-AP(3,2)-C(3,512)-R_2-AP_2                76.6

With a single K-means layer, max pooling works best on STL-10 (54.1 vs. 48.4 for average pooling), whereas average pooling works best on CIFAR-10 (72.2 vs. 68.3). With the FWM convolution added, the two-sided rectification R_2 gives the best accuracy in both tables. Table 3 examines the effect of the window size n and the dictionary size d on CIFAR-10.
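For reference, the pooling operators AP/MP/L2P(n, s) and the grid variants AP_p/MP_p/L2P_p compared in Tables 1 and 2 could be implemented as below; the L2-pooling convention (root mean square) and the grid-splitting rule are assumptions, not specifications from the paper.

```python
# Sketch of the pooling operators of Sec. 3.5; conventions are assumptions.
import numpy as np

def _reduce(win, mode):
    # win: (m, window_size) values of one window for all m maps
    if mode == "avg":                              # AP
        return win.mean(axis=1)
    if mode == "max":                              # MP
        return win.max(axis=1)
    return np.sqrt((win ** 2).mean(axis=1))        # L2P (root mean square)

def pool(F, n, s, mode="avg"):
    """AP/MP/L2P(n, s): pool each map of F (m, H, W) over n x n windows with stride s."""
    m, H, W = F.shape
    rows, cols = (H - n) // s + 1, (W - n) // s + 1
    out = np.empty((m, rows, cols))
    for i in range(rows):
        for j in range(cols):
            win = F[:, i * s:i * s + n, j * s:j * s + n].reshape(m, -1)
            out[:, i, j] = _reduce(win, mode)
    return out

def grid_pool(F, p, mode="avg"):
    """AP_p/MP_p/L2P_p: split each map into a p x p grid of regions and pool each cell."""
    m, H, W = F.shape
    hs = np.linspace(0, H, p + 1, dtype=int)
    ws = np.linspace(0, W, p + 1, dtype=int)
    out = np.empty((m, p, p))
    for i in range(p):
        for j in range(p):
            cell = F[:, hs[i]:hs[i + 1], ws[j]:ws[j + 1]].reshape(m, -1)
            out[:, i, j] = _reduce(cell, mode)
    return out
```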

Table 3: Effect of the window size n and the dictionary size d on CIFAR-10 (%), for K_m(5,d)-AP(3,2)-C(n,512)-R_2-AP_2. The value in parentheses equals d n^2, the dimensionality of the local descriptors.
        d = 256        d = 512         d = 1024
n = 3   76.6 (2304)    77.1 (4608)     78.1 (9216)
n = 4   77.3 (4096)    78.3 (8192)     -
n = 5   77.2 (6400)    77.8 (12800)    -

4.3 Deeper architectures
Tables 4 and 5 evaluate deeper stacks of FWM convolutions on STL-10 and CIFAR-10, including the effect of inserting ReLU (R) or two-sided ReLU (R_2) between the convolution stages; the last row of each table combines architectures (2), (3), and (6).

Table 4: Accuracy (%) on STL-10.
Architecture                                                                     Acc.
(1) K_m(9,256)-MP_2                                                              54.1
(2) K_m(9,256)-MP(4,2)-C(3,256)-R_2-AP_2                                         61.2
(3) K_m(9,256)-MP(4,2)-C(3,256)-AP(4,2)-C(3,256)-R_2-AP_2                        64.0
(4) K_m(9,256)-MP(4,2)-C(3,256)-R-AP(4,2)-C(3,256)-R_2-AP_2                      63.3
(5) K_m(9,256)-MP(4,2)-C(3,256)-R_2-AP(4,2)-C(3,256)-R_2-AP_2                    64.2
(6) K_m(9,256)-MP(4,2)-C(3,256)-AP(4,2)-C(3,256)-AP(4,2)-C(3,256)-R_2-AP_2       65.7
(2)+(3)+(6)                                                                      66.0

Table 5: Accuracy (%) on CIFAR-10.
Architecture                                                                     Acc.
(1) K_m(5,256)-AP_2                                                              72.2
(2) K_m(5,256)-C(3,512)-R_2-AP_2                                                 76.3
(3) K_m(5,256)-C(3,256)-AP(3,2)-C(3,512)-R_2-AP_2                                77.0
(4) K_m(5,256)-C(3,256)-R-AP(3,2)-C(3,512)-R_2-AP_2                              76.4
(5) K_m(5,256)-C(3,256)-R_2-AP(3,2)-C(3,512)-R_2-AP_2                            76.4
(6) K_m(5,256)-C(3,256)-AP(3,2)-C(3,256)-AP(3,2)-C(3,512)-R_2-AP_2               76.4
(2)+(3)+(6)                                                                      79.1

4.4 Comparison with published results
Table 6 compares the best architectures with published results on STL-10, CIFAR-10, CIFAR-100, and MNIST, including CNNs with Maxout [14], stochastic pooling [34], and related methods. Note that CIFAR-10 provides 5000 labeled training images per class, CIFAR-100 only 500, and STL-10 only 100 per fold.

Acknowledgments
This work was supported by JST CREST.

References
[1] P. Belhumeur, J. Hespanha, and D. Kriegman. Eigenfaces vs. fisherfaces: Recognition using class specific linear projection. IEEE Trans. PAMI, 19(7):711-720, 1997.
[2] Y. Bengio. Practical recommendations for gradient-based training of deep architectures. Neural Networks: Tricks of the Trade, 2012.
[3] L. Bo, X. Ren, and D. Fox. Unsupervised feature learning for RGB-D based object recognition. In Proc. ISER, 2012.
[4] Y. L. Boureau, J. Ponce, and Y. LeCun. A theoretical analysis of feature pooling in visual recognition. In Proc. ICML, 2010.
[5] D. Ciresan, U. Meier, and J. Schmidhuber. Multi-column deep neural networks for image classification. In Proc. IEEE CVPR, 2012.

[6] A. Coates, H. Lee, and A. Y. Ng. An analysis of single-layer networks in unsupervised feature learning. In Proc. AISTATS, 2011.
[7] A. Coates and A. Ng. Selecting receptive fields in deep networks. In Proc. NIPS, 2011.
[8] A. Coates and A. Ng. The importance of encoding versus training with sparse coding and vector quantization. In Proc. ICML, 2011.
[9] A. Coates and A. Ng. Learning feature representations with K-means. Neural Networks: Tricks of the Trade, 2012.
[10] G. Csurka, C. R. Dance, L. Fan, J. Willamowski, and C. Bray. Visual categorization with bags of keypoints. In Proc. ECCV Workshop on Statistical Learning in Computer Vision, 2004.
[11] D. Erhan, Y. Bengio, A. Courville, P. A. Manzagol, and P. Vincent. Why does unsupervised pre-training help deep learning? Journal of Machine Learning Research, 11:625-660, 2010.
[12] K. Fukushima. Neocognitron for handwritten digit recognition. Neurocomputing, 51:161-180, 2003.
[13] R. Gens and P. Domingos. Discriminative learning of sum-product networks. In Proc. NIPS, 2012.
[14] I. J. Goodfellow, D. Warde-Farley, M. Mirza, A. Courville, and Y. Bengio. Maxout networks. In Proc. ICML, 2013.
[15] T. Harada, Y. Ushiku, Y. Yamashita, and Y. Kuniyoshi. Discriminative spatial pyramid. In Proc. IEEE CVPR, pages 1617-1624, 2011.
[16] G. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint, 2012.
[17] G. E. Hinton and R. R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313:504-507, 2006.
[18] D. H. Hubel and T. N. Wiesel. Receptive fields of single neurones in the cat's striate cortex. The Journal of Physiology, 148:574-591, 1959.
[19] K. Jarrett, K. Kavukcuoglu, M. A. Ranzato, and Y. LeCun. What is the best multi-stage architecture for object recognition? In Proc. IEEE ICCV, 2009.
[20] A. Krizhevsky. Learning multiple layers of features from tiny images. Master's thesis, Toronto University, 2009.
[21] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Proc. NIPS, 2012.
[22] S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Proc. IEEE CVPR, volume 2, 2006.
[23] Q. V. Le, M. A. Ranzato, R. Monga, M. Devin, K. Chen, G. S. Corrado, and A. Y. Ng. Building high-level features using large scale unsupervised learning. In Proc. ICML, 2012.
[24] Y. LeCun. The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/.
[25] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proc. of the IEEE, 1998.
[26] M. Lin, Q. Chen, and S. Yan. Network in network. In Proc. ICLR, 2014.
[27] V. Nair and G. E. Hinton. Rectified linear units improve restricted Boltzmann machines. In Proc. ICML, 2010.
[28] M. A. Ranzato, C. Poultney, S. Chopra, and Y. LeCun. Efficient learning of sparse representations with an energy-based model. In Proc. NIPS, 2006.
[29] Y. Shinohara and N. Otsu. Facial expression recognition using Fisher weight maps. In Proc. IEEE FG, 2004.
[30] P. Simard, D. Steinkraus, and J. Platt. Best practices for convolutional neural networks applied to visual document analysis. In Proc. ICDAR, 2003.
[31] N. Srivastava and R. Salakhutdinov. Discriminative transfer learning with tree-based priors. In Proc. NIPS, 2013.
[32] A. Torralba, R. Fergus, and W. T. Freeman. 80 million tiny images: A large dataset for nonparametric object and scene recognition. IEEE Trans. PAMI, 30(11):1958-1970, Nov. 2008.
[33] M. Turk and A. Pentland. Face recognition using eigenfaces. In Proc. IEEE CVPR, 1991.
[34] M. D. Zeiler and R. Fergus. Stochastic pooling for regularization of deep convolutional neural networks. arXiv preprint, 2013.

Table 6: Comparison with published results on STL-10, CIFAR-10/100, and MNIST (accuracy, %).

STL-10
1-layer Sparse Coding [8]                                                        59.0
3-layer Learned Receptive Field [7]                                              60.1
Discriminative Sum-Product Network [13]                                          62.3
Hierarchical Matching Pursuit [3]                                                64.5
K_m(9,256)-MP(4,2)-C(3,256)-AP(4,2)-C(3,256)-AP(4,2)-C(3,256)-R_2-AP_2           65.7
K_m(9,1024)-MP(4,2)-C(3,512)-AP(4,2)-C(3,256)-AP(4,2)-C(3,256)-R_2-AP_2          66.4
K_m(9,1024)-MP(4,2)-C(3,512)-AP(4,2)-C(3,256)-AP(4,2)-C(3,256)-R_2-AP_2 (*)      66.9

CIFAR-10
3-layer Learned Receptive Field [7]                                              82.0
CNN [16]                                                                         83.4
Discriminative Sum-Product Network [13]                                          84.0
CNN (1 locally connected layer) [16]                                             84.4
CNN + Stochastic Pooling [34]                                                    84.9
CNN + Maxout [14]                                                                88.3
Network in Network [26]                                                          89.6
K_m(5,1024)-C(3,256)-AP(3,2)-C(3,512)-R_2-AP_3                                   80.4
K_m(5,1024)-C(3,256)-AP(3,2)-C(3,256)-AP(3,2)-C(3,512)-R_2-AP_3 (*)              81.9

CIFAR-100
CNN + Stochastic Pooling [34]                                                    57.49
CNN + Maxout [14]                                                                61.43
CNN + Tree-based prior [31]                                                      63.15
Network in Network [26]                                                          64.32
K_m(5,6400)-C(1,1000)-AP(4,2)-C(3,1000)-AP(3,2)-C(3,1000)-R_2-AP_3               60.80
K_m(5,6400)-C(1,1000)-AP(4,2)-C(3,1000)-AP(3,2)-C(3,1000)-R_2-AP_3 (*)           62.05

MNIST
CNN (Unsupervised pretraining) [28]                                              99.40
CNN (Unsupervised pretraining) [19]                                              99.47
CNN + Stochastic Pooling [34]                                                    99.53
Network in Network [26]                                                          99.53
CNN + Maxout [14]                                                                99.55
Rand(5,1024)-R-AP(4,2)-C(3,512)-R_2-AP_4                                         99.50
Rand(5,2048)-R-C(1,1024)-AP(4,2)-C(3,512)-R_2-AP_4                               99.60