BoVW. (Histogram Encoding) [2], [5], [6] [7], [8], (Fisher Encoding) [3] VLAD [9] Super Vector [10] Locality Constrained [11], [12], [13]


Transcript:

1 School of Science and Technology, Gunma University, Tenjin-cho 1-5-1, Kiryu-shi, Gunma, 376-8515 Japan
2 Graduate School of Engineering, Tohoku University, 6-6-05, Aramaki Aza Aoba, Aoba-ku, Sendai, 980-8579, Japan
a) matsuzawa-tomoki@kato-lab.cs.gunma-u.ac.jp

© 2015 Information Processing Society of Japan

SIFT, Bag-of-Visual-Words

1. […]

BoVW [2] […] Dense [3] […] Interest Point [4] […]

BoVW […]: (Histogram Encoding) [2], [5], [6], [7], [8]; (Fisher Encoding) [3]; VLAD [9]; Super Vector [10]; Locality Constrained Linear Encoding [11].

[11], [12], [13] […]: (Average Pooling), (Max Pooling) [11], [12], [13]; [5], [13]; [12].

Chatfield [14] […] [3] […] [6] […] [15] […] BoVW […]

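[16], [17]. BoVW […] K-means […] Visual Word [18].

2. […]: BoVW and FV

2.1 Bag-of-Visual-Words (BoVW)

BoVW [2] extracts local descriptors from an image (e.g. SIFT [19]) and assigns each descriptor to a Visual Word [14], [12]. With K Visual Words, the image is represented by a K-dimensional vector.

2.2 Fisher Vector (FV)

The Fisher Vector (FV) is derived from a generative model $p(X \mid \theta)$ with parameters $\theta = [\theta_1, \dots, \theta_m]^\top$ [20]. For a set of $T$ local descriptors $X := [x_1, \dots, x_T]$, the FV is

$$
f(X) := \begin{bmatrix}
\dfrac{\nabla_{\theta_1} \log p(X \mid \theta)}{\sqrt{E_X\big((\nabla_{\theta_1} \log p(X \mid \theta))^2\big)}} \\
\vdots \\
\dfrac{\nabla_{\theta_m} \log p(X \mid \theta)}{\sqrt{E_X\big((\nabla_{\theta_m} \log p(X \mid \theta))^2\big)}}
\end{bmatrix}.
$$

The model $p(X \mid \theta)$ is a mixture of $K$ Gaussians,

$$
p(X \mid \theta) = \prod_{t=1}^{T} \sum_{k=1}^{K} \pi_k \, \mathcal{N}\big(x_t;\, \mu_k,\, \mathrm{diag}(\sigma_k)^2\big),
$$

where the mixing coefficients satisfy $\sum_k \pi_k = 1$, and $\mu_k \in \mathbb{R}^d$, $\sigma_k \in \mathbb{R}^d$. The parameter vector $\theta = [\pi_1, \dots, \pi_K, \mu_1^\top, \dots, \mu_K^\top, \sigma_1^\top, \dots, \sigma_K^\top]^\top$ has $(2d+1)K$ entries, so the FV is $(2d+1)K$-dimensional. Dropping the $K$ entries for $\pi_1, \dots, \pi_K$ gives a $2dK$-dimensional FV [21], [22], [23]. For the $2dK$-dimensional FV, [24] approximates $E_X\big((\nabla_{\theta_i} \log p(X \mid \theta))^2\big)$ under two assumptions, (a) on the responsibilities [25] and (b) on $T$, which gives

$$
f_{\mathrm{euc}}(X) := \big[ f^{\mu_1}_{\mathrm{euc}}(X)^\top, \dots, f^{\mu_K}_{\mathrm{euc}}(X)^\top,\; f^{\sigma_1}_{\mathrm{euc}}(X)^\top, \dots, f^{\sigma_K}_{\mathrm{euc}}(X)^\top \big]^\top,
$$

with one block $f^{\mu_k}_{\mathrm{euc}}(X)$ and one block $f^{\sigma_k}_{\mathrm{euc}}(X)$ for each $k = 1, \dots, K$.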
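The BoVW encoding of Section 2.1 can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `bovw_histogram`, the hard nearest-word assignment under the Euclidean distance, and the normalisation of the histogram to sum to one are all assumptions made for the example. Descriptors and Visual Words are stored column-wise, matching $X := [x_1, \dots, x_T]$.

```python
import numpy as np

def bovw_histogram(X, V):
    """Encode descriptors X (d x T) against visual words V (d x K).

    Hypothetical helper: each descriptor is assigned to its nearest
    visual word (Euclidean distance) and the normalised counts form
    the K-dimensional BoVW vector.
    """
    # Squared Euclidean distance between every descriptor and every word: (T, K)
    d2 = ((X.T[:, None, :] - V.T[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)                      # index of the closest word per descriptor
    hist = np.bincount(nearest, minlength=V.shape[1]).astype(float)
    return hist / hist.sum()                         # normalisation is an assumption

rng = np.random.default_rng(0)
V = rng.normal(size=(8, 5))      # 5 visual words in an 8-dimensional descriptor space
X = rng.normal(size=(8, 40))     # 40 local descriptors
h = bovw_histogram(X, V)         # 5-dimensional image representation
```

The result has one entry per Visual Word, which is why the representation is K-dimensional regardless of how many descriptors the image yields.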

f µ k euc(x) 1 T πk diag(σ k ) 1 Y k,euc γ k,euc, f σ 1 ( k euc(x) diag(σk ) 2 Y k,euc Y k,euc 1 d 1 ) 2T T γk,euc πk Y k,euc := X µ k 1 T γ k,euc R T k γ k,euc t π k N (x t ; µ k, diag(σ k ) 2 ) k π k N (x t ; µ k, diag(σ k ) 2 ). BoVW FV N (x t ; µ k, diag(σ k ) 2 ) Visual Word FV 3. : BoVW FV [1], [26] x, x R d D(x, x ; A) := (x x ) A(x x ) A ( ) ( ) 3.1 BoVW [18] BoWV K Visual Word V := [v 1,..., v K ] R d K J cb-ho (V ; A) := T min D(x t, v kt ; A) k t {1,...,K} t=1 K Greedy T BoVW Bagof-Visual-Words(HoMahaBoVW) d Visual Word v k A k BoVW Bag-of-Visual-Words(HeMahaBoVW) HeMahaBoVW x k argmin D(x, v k ; A k ). k {1,...,K} Visual Word v k 3.2 FV [18] FV A = UΛU W := Λ 1/2 U x (i.e. D(x, x ; A) = W x W x ) [18] : p ho (X θ) := det(w ) T T K π k N (W x t ; µ k, diag(σ k ) 2 ). FV f ho (X) (HoMahaFV) HoMahaFV f ho (X) := [ f µ1 ho (X),..., f µk ho (X), f σ1 ho (X),..., f σk ho (X) ] E X (( θi log p ho (X θ)) 2 ) FV (2.2 ) f µ k ho (X) 1 T πk diag(σ k ) 1 Y k γ k, f σ k ho (X) 1 2T πk ( diag(σk ) 2 Y k Y k 1 d 1 T ) γk Y k := W X µ k 1 T γ k R T t π k N (W x t ; µ k, diag(σ k ) 2 ) k π k N (W x t ; µ k, diag(σ k ) 2 ). HoMahaFV W HoMahaFV (HeMahaFV) FV W 1,..., W K : c 2015 Information Processing Society of Japan 3

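The HeMahaFV model is

$$
p_{\mathrm{he}}(X \mid \theta) := \prod_{t=1}^{T} \sum_{k=1}^{K} \pi_k \det(W_k)\, \mathcal{N}\big(W_k x_t;\, \mu_k,\, \mathrm{diag}(\sigma_k)^2\big),
$$

where $W_k$ is the transform obtained from the matrix $A_k$ of component $k$. With $Y_k := W_k X - \mu_k 1_T^\top$, the FV blocks of $p_{\mathrm{he}}(X \mid \theta)$ with respect to $\mu_k$ and $\sigma_k$ are

$$
f^{\mu_k}_{\mathrm{he}}(X) \approx \frac{1}{T\sqrt{\pi_k}}\, \mathrm{diag}(\sigma_k)^{-1}\, Y_k\, \gamma_k,
\qquad
f^{\sigma_k}_{\mathrm{he}}(X) \approx \frac{1}{\sqrt{2}\,T\sqrt{\pi_k}} \Big( \mathrm{diag}(\sigma_k)^{-2} \big(Y_k \odot Y_k\big) - 1_d 1_T^\top \Big) \gamma_k,
$$

where the $t$-th entry of $\gamma_k \in \mathbb{R}^T$ is

$$
\frac{\pi_k \det(W_k)\, \mathcal{N}\big(W_k x_t;\, \mu_k,\, \mathrm{diag}(\sigma_k)^2\big)}{\sum_{k'} \pi_{k'} \det(W_{k'})\, \mathcal{N}\big(W_{k'} x_t;\, \mu_{k'},\, \mathrm{diag}(\sigma_{k'})^2\big)}.
$$

4. HeMahaFV […]

4.1 HeMahaBoVW […]

HeMahaBoVW […] $K$ Visual Words […] $A_1, \dots, A_K$ […] HeMahaBoVW […] $c$ […] $A^{(c)}$ […] 0.05 […] Visual Word […] K, K 2 […] HeMahaBoVW […] $A^{(c)}$, $W^{(c)}$ […] K 2 […] $p_{\mathrm{he}}(X \mid \theta)$ […] Greedy […] EM (E-step, M-step).

For each $c$, with $K$-component parameters $\mu_{1,c}, \dots, \mu_{K,c} \in \mathbb{R}^d$, $\sigma_{1,c}, \dots, \sigma_{K,c} \in \mathbb{R}^d$, $\pi_c \in \mathbb{R}^K$ and $T$ descriptors $x_1, \dots, x_T$, the likelihood is

$$
L(\theta^{(c)}) := \det(W^{(c)})^{T} \prod_{t=1}^{T} \sum_{k=1}^{K} \pi_{k,c}\, \mathcal{N}\big(W^{(c)} x_t;\, \mu_{k,c},\, \mathrm{diag}(\sigma_{k,c})^2\big),
\qquad
\theta^{(c)} := \{\mu_{1,c}, \dots, \mu_{K,c}, \sigma_{1,c}, \dots, \sigma_{K,c}, \pi_c\}.
$$

It is maximized with the EM algorithm (Algorithm 1).

Algorithm 1 EM Algorithm for HeMahaFV Method
Input: Observation $x_1, \dots, x_T \in \mathbb{R}^d$ and initial values $\mu^{(0)}_{1,c}, \dots, \mu^{(0)}_{K,c} \in \mathbb{R}^d$, $\sigma^{(0)}_{1,c}, \dots, \sigma^{(0)}_{K,c} \in \mathbb{R}^d$, $\pi_c \in \mathbb{R}^K$
for $l = 1, 2, \dots$ do
    E step: Compute
    $$
    \gamma^{(l)}_{t,k} := \frac{\pi^{(l-1)}_{k,c}\, \mathcal{N}\big(W^{(c)} x_t;\, \mu^{(l-1)}_{k,c},\, \mathrm{diag}(\sigma^{(l-1)}_{k,c})^2\big)}{\sum_{k'} \pi^{(l-1)}_{k',c}\, \mathcal{N}\big(W^{(c)} x_t;\, \mu^{(l-1)}_{k',c},\, \mathrm{diag}(\sigma^{(l-1)}_{k',c})^2\big)}
    $$
    for $(t, k) \in \{1, \dots, T\} \times \{1, \dots, K\}$;
    M step: Update the parameter values by
    $$
    \mu^{(l)}_{k,c} := \frac{\sum_t \gamma^{(l)}_{t,k}\, W^{(c)} x_t}{\sum_t \gamma^{(l)}_{t,k}},
    \qquad
    (\sigma^{(l)}_{k,c})^2 := \frac{\sum_t \gamma^{(l)}_{t,k} \big( (W^{(c)} x_t - \mu^{(l)}_{k,c}) \odot (W^{(c)} x_t - \mu^{(l)}_{k,c}) \big)}{\sum_t \gamma^{(l)}_{t,k}},
    \qquad
    \pi^{(l)}_{k,c} := \frac{1}{T} \sum_t \gamma^{(l)}_{t,k}
    $$
    for $k = 1, \dots, K$;
end for

4.2 HeMahaFV […]

HeMahaFV […] K 2 […] K 2 […]

5. […] HeMahaBoVW, HeMahaFV […] 6 […]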
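The E and M steps of Algorithm 1 (Section 4.1) can be sketched in NumPy as follows. This is an illustrative sketch rather than the authors' implementation: the function name `em_diag_gmm`, the fixed iteration count `n_iter`, the small floor `eps` on variances and component weights, and the log-domain E step are assumptions added for the example. The input `Z` is assumed to hold the already transformed descriptors $W^{(c)} x_t$ as columns, so the factor $\det(W^{(c)})^T$, which does not depend on the parameters being updated, plays no part in the updates.

```python
import numpy as np

def em_diag_gmm(Z, pi, mu, sigma, n_iter=50, eps=1e-6):
    """EM for a diagonal-covariance GMM.

    Z: d x T transformed descriptors (columns W x_t),
    pi: (K,) mixing coefficients, mu and sigma: d x K.
    """
    d, T = Z.shape
    pi, mu, sigma = pi.copy(), mu.copy(), sigma.copy()
    for _ in range(n_iter):
        # E step: responsibilities gamma[k, t]; the (2*pi)^(-d/2) factor cancels
        diff = Z[:, None, :] - mu[:, :, None]                    # d x K x T
        logp = (np.log(pi)[:, None]
                - 0.5 * ((diff / sigma[:, :, None]) ** 2).sum(axis=0)
                - np.log(sigma).sum(axis=0)[:, None])            # K x T
        logp -= logp.max(axis=0, keepdims=True)
        gamma = np.exp(logp)
        gamma /= gamma.sum(axis=0, keepdims=True)
        # M step: weighted means, weighted per-dimension variances, mixing weights
        Nk = np.maximum(gamma.sum(axis=1), eps)                  # (K,)
        mu = (Z @ gamma.T) / Nk                                  # d x K
        diff = Z[:, None, :] - mu[:, :, None]
        var = (diff ** 2 * gamma[None, :, :]).sum(axis=2) / Nk   # d x K
        sigma = np.sqrt(var + eps)
        pi = gamma.sum(axis=1) / T
    return pi, mu, sigma, gamma
```

A typical call would pass `Z = W @ X` for the transform of interest; in practice a convergence test on the log-likelihood would replace the fixed iteration count.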

Fig. 1 Categorization Performance. (a) FMD, (b) LSP15.

Compared methods: conventional BoVW and FV (HoEucBoVW, HoEucFV); HeMahaBoVW […] (HeEucBoVW); HeMahaFV […] $\theta$ […] (HeEucFV); [18] (HoMahaBoVW, HoMahaFV).

5.1 […]

Local descriptors: Dense SIFT.
Visual Words: K = 1024 for BoVW, K = 256 for FV.
FV normalization: Power and L2 [27]; HeEucFV, HeMahaFV: L2.
Datasets: Flickr Material Database (FMD) [28] and LSP15 [5].
Classifier: One-vs-rest SVM.
FMD: images from Flickr.com, 10 categories, 100 images per category.
LSP15: 15 […] 200 […] 30 […] 10 […]

5.2 […]

Fig. 1 […] FMD, LSP15 […] BoVW, FV […] HeMahaFV […]

5.3 […]

CalTech 101 (Cal 101) [29], with 101 categories […] subsets (Cal 10, Cal 20, ..., Cal 50) […] Cal 101 […] 10, 20 […] 10 […] 50 […] Cal 101 […] Top-N Accuracy […] Fig. 2 […] N […] 10 […] Fig. 1, Fig. 2 […] HeMahaFV […] FV (EucHoFV) […]

6. […]

BoVW, FV […] HeMahaBoVW, HeMahaFV […] HeMahaBoVW, HeMahaFV […] HeMahaFV […] BoVW […] (background clutter) […] Ramazan [22] […] Fraz [4] […] Mid-Level […] BoVW […] Visual Word [11], [12], [13] […] Visual Word […]

References

[1] Kato, T., Takei, W. and Omachi, S.: A Discriminative Metric Learning Algorithm for Face Recognition, IPSJ Transactions on Computer Vision and Applications, Presented at MIRU2013 as Oral Presentation, Vol. 5, pp. 85-89 (2013).
[2] Csurka, G., Dance, C., Fan, L., Willamowski, J. and Bray, C.: Visual categorization with bags of keypoints, Workshop on statistical learning in computer vision, ECCV, Vol. 1, p. 22 (2004).
[3] Sánchez, J., Perronnin, F., Mensink, T. and Verbeek, J.: Image classification with the Fisher vector: Theory and practice, International journal of computer vision, Vol. 105, No. 3, pp. 222-245 (2013).
[4] Fraz, M., Edirisinghe, E. A. and Sarfraz, M. S.: Mid-level-Representation Based Lexicon for Vehicle Make and Model Recognition, Pattern Recognition (ICPR), 2014 22nd International Conference on, IEEE, pp. 393-398 (2014).
[5] Lazebnik, S., Schmid, C. and Ponce, J.: Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories, Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on, Vol. 2, IEEE, pp. 2169-2178 (2006).
[6] Sivic, J. and Zisserman, A.: Efficient Visual Search of Videos Cast as Text Retrieval, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 31, No. 4, pp. 591-606 (2009).
[7] Farquhar, J., Szedmak, S., Meng, H. and Shawe-Taylor, J.: Improving bag-of-keypoints image categorisation: Generative models and pdf-kernels (2005).
[8] Winn, J., Criminisi, A. and Minka, T.: Object categorization by learned universal visual dictionary, Computer Vision, 2005. ICCV 2005. Tenth IEEE International Conference on, Vol. 2, IEEE, pp. 1800-1807 (2005).
[9] Jégou, H., Perronnin, F., Douze, M., Sánchez, J., Pérez, P. and Schmid, C.: Aggregating local image descriptors into compact codes, Pattern Analysis and Machine Intelligence, IEEE Transactions on, Vol. 34, No. 9, pp. 1704-1716 (2012).
[10] Zhou, X., Yu, K., Zhang, T. and Huang, T. S.: Image classification using super-vector coding of local image descriptors, Computer Vision - ECCV 2010, Springer, pp. 141-154 (2010).
[11] Wang, J., Yang, J., Yu, K., Lv, F., Huang, T. and Gong, Y.: Locality-constrained linear coding for image classification, Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, IEEE, pp. 3360-3367 (2010).
[12] Boureau, Y.-L., Bach, F., LeCun, Y. and Ponce, J.: Learning mid-level features for recognition, Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, IEEE, pp. 2559-2566 (2010).
[13] Yang, J., Yu, K., Gong, Y. and Huang, T.: Linear spatial pyramid matching using sparse coding for image classification, Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, IEEE, pp. 1794-1801 (2009).
[14] Chatfield, K., Lempitsky, V., Vedaldi, A. and Zisserman, A.: The devil is in the details: an evaluation of recent feature encoding methods (2011).
[15] Boiman, O., Shechtman, E. and Irani, M.: In defense of nearest-neighbor based image classification, Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, IEEE, pp. 1-8 (2008).
[16] Cinbis, R. G., Verbeek, J. and Schmid, C.: Image categorization using Fisher kernels of non-iid image models, Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, IEEE, pp. 2184-2191 (2012).
[17] Tanaka, M., Torii, A. and Okutomi, M.: Fisher Vector based on Full-covariance Gaussian Mixture Model, IPSJ Transactions on Computer Vision and Applications (CVA), Vol. 5, pp. 50-54 (2013).
[18] […]: […], PRMU, Vol. 113, No. 403, pp. 201-206 (2014).
[19] Lowe, D. G.: Distinctive image features from scale-invariant keypoints, International journal of computer vision, Vol. 60, No. 2, pp. 91-110 (2004).
[20] Jaakkola, T., Haussler, D. et al.: Exploiting generative models in discriminative classifiers, Advances in neural information processing systems, pp. 487-493 (1999).
[21] Ji, Z.: Decoupling Sparse Coding with Fusion of Fisher Vectors and Scalable SVMs for Large-Scale Visual Recognition, Computer Vision and Pattern Recognition Workshops (CVPRW), 2013 IEEE Conference on, IEEE, pp. 450-457 (2013).
[22] Cinbis, R. G., Verbeek, J. and Schmid, C.: Segmentation driven object detection with Fisher vectors, Computer Vision (ICCV), 2013 IEEE International Conference on, pp. 2968-2975 (2013).
[23] Sydorov, V., Sakurada, M. and Lampert, C. H.: Deep Fisher Kernels - End to End Learning of the Fisher Kernel GMM Parameters.
[24] Perronnin, F. and Dance, C.: Fisher kernels on visual vocabularies for image categorization, Computer Vision and Pattern Recognition, 2007. CVPR '07. IEEE Conference on, IEEE, pp. 1-8 (2007).
[25] Bishop, C. M.: Pattern Recognition and Machine Learning, Springer Science+Business Media, LLC, New York, USA (2006).
[26] Weinberger, K. Q. and Saul, L. K.: Distance Metric Learning for Large Margin Nearest Neighbor Classification, J. Mach. Learn. Res., Vol. 10, pp. 207-244 (online), available from http://dl.acm.org/citation.cfm?id=1577069.1577078 (2009).
[27] Perronnin, F., Sanchez, J. and Mensink, T.: Improving the fisher kernel for large-scale image classification, Computer Vision - ECCV 2010, Springer, pp. 143-156 (2010).
[28] Sharan, L., Rosenholtz, R. and Adelson, E.: Material perception: What can you see in a brief glance?, Journal of Vision, Vol. 9, No. 8, pp. 784-784 (2009).
[29] Fei-Fei, L., Fergus, R. and Perona, P.: Learning generative visual models from few training examples: An incremental bayesian approach tested on 101 object categories, Computer Vision and Image Understanding, Vol. 106, No. 1, pp. 59-70 (2007).