FUZZY ENCODING FOR IMAGE CLASSIFICATION USING GUSTAFSON-KESSEL ALGORITHM

Ashish Gupta and Richard Bowden
Centre for Vision, Speech and Signal Processing
University of Surrey
Guildford, United Kingdom

ABSTRACT

This paper presents a novel adaptation of fuzzy clustering and feature encoding for image classification. Visual word ambiguity has recently been successfully modeled by kernel codebooks, providing an improvement in classification performance over the standard 'Bag-of-Features' (BoF) approach, which uses hard partitioning and crisp logic to assign features to visual words. Motivated by this progress, we utilize fuzzy logic to model the ambiguity and combine it with clustering to discover fuzzy visual words. The feature descriptors of an image are encoded using the learned fuzzy membership function associated with each word. The codebook built using this fuzzy encoding technique is demonstrated to provide superior performance over BoF. We use the Gustafson-Kessel algorithm, which is an improvement over Fuzzy C-Means clustering and can adapt to local distributions. We evaluate our approach on several popular datasets and demonstrate that it consistently provides superior performance to the BoF approach.

Index Terms— Fuzzy Clustering, Image Classification, Gustafson-Kessel algorithm

[Fig. 1. Fuzzy Encoding Schematic: feature vectors from images are clustered to find cluster centroids and a fuzzy membership function. An image is encoded based on the cumulative membership score of its descriptors. The encoded feature vector of the image, with its label, is used to train a classifier.]

1. INTRODUCTION

The traditional approach to image classification uses the 'Bag-of-Features' (BoF) method to model the occurrence statistics of visual words amongst the feature descriptors in an image. The visual words are representative feature descriptors, commonly the centroids of clusters acquired by a vector quantization technique such as k-means clustering. Subsequent developments pursued a search for an optimal set of visual words to model a visual category in images. At present, better results are obtained by increasing the number of visual words at the expense of high sparsity and computational cost. An alternative to building huge codebooks is the notion of 'visual word ambiguity', recently introduced by Gemert et al. in [1]. The authors suggest that visual feature space is continuous and that consequently a continuum exists between visual words. Therefore, the hard partitioning scheme utilized by BoF is not the best way to model this data. In BoF, the assignment of feature descriptors to visual words follows a 'winner-takes-all' scheme, which can also be regarded as crisp logic. Gemert et al. assign feature descriptors to visual words 'smoothly', modeling the visual continuum using a kernel associated with each visual word. This work motivates our research on utilizing fuzzy-logic-based methods to model this proposed 'ambiguity' of visual words. In our approach (see figure 1), feature descriptors are assigned to a visual word based on a fuzzy membership function associated with each visual word. We compute visual words using fuzzy clustering algorithms, which were developed as a synthesis of fuzzy logic and cluster analysis. The feature descriptors of an image are encoded using the fuzzy membership of each fuzzy visual word; the short sketch below contrasts the two assignment schemes.
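To make the distinction between crisp and fuzzy assignment concrete, the following is a minimal sketch of ours (not from the paper), in Python/numpy, with illustrative toy values: the number of words K = 4, the descriptor dimensionality of 128, and the fuzzifier m = 2 are all assumptions for the example.

import numpy as np

# Toy contrast of crisp (BoF) vs fuzzy assignment of one descriptor
# to K visual words. Values are illustrative, not from the paper.
rng = np.random.default_rng(0)
words = rng.normal(size=(4, 128))        # K = 4 hypothetical visual words
x = rng.normal(size=128)                 # one SIFT-like feature descriptor

d = np.linalg.norm(words - x, axis=1)    # distance of x to each word

# BoF: 'winner-takes-all' -- all mass on the nearest word
hard = np.zeros(len(words))
hard[np.argmin(d)] = 1.0

# Fuzzy: FCM-style membership with fuzzifier m = 2; mass is spread
# over all words in proportion to closeness, modelling word ambiguity
m = 2.0
mu = 1.0 / np.sum((d[:, None] / d[None, :]) ** (2.0 / (m - 1.0)), axis=1)

print(hard)              # crisp: a single 1 at the nearest word
print(mu, mu.sum())      # soft memberships that sum to 1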
In this paper, we present a novel adaptation of the Gustafson-Kessel (GK) algorithm [2] for building a fuzzy visual codebook. The GK algorithm belongs to the family of fuzzy clustering algorithms and can be considered an extension of the Fuzzy C-Means (FCM) algorithm [3]. We analyze the classification performance of our approach against the standard BoF model on several popular datasets. In addition, we analyze the variation in performance for different sizes of visual codebook. Our principal contribution is adapting the nascent work on visual word ambiguity, which used intuitively selected kernels, to the well-developed field of fuzzy logic. We show that FCM-based feature encoding provides improved performance. We build upon this by using the GK algorithm, which can adapt to the local topology of the feature space.

[Table 1. Fuzzy Encoding: cluster shapes produced by (a) Kmeans, (b) Fuzzy C-Means, and (c) Gustafson-Kessel.]

2. SYSTEM

The typical dataset D utilized for image classification consists of multiple visual categories C, each containing several positively labeled (γ : I → 1) training images I⊕. The negatively labeled (γ : I → −1) training images I⊖ are culled from the remaining categories in the dataset. Local affine invariant feature descriptor vectors x (ignoring image labels), collated from a balanced sample of I⊕ and I⊖, are clustered using each of Kmeans, FCM, and GK clustering. For each image I_i ∈ D in the dataset we encode the feature vectors of that image. The encoded feature vector ν_i and associated label γ_i are utilized to train a binary SVM classifier using an RBF kernel. The difference amongst the methods lies in the computed cluster centroids and the membership of a feature vector to each cluster.

3. FUZZY ENCODING

The Kmeans algorithm computes cluster centroids V = {v_1, v_2, ..., v_K} and an assignment of feature vectors X = {x_1, x_2, ..., x_N} to these clusters by minimizing the objective function:

J(X; V) = \sum_{i=1}^{K} \sum_{k=1}^{N} 1_{ik} \, \| x_k - v_i \|^2, \qquad 1_{ik} = \begin{cases} 1 & \text{if } x_k \in v_i \\ 0 & \text{otherwise} \end{cases}   (1)

Each feature vector x_k is assigned to exactly one cluster, see table 1(a). The encoded feature for an image, ν, is a discrete-valued histogram. In comparison, the FCM algorithm attempts to minimize the objective function:

J(X; U, V) = \sum_{i=1}^{K} \sum_{k=1}^{N} \mu_{ik}^m \, \| x_k - v_i \|^2, \qquad 1 \le m < \infty   (2)

where U = [μ_ik] and μ_ik is the degree of membership of x_k to cluster i. The exponent m is a measure of fuzzification; for m = 1, FCM is equivalent to Kmeans. Using FCM, ν reflects a degree of membership of feature vectors to the various clusters and is consequently able to model the ambiguity associated with visual words, in contrast to Kmeans. However, FCM results in hyper-spherical clusters of equal volume, see table 1(b). This simplistic model is unable to represent the complicated topology of visual data, where each cluster roughly corresponds to a visually distinct object part with a unique distribution in feature space.
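Before turning to GK, here is a minimal FCM iteration, sketched by us to make equation (2) concrete (numpy, fuzzifier m = 2, toy two-dimensional data; an illustration under these assumptions, not the authors' implementation):

import numpy as np

def fcm(X, K, m=2.0, iters=100, tol=1e-5, seed=0):
    """Minimal Fuzzy C-Means: alternates the centroid and membership
    updates that minimise J(X; U, V) of equation (2)."""
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(K), size=len(X)).T       # K x N memberships
    for _ in range(iters):
        W = U ** m                                     # fuzzified weights
        V = (W @ X) / W.sum(axis=1, keepdims=True)     # weighted centroids
        D = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2)
        D = np.maximum(D, 1e-12)                       # guard against zero distance
        U_new = 1.0 / np.sum((D[:, None, :] / D[None, :, :]) ** (2.0 / (m - 1.0)), axis=1)
        if np.abs(U_new - U).max() < tol:
            return V, U_new
        U = U_new
    return V, U

X = np.random.default_rng(1).normal(size=(500, 2))     # toy descriptor pool
V, U = fcm(X, K=3)
print(V.shape, U.shape)                                # (3, 2) (3, 500)

Each column of U sums to one, so the encoded vector ν of an image is obtained by accumulating the membership columns of its descriptors, consistent with the cumulative membership score of figure 1.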
Therefore, we employ the GK algorithm, which extends FCM by replacing the Euclidean distance with a metric induced by a positive definite matrix A, so that hyper-ellipsoidal clusters can be built instead of only hyper-spherical ones, see table 1(c) [4]. The clusters have associated norm-inducing matrices A = {A_1, A_2, ..., A_K}, which provide the inner-product norm:

D_{ikA_i}^2 = (x_k - v_i)^T A_i (x_k - v_i), \qquad 1 \le i \le K, \; 1 \le k \le N   (3)

The objective function minimized in GK is:

J(X; U, V, A) = \sum_{i=1}^{K} \sum_{k=1}^{N} \mu_{ik}^m \, D_{ikA_i}^2   (4)

Due to the linear dependency between J and A, we constrain A to obtain a feasible solution to J. The determinant ρ_i = det(A_i) is kept constant, which corresponds to optimizing cluster shape whilst keeping its volume constant. A_i can now be expressed as A_i = [ρ_i det(F_i)]^{1/n} F_i^{-1}, where F_i is the fuzzy covariance matrix of the i-th cluster.

Our approach to Gustafson-Kessel fuzzy encoding is provided in Algorithm 1. The encoded features for each image and the associated labels {ν, γ} are used to train a classifier.

Algorithm 1: Gustafson-Kessel Fuzzy Encoding
τ ← 1
repeat
    v_i^{(τ)} ← [ Σ_{k=1}^{N} (μ_{ik}^{(τ-1)})^m x_k ] / [ Σ_{k=1}^{N} (μ_{ik}^{(τ-1)})^m ]
    F_i ← [ Σ_{k=1}^{N} (μ_{ik}^{(τ-1)})^m (x_k - v_i^{(τ)})(x_k - v_i^{(τ)})^T ] / [ Σ_{k=1}^{N} (μ_{ik}^{(τ-1)})^m ],  1 ≤ i ≤ K
    D_{ikA_i}^2 ← (x_k - v_i^{(τ)})^T [ (ρ_i det(F_i))^{1/n} F_i^{-1} ] (x_k - v_i^{(τ)})
    φ_k ← { i | D_{ikA_i} = 0 }
    for k ← 1, N do
        if φ_k = ∅ then
            μ_{ik}^{(τ)} ← ( Σ_{j=1}^{K} (D_{ikA_i} / D_{jkA_j})^{2/(m-1)} )^{-1}
        else
            μ_{ik}^{(τ)} ← 0 if D_{ikA_i} > 0 ;  1/|φ_k| if D_{ikA_i} = 0
        end if
    end for
    τ ← τ + 1
until ‖U^{(τ)} − U^{(τ-1)}‖ < ε
for j ← 1, M do
    ν_j^{(i)} ← Σ_{x_k ∈ I_j} μ_{ik},  1 ≤ i ≤ K   (cumulative membership of the descriptors of image I_j)
    γ_j ← 1 if I_j ∈ C,  −1 if I_j ∉ C
end for
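Below is a compact numpy reading of the updates in Algorithm 1, together with the cumulative-membership encoding. It is a sketch of ours under stated assumptions, not the authors' code: ρ_i = 1, and a small ridge keeps F_i invertible, which also sidesteps the zero-distance case handled via φ_k in Algorithm 1.

import numpy as np

def gk_cluster(X, K, m=2.0, iters=50, tol=1e-5, seed=0):
    """Gustafson-Kessel updates of Algorithm 1 (our sketch): centroids,
    fuzzy covariances F_i, volume-constrained norms A_i, memberships.
    Assumes rho_i = 1; a small ridge keeps F_i positive definite."""
    rng = np.random.default_rng(seed)
    N, n = X.shape
    U = rng.dirichlet(np.ones(K), size=N).T            # K x N memberships
    A = [np.eye(n) for _ in range(K)]
    for _ in range(iters):
        W = U ** m                                     # fuzzified weights
        V = (W @ X) / W.sum(axis=1, keepdims=True)     # centroid update
        D2 = np.empty((K, N))
        A = []
        for i in range(K):
            Xc = X - V[i]
            F = (W[i, :, None] * Xc).T @ Xc / W[i].sum()   # fuzzy covariance
            F += 1e-8 * np.eye(n)                          # numerical ridge
            Ai = np.linalg.det(F) ** (1.0 / n) * np.linalg.inv(F)
            A.append(Ai)
            D2[i] = np.einsum('kj,jl,kl->k', Xc, Ai, Xc)   # squared GK distance
        D2 = np.maximum(D2, 1e-12)
        # membership update; exponent is 1/(m-1) because D2 is squared
        U_new = 1.0 / np.sum((D2[:, None, :] / D2[None, :, :]) ** (1.0 / (m - 1.0)), axis=1)
        if np.abs(U_new - U).max() < tol:
            return V, A, U_new
        U = U_new
    return V, A, U

def gk_encode(descriptors, V, A, m=2.0):
    """Encode one image as the cumulative GK membership (the nu of
    Algorithm 1) of its descriptors to each fuzzy visual word."""
    K = len(V)
    D2 = np.empty((K, len(descriptors)))
    for i in range(K):
        Xc = descriptors - V[i]
        D2[i] = np.einsum('kj,jl,kl->k', Xc, A[i], Xc)
    D2 = np.maximum(D2, 1e-12)
    U = 1.0 / np.sum((D2[:, None, :] / D2[None, :, :]) ** (1.0 / (m - 1.0)), axis=1)
    return U.sum(axis=1)                               # length-K vector nu

X = np.random.default_rng(1).normal(size=(400, 2))     # toy descriptor pool
V, A, U = gk_cluster(X, K=3)
nu = gk_encode(X[:50], V, A)                           # encode one 'image'
print(nu.shape, nu.sum())                              # (3,); sums to ~50

The encoded vector ν, together with its label γ, then trains the SVM classifier described in Section 2.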
4. EXPERIMENTS

To demonstrate the efficacy of the GK encoding technique, we comparatively evaluate the classification performance of GK encoding against FCM encoding and the BoF technique. Since the focus of the experiments is the comparison of clustering and encoding techniques, the choice of feature descriptor and classifier was based on popularity rather than performance. The classifier is an SVM with an RBF kernel. We used the SIFT descriptor, the most popular member of the family of local affine invariant descriptors. While other descriptors such as LBP or SURF might provide marginally better classification performance, the relative performance across the techniques being evaluated will remain unchanged. We utilize several popular datasets from the vision community for a comprehensive evaluation of our approach against the standard BoF technique. The datasets are Caltech-101 [5]; Caltech-256 [6]; Pascal VOC 2006; Pascal VOC 2010 [7]; and Scene-15. These datasets vary in the number of categories, the number of images within each category, the visual domain of the categories, and their inherent difficulty, and together render a comprehensive evaluation of the techniques.

4.1. Performance across datasets

We compare the BoF, FCM, and GK approaches in terms of their classification performance across 5 datasets. The results in figure 2 show the mean accuracy of a cross-validated experiment averaged across all categories of each dataset. Performance varies across datasets due to the inherent difficulty of each dataset. The fuzzy encoding methods, FCM and GK, consistently outperform BoF on all datasets. This is sound empirical support for the efficacy of the fuzzy encoding approach. Note that GK performs marginally better than FCM on almost all the datasets. With optimization of the covariance matrix for each cluster in GK, we expect the margin of improvement provided by GK to grow further.

[Fig. 2. Classification performance of GK- and FCM-encoding and BoF for the datasets {Scene15, Caltech101, Caltech256, VOC2006, VOC2010}.]

4.2. Performance across categories

The graphs in table 2 show the comparative performance of the BoF and GK approaches for each visual category in the datasets VOC2006, VOC2010, and Scene15. The absolute and relative performance of both BoF and GK varies across the categories due to the variation in content and complexity of each category. GK is observed to consistently outperform BoF in almost all categories.

[Table 2. The classification performance of the BoF and GK approaches across the visual categories of the VOC2006, VOC2010, and Scene15 datasets. The graphs show the mean accuracy under a cross-validation scheme, with categories on the x-axis; the yellow bars correspond to the BoF approach and the black bars to the GK approach.]

4.3. Visual dictionary size

In this experiment, we analyze the relation between the visual dictionary size, i.e. the number of fuzzy visual words, and the classification performance of the encoding techniques considered. The set of codebook sizes considered is {16, 32, 64, 128, 256, 512}, and the dataset utilized is Caltech101. The graph in figure 3 presents the average accuracy across all categories of the dataset for each codebook size. GK performs better than BoF for all dictionary sizes. In addition, there is a gradual improvement in performance as dictionary size increases.

[Fig. 3. Analysis of variation in classification performance with codebook size for the GK- and FCM-encoding and BoF techniques. The dataset is Caltech101; the codebook sizes considered are {16, 32, 64, 128, 256, 512}.]

5. CONCLUSION

We have introduced a fuzzy encoding technique for visual categorization using Fuzzy C-Means to compute a fuzzy membership function. We extended this work to the Gustafson-Kessel fuzzy clustering algorithm, which was shown to adapt to local distributions. We demonstrated empirically, on several popular datasets, that our fuzzy encoding approach is consistently better than the BoF model. The GK algorithm was shown to provide a marginal improvement over FCM, which we expect to grow with optimization of the covariance matrices in future work.

6. REFERENCES

[1] J. C. van Gemert, C. J. Veenman, A. W. M. Smeulders, and J.-M. Geusebroek, "Visual word ambiguity," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 7, pp. 1271-1283, July 2010.

[2] D. E. Gustafson and W. C. Kessel, "Fuzzy clustering with a fuzzy covariance matrix," in Proceedings of the 1978 IEEE Conference on Decision and Control including the 17th Symposium on Adaptive Processes, Jan. 1978, vol. 17, pp. 761-766.

[3] A. Baraldi and P. Blonda, "A survey of fuzzy clustering algorithms for pattern recognition. I," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 29, no. 6, pp. 778-785, Dec. 1999.

[4] R. Babuška, P. J. van der Veen, and U. Kaymak, "Improved covariance estimation for Gustafson-Kessel clustering," in Proceedings of the 2002 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE'02), 2002, vol. 2, pp. 1081-1085.

[5] L. Fei-Fei, R. Fergus, and P. Perona, "Learning generative visual models from few training examples: An incremental Bayesian approach tested on 101 object categories," 2004.

[6] G. Griffin, A. Holub, and P. Perona, "Caltech-256 object category dataset," Tech. Rep. 7694, California Institute of Technology, 2007.

[7] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, "The PASCAL Visual Object Classes Challenge 2010 (VOC2010) Results," http://www.pascal-network.org/challenges/VOC/voc2010/workshop/index.html, 2010.
Acknowledgement

This work is supported by the EPSRC project Making Sense (EP/H023135/1).