Appl. Math. Inf. Sci. 8, No. 4, 1755-1766 (2014)
Applied Mathematics & Information Sciences, An International Journal
http://dx.doi.org/10.12785/amis/080433

Moving Object Detection using Lab2000HL Color Space with Spatial and Temporal Smoothing

Muhammet Balcilar1,∗, M. Fatih Amasyali1 and A. Coskun Sonmez2
1 Department of Computer Engineering, Yildiz Technical University, EEF Block, D124, Davutpasa, 34220, Istanbul, Turkey
2 Faculty of Computer and Informatics, Istanbul Technical University, Maslak, 34469, Istanbul, Turkey

Received: 3 Aug. 2013, Revised: 5 Nov. 2013, Accepted: 6 Nov. 2013, Published online: 1 Jul. 2014

Abstract: In order to detect moving objects such as vehicles on motorways, background subtraction techniques are commonly used. This problem is essentially solved for static backgrounds; however, real-world scenes contain many non-static components such as a waving sea, camera oscillations, and sudden changes in daylight. The Gaussian Mixture Model (GMM) is a statistics-based background subtraction method in which the values of each pixel's features are represented by a few normal distributions; it partially overcomes such problems. To improve the performance of the GMM, this study proposes using spatial and temporal features in the Lab2000HL color space, which has a linear hue band. The spatial and temporal features are obtained with a spatial low-pass filter and a temporal Kalman filter, respectively. As a performance metric, the area under the Precision-Recall (PR) curve is used. In addition to the videos in the I2R dataset, a new dataset, whose images were captured by traffic surveillance cameras placed over the entrance of the Istanbul FSM Bridge at different times of the day, is used to compare the proposed method against other well-known GMM variants. According to our tests, the proposed method is more successful than the other methods in most cases.
Keywords: Moving object detection, Background subtraction, Lab2000HL, GMM, Kalman smoothing

1 Introduction

Identifying moving objects in video is a basic and critical task in computer vision applications. A well-known approach is the background subtraction technique, which marks regions that differ significantly from a background model as foreground and discards the rest. Apart from preprocessing and post-processing, background subtraction techniques have two basic steps: forming an up-to-date background model that represents the background image, and determining the foreground regions that differ significantly from that model. Background subtraction methods can be classified according to their background initialization, the methods used for background modeling, the features used in modeling, and the foreground detection technique. Although some applications apply extra post-processing procedures to reduce foreground detection errors, the methods are otherwise treated alike here.

Many successful background subtraction algorithms have appeared in the literature. The first problem faced was illumination change. Initially, methods robust to light changes were introduced; then came methods handling non-stationary background objects such as leaves, sea waves, rain, and snow; and finally methods distinguishing temporary object movements, such as stopping and moving again, were developed [1]. In terms of the models used, background subtraction methods introduced to date can be classified as deterministic, statistical, filter-based estimation, and fuzzy methods [2]. In terms of the image processing approach, they can be grouped as pixel-based or region-based. In addition, there are methods that use one or more features such as shape, texture, color, gradient, histogram, and motion.

A method in which, for each pixel, the mean or median of its past n values is assigned as the background is introduced in [3]. This method requires very large memory, growing with n. The running average method, on the other hand, does not store previous values; a new value affects the background in proportion to the learning coefficient [4], so the memory requirement is reduced. Using Gaussian distributions is the most
∗ Corresponding author e-mail:
[email protected]
c 2014 NSP, Natural Sciences Publishing Cor.

popular approach for the statistical modeling of each pixel. The simplest method is to compute an average of the past values of the pixel, subtract it from the current pixel value, and binarize the result with a fixed threshold; the adaptive version of this method repeatedly updates the model parameters. A single-Gaussian model, in which the background is considered stationary, is introduced in [5]. However, this method cannot model backgrounds containing periodic movements. Expressing the background with more than a single Gaussian distribution is called the Gaussian Mixture Model (GMM) [6], whose parameter values are computed with an online k-means method. Many researchers later added different extensions to the GMM method [7]. The weakest parts of these methods are the assumption that the distribution of a pixel's past values fits a Gaussian, and the need to estimate the distribution parameters. A solution, background modeling with the KDE (Kernel Density Estimator) technique, was proposed in [8], and many KDE-based background subtraction methods followed, such as [9]. However, KDE also has a very large memory requirement.

Another family of background modeling methods is based on clustering. Background subtraction models using k-means clustering and sequential clustering are given in [10] and [11], respectively, but the most remarkable clustering-based model is the codebook method [12], which uses vector quantization, an unsupervised learning technique. Subspace-learning-based background subtraction, studied especially in recent years, has been developed in many variants. Recently, an incremental robust PCA (Principal Component Analysis), which computes the transformation matrices while accounting for outliers, was presented in [13]. Background subtraction methods using Independent Component Analysis (ICA) are given in [14], and Incremental Maximum Margin Criterion (IMMC) is presented in [15]. Background subtraction, by its nature, involves many uncertainties. To cope with them, a method that reconstructs the classical GMM with type-2 fuzzy membership functions is given in [16], applications in which background modeling is done with fuzzy similarity measures are in [17], and background modeling with fuzzy logic rules is presented in [18].

Reviewing the literature on the features used in the GMM method, researchers have used different color spaces and different features, together or separately. The study in [19] used YUV color space values. In another study, the RGB color components of 8x8x3 blocks, obtained from three frames in time over the 8x8 neighborhood of the processed pixel, are chosen as features [20]. The method introduced in [21] is called Local Patch GMM; there, the RGB values of all pixels in a fixed neighborhood of the pixel are used in the GMM for modeling. The study in which normalized RGB (nRGB) color values of pixels are used in the GMM is cited in [22], and the study using the HSV color space is [23]. In [24], the HSI color space is used, but the HS and I bands are modeled with separate GMMs. In [25], background modeling is done with the Luv color space values of the background. In [26], the improved HSL (IHSL) color space, in which the chromatic plane is handled with circular statistics, is preferred for background modeling. In [27], the performance of other color spaces in background modeling is compared against RGB, and YCbCr is reported to be the most successful. In some studies, how well the gradient of the corresponding pixel of the incoming frame fits the gradient of the background is computed, and the background/foreground decision is made accordingly. In [28], the features used in the GMM are the mean location and mean angle of the gradients of the pixels in the n×n neighborhood of the pixel, so both location and gradient information are used in background modeling. In [29], background modeling uses RGB values, gradient magnitude and direction values, and 8 Haar features.

The purpose of this study is to improve the performance of GMM background subtraction by using the Lab2000HL color space with spatial and temporal features. Lab2000HL, a recently introduced improvement of the CIELAB color space, is thought to model human perception better; its greatest advantage is its linear hue channel. Adding spatial-neighbor and temporal-neighbor pixel values directly to the feature vector would carry a heavy computational cost. Instead, a spatial filter is used for spatial consistency and a Kalman filter for temporal consistency.

The remainder of the paper is organized as follows. Section 2 describes the background subtraction method, and Section 3 explains the features in detail. The performance measure is described in Section 4, Section 5 presents the application and test results on each dataset, and Section 6 concludes.

2 Background Subtraction

Background modeling methods, which model the background with statistical distributions of the values of a fixed pixel feature, have a significant place in the literature. Unlike background estimation methods, they are concerned with the distribution of the values of a fixed pixel feature rather than their ordering. The method used in this study differs from the background subtraction method proposed in [6] in terms of features, but they are similar in basic terms. The method is based on the GMM, which is an unsupervised learning method that clusters data by minimizing variance. The most significant difference from the classical GMM is that the past values of the data are not evaluated with the expectation-maximization technique; they are evaluated online, step by step, as they are produced, and the cluster centers, variances, and weights are updated incrementally. Each pixel is modeled with K Gaussian distributions, each having a weight, a mean vector, and a covariance matrix; these values are updated as soon as a new pixel value arrives. The position of the pixel in the image is denoted s = (x, y), i indexes the distribution, and t denotes time. Let w_s^{i,t} denote the weight of the i-th distribution of the pixel at position s and time t, µ_s^{i,t} its mean vector, and Σ_s^{i,t} its covariance matrix. The weights are scalars, and the mean vector is a row vector with the same size as the pixel's feature vector. Under the hypothesis that the features are uncorrelated, the covariance matrix is diagonal, with as many rows and columns as the mean vector has entries, and is defined in (1):

Σ_s^{i,t} = (σ_s^{i,t})^2 I,   (1)

where I denotes the identity matrix and σ_s^{i,t} the standard deviation of the pixel feature; thus all feature dimensions of the pixel are assumed to have equal variance. When a new frame arrives, the background models of the processed pixel are sorted in descending order of w_s^{i,t}/σ_s^{i,t}. The smallest set of distributions whose total weight exceeds the threshold T is taken as the background, and the remaining distributions are considered foreground:

B = argmin_b ( Σ_{i=1}^{b} w_s^{i,t} > T ).   (2)

For the arriving frame, the Mahalanobis distances of the pixel's feature vector to all distributions are computed. If the pixel is farther than k standard deviations from every distribution, the pixel could not be classified; otherwise it is assigned to the first class satisfying the distance bound, and the parameters of that distribution are updated. At time t+1, with X_s^{t+1} the feature vector of the pixel in the arriving frame, the test of whether its distance to the i-th distribution is within acceptable limits is expressed in (3):

sqrt( (X_s^{t+1} − µ_s^{i,t})^T (Σ_s^{i,t})^{-1} (X_s^{t+1} − µ_s^{i,t}) ) < k σ_s^{i,t}.   (3)

If the vector X_s^{t+1} is included in the i-th class, then with α the learning coefficient, the parameters of the i-th class are updated as in (4):

w_s^{i,t+1} = (1 − α) w_s^{i,t} + α
µ_s^{i,t+1} = (1 − ρ) µ_s^{i,t} + ρ X_s^{t+1}   (4)
(σ_s^{i,t+1})^2 = (1 − ρ) (σ_s^{i,t})^2 + ρ (X_s^{t+1} − µ_s^{i,t+1})(X_s^{t+1} − µ_s^{i,t+1})^T

The temporary variable ρ is given in (5):

ρ = α N(X_s^{t+1}; µ_s^{i,t}, Σ_s^{i,t}),   (5)

where N is the probability density function of the multi-dimensional Gaussian distribution; its inputs are, respectively, the vector whose probability is computed, the mean of the distribution, and the covariance matrix of the distribution. Since the i-th distribution is chosen, the mean vectors and variances of all other distributions stay fixed, but their weights are reduced according to (6):

w_s^{j,t+1} = (1 − α) w_s^{j,t},   j ≠ i, j = 1..K.   (6)

If none of the distributions is close enough to the feature vector of the arriving frame, the class with the smallest weight is deleted and replaced with a new class whose mean is the feature vector of the arriving pixel, whose weight is a low initial weight, and whose variance is larger than those of the remaining distributions, as in (7):

w_s^{i,t+1} = LowWeight
µ_s^{i,t+1} = X_s^{t+1}   (7)
(σ_s^{i,t+1})^2 = LargeVariance

If every pixel of each arriving frame is updated with these procedures, an up-to-date model of the background is obtained.

3 Features

The previous studies [30, 31] showed that the Lab2000HL color space is better than other color spaces for background modeling. This section explains the Lab2000HL color space and how spatial and temporal consistency are taken into account, instead of using the color value directly.

3.1 Lab2000HL Color Space

Because the information gained directly from the camera sensor is in the RGB color space, these color bands are commonly used as features. However, they are strongly affected by illumination, since they cannot represent the chromaticity of a segment independently. There is much research in the literature on the distinctive properties of color spaces. According to this research, YCbCr and HSV are more accurate, since they represent the amount of light and the chromaticity separately.

Fig. 1: Transformation map from CIELAB to Lab2000HL: a) transformation of channel L*_{00,HL}, b) transformation of channel a*_{00,HL}, c) transformation of channel b*_{00,HL}.
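The per-pixel online update of Section 2, equations (1)-(7), can be sketched as follows for a single pixel with a scalar feature. This is an illustrative simplification, not the paper's implementation; the constants mirror the settings reported in Section 5 (K = 5, α = 0.001, T = 0.6), while K_SIGMA, LOW_WEIGHT, and LARGE_VAR are assumed values.

```python
import numpy as np

# Illustrative constants; K, ALPHA, T_BG follow Section 5 of the paper,
# the remaining values are assumptions for this sketch.
K, ALPHA, T_BG, K_SIGMA = 5, 0.001, 0.6, 2.5
LOW_WEIGHT, LARGE_VAR = 0.05, 30.0 ** 2

def update_pixel(w, mu, var, x):
    """One online update step for a single pixel with a scalar feature.

    w, mu, var: arrays of shape (K,) -- weights, means, variances.
    x: new feature value. Returns updated (w, mu, var, is_foreground).
    """
    d = np.abs(x - mu) / np.sqrt(var)          # Mahalanobis distance, eq (3)
    matches = d < K_SIGMA
    if matches.any():
        i = int(np.argmax(matches))            # first matching distribution
        # Gaussian pdf value scales the secondary learning rate, eq (5)
        rho = ALPHA * np.exp(-0.5 * d[i] ** 2) / np.sqrt(2 * np.pi * var[i])
        w = (1 - ALPHA) * w                    # eq (6) for all classes
        w[i] += ALPHA                          # eq (4) for the matched class
        mu[i] = (1 - rho) * mu[i] + rho * x
        var[i] = (1 - rho) * var[i] + rho * (x - mu[i]) ** 2
    else:
        i = int(np.argmin(w))                  # replace weakest class, eq (7)
        w[i], mu[i], var[i] = LOW_WEIGHT, x, LARGE_VAR
        w = w / w.sum()
    # Background selection: heaviest distributions up to total weight T, eq (2)
    order = np.argsort(-(w / np.sqrt(var)))
    n_bg = int(np.searchsorted(np.cumsum(w[order]), T_BG)) + 1
    is_foreground = i not in order[:n_bg]
    return w, mu, var, is_foreground
```

A value near an established background mode is classified as background, while an outlier spawns a new low-weight distribution and is reported as foreground until that distribution gains weight.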
CIELAB, on the other hand, is a color space that reflects changes in accordance with human color perception. Its components are the lightness of the color (L*) and two color-opponent dimensions (a*, b*). The Lab2000HL (L*_{00,HL}, a*_{00,HL}, b*_{00,HL}) color space, a recently introduced improvement of CIELAB, is thought to model human perception even better [32].

In [32], a numerical method is proposed to determine a transformation of a color space into a hue-linear color space with a maximum degree of perceptual uniformity. According to [32], the transformation takes place in two steps. CIELAB is chosen as the initial color space for obtaining Lab2000HL because it is well known and used in many industrial standards. The transformations can be summarized as (8) [32]:

Ψ_00: CIELAB → Lab2000
Ψ_{00,HL}: Lab2000 → Lab2000HL   (8)

In the first step, the transformation provides perceptual uniformity. The transformation of the CIELAB (L*a*b*) color space into a Euclidean space with respect to the Lab2000 (L*_00, a*_00, b*_00) color-difference formula is described in [33]. According to that paper, the Ψ_00 transformation is performed by a one-dimensional lookup table for L* → L*_00 and a two-dimensional lookup table for each of a* → a*_00 and b* → b*_00; the required lookup tables are also presented in that research.

The second step maps the curves of constant hue to straight lines while preserving perceptual uniformity as far as possible. Because L*_00 represents the lightness coordinate and lightness does not affect hue, the lightness values are not changed; the Ψ_{00,HL} transformation operates only on the a*_00 and b*_00 channels. Mapping the resulting constant-hue curves with hue angles ϕ = {ϕ_0, ϕ_1, ..., ϕ_359} to straight lines can be performed by a two-dimensional color lookup table. This lookup table must minimize the mean disagreement between CIEDE2000 color differences in CIELAB and Euclidean distances in the new space. The lookup table maps ϕ to ϕ^opt = {ϕ_0^opt, ϕ_1^opt, ..., ϕ_359^opt}, where ϕ^opt must be a monotonically increasing array. The optimization problem described above is solved in [35]. Using the lookup table that results from the optimization, the transformations a*_00 → a*_{00,HL} and b*_00 → b*_{00,HL} become possible, and they can be combined into a single transformation by composition: Φ = Ψ_00 ∘ Ψ_{00,HL}. The resulting single transformation maps, presented in [33], are shown in Fig. 1. Detailed information about this color space and the calculation procedure can be found in [32, 33]; RGB-to-Lab2000HL conversion code is available at [31].

3.2 Spatial Smoothing

Images involve spatial consistency at high rates. When each pixel is processed separately, this consistency is disregarded and the performance of the system drops. To take it into account, not only the value of the processed pixel but also the values of its neighboring pixels may be given to the background subtraction model. In that case, however, the length of the feature vector grows with the square of the neighborhood length, raising the computational cost. Alternatively, in this study, the pixel is considered together with the pixels in its fixed-length neighborhood, and a single value is produced for each color channel. In the literature, features such as entropy, edges, texture, or the mean are used to determine the representative value of a pixel group; in this study, the low-pass filter response of the pixel group is used as the feature.

The filters preferred in this study are those used in the S-CIELAB metric, which approximately compute the color difference a human perceives between two images. In this method, the received image is first transformed into the opponent color space (AC1C2) to make it independent of the input device. The AC1C2 color space has one luminance channel and two chrominance channels; these channels were determined through a series of psychophysical experiments testing pattern-color separability [34]. Although not completely orthogonal, all channels of this color space are obtained by linear transformations from the CIEXYZ color space. After the color space transformation, spatial filters specific to each channel are applied, using filters that approximate the contrast sensitivity functions (CSF) of the human visual system [34]: three filters for the luminance channel and two for each chrominance channel. After the filter responses are gathered, the result is transformed back to the desired color space by applying the reverse color space transformation. Each Gaussian filter has a different spread and weight; these values are given in Table 1 [34]. The place of spatial smoothing in the procedure of obtaining the input for background subtraction is shown in Fig. 2.

Fig. 2: Diagram of obtaining the background subtraction inputs.

Table 1: Weight and spread of filters
Channel   Weight     Spread
A         1.00327    0.0500
A         0.11442    0.2250
A        -0.11769    7.0000
C1        0.61673    0.0685
C1        0.38328    0.8260
C2        0.56789    0.0920
C2        0.43212    0.6451

3.3 Temporal Smoothing

The values of a pixel along the time axis are consistent with each other. If the pixel's temporal feature is not given to the background model, temporal consistency is disregarded and, accordingly, many errors occur. In the literature, temporal features are taken into account by adding the pixel's past values over a fixed time interval to the feature vector. This increases the dimension of the feature vector and hence the computational cost. Alternatively, rather than using the pixel value directly, it is a better solution to use its values filtered along the time axis. The moving average filter, since it multiplies all values in its mask by the same coefficient, cannot adapt to sudden signal changes while achieving the desired rate of noise removal. To handle this problem, a first-order low-pass filter for time series can be used: the smallest coefficient is applied to the oldest pixel value in the filter mask and the largest coefficient to the newest. However, its filter coefficients are still fixed; they do not change with respect to time, and they are not concerned with how the signal changes or what its dynamics are. The same constant-coefficient filtering is applied to every signal, so adaptation to sudden changes remains limited. Kalman filtering solves the problems stated above and is a widely used method for time series filtering [35]. It is used in engineering for time series filtering, for prediction of the next value, and for fusion of information gathered from several sources [35]. Its most significant advantage is that it processes the signal while considering its dynamics, so more proper filtering and prediction are possible. With the Kalman filter, determining the state variable at time t using measurements up to time t is called estimation, determining the state variable at time t + 1 is called prediction, and determining the state variable at time t − s is called smoothing.
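As a minimal illustration of the smoothing idea, the following sketch runs a forward Kalman pass followed by a backward Rauch-Tung-Striebel pass over a whole batch of measurements. It is the plain offline smoother, not the paper's time-delayed (fixed-lag) variant given in Table 2, and the model matrices in the usage below are illustrative assumptions.

```python
import numpy as np

def rts_smooth(y, A, C, Q, R, x0, P0):
    """Forward Kalman filter + backward Rauch-Tung-Striebel pass.

    Offline sketch of the smoother used in Section 3.3; y is a 1-D array
    of scalar measurements, C a 1-D measurement vector, R a scalar.
    Returns the smoothed measurement estimates yhat = C x_smoothed.
    """
    n, dim = len(y), A.shape[0]
    xf = np.zeros((n, dim)); Pf = np.zeros((n, dim, dim))   # filtered
    xp = np.zeros((n, dim)); Pp = np.zeros((n, dim, dim))   # predicted
    x, P = x0, P0
    for t in range(n):
        xp[t] = A @ x                          # predict state
        Pp[t] = A @ P @ A.T + Q
        S = C @ Pp[t] @ C + R                  # innovation variance (scalar)
        K = Pp[t] @ C / S                      # Kalman gain
        x = xp[t] + K * (y[t] - C @ xp[t])     # correct with measurement
        P = Pp[t] - np.outer(K, C @ Pp[t])
        xf[t], Pf[t] = x, P
    xs = xf.copy()
    for t in range(n - 2, -1, -1):             # backward smoothing pass
        L = Pf[t] @ A.T @ np.linalg.inv(Pp[t + 1])
        xs[t] = xf[t] + L @ (xs[t + 1] - xp[t + 1])
    return xs @ C                              # smoothed measurements
```

On a noiseless ramp with a constant-velocity model (A = [[1, 1], [0, 1]], C = [1, 0]), the smoothed output tracks the signal closely after a short burn-in, which is the behavior the temporal feature relies on.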
While estimation and prediction can be done online, smoothing, since it requires future measurements, can be done only with a time delay during the background subtraction application. An example of a signal subjected to Kalman filtering and its smoothed version is given in Fig. 3.

Fig. 3: Effect of Temporal Smoothing Process.

Applied as a smoothing process, the Kalman filter involves two steps, following the method also known as the Rauch-Tung-Striebel smoother [36]. The forward pass is a Kalman estimation filter; the smoothing itself is performed in the backward pass. In its original form, the method requires all of the data that is going to be smoothed, which is not suitable for background subtraction. Instead, by processing exactly s feature values at a time, the method can be applied with a time delay, although it is no longer completely real-time. The recommended time-delayed Kalman smoothing algorithm is given in Table 2.

Table 2: Temporal Smoothing Algorithm

Algorithm: Kalman Smoothing with Time Delay
Inputs: x̂_{0|0}, P_{0|0}, Q, R, A, C, s
for t = 0 to s−1 do
    x̂_{t+1|t} = A x̂_{t|t}
    P_{t+1|t} = A P_{t|t} Aᵀ + Q
    K_{t+1} = P_{t+1|t} Cᵀ (C P_{t+1|t} Cᵀ + R)⁻¹
    x̂_{t+1|t+1} = x̂_{t+1|t} + K_{t+1} (y_{t+1} − C x̂_{t+1|t})
    P_{t+1|t+1} = P_{t+1|t} − K_{t+1} C P_{t+1|t}
endfor
for index = 0 to end do
    T = index + s
    for t = T−1 to index step −1 do
        L_t = P_{t|t} Aᵀ P_{t+1|t}⁻¹
        x̂_{t|T} = x̂_{t|t} + L_t (x̂_{t+1|T} − x̂_{t+1|t})
        P_{t|T} = P_{t|t} + L_t (P_{t+1|T} − P_{t+1|t}) L_tᵀ
    endfor
    ŷ_t = C x̂_{t|T}
    t = T
    x̂_{t+1|t} = A x̂_{t|t}
    P_{t+1|t} = A P_{t|t} Aᵀ + Q
    K_{t+1} = P_{t+1|t} Cᵀ (C P_{t+1|t} Cᵀ + R)⁻¹
    x̂_{t+1|t+1} = x̂_{t+1|t} + K_{t+1} (y_{t+1} − C x̂_{t+1|t})
    P_{t+1|t+1} = P_{t+1|t} − K_{t+1} C P_{t+1|t}
endfor

In the algorithm, the subscript t represents the time variable, x̂ the state vector, A the state transition matrix representing the signal dynamics, P the error covariance matrix, Q the covariance matrix of the model error, K the Kalman gain, C the state-to-measurement matrix that maps the state vector to measurements, R the covariance matrix of the measurement error, y the measurement value, and L the smoothing gain. The initial values x̂_{0|0}, P_{0|0}, Q, and R, the constant matrices A and C of the system model, and the time delay s must be given initially. The smoothed measurement values produced by the method are denoted ŷ. Since each channel of the color space changes at a different rate over time, a separate Kalman filter is applied to each channel of each pixel.

4 Performance Measure

In this study, performance is measured pixel-wise. After the background/foreground segmentations of all pixels of all frames in the dataset are made, they are compared with the ground truth segmentations. Receiver Operating Characteristic (ROC) curves are commonly used to present results for binary decision problems in segmentation. However, when dealing with highly skewed datasets, Precision-Recall (PR) curves give a more informative picture of an algorithm's performance [37]. For that reason, the area under the PR curve is used as the performance metric.

Suppose we have a large collection of pixels C, of which only a fraction π (π/C << 1) is foreground. An algorithm detects a fraction t ∈ [0, 1] of C as foreground, of which h(t) ∈ [0, t] is later confirmed to be foreground; the function h(t) is called a hit curve. Two numeric performance measures often considered are the recall, defined as the probability of detecting an item given that it is foreground, and the precision, defined as the probability that an item is foreground given that it is detected by the algorithm. At a particular detection level t, the recall and precision are simply as in (9):

r(t) = h(t)/π,   p(t) = h(t)/t.   (9)
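The quantities in equation (9) and the area under the resulting curve can be sketched as follows. The threshold sweep and trapezoidal integration are illustrative choices, not the paper's exact evaluation code.

```python
import numpy as np

def pr_points(scores, truth, thresholds):
    """Precision/recall pairs per eq (9): r = h/pi, p = h/t.

    scores: per-pixel foreground scores; truth: boolean ground truth mask.
    Returns an array of (recall, precision) pairs sorted by recall.
    """
    pts = []
    n_fg = truth.sum()
    for th in thresholds:
        detected = scores >= th
        n_det = detected.sum()
        if n_det == 0 or n_fg == 0:
            continue
        hits = (detected & truth).sum()          # h(t): confirmed foreground
        pts.append((hits / n_fg, hits / n_det))  # (recall, precision)
    pts.sort()
    return np.array(pts)

def area_under_pr(pts):
    """Trapezoidal estimate of the integral of p(r) over the recall axis."""
    r, p = pts[:, 0], pts[:, 1]
    return float(np.sum(0.5 * (p[1:] + p[:-1]) * (r[1:] - r[:-1])))
```

In practice the paper sweeps the model's default variance over 16 values (Section 5) rather than a score threshold, but the integration of precision over recall is the same.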
Fig. 4: Foreground detection results of a sample test image obtained under different variances; changes in precision and recall rates with respect to variance; and the PR graph.

The PR curve is realized by the {r(t), p(t)} points in the two-dimensional space. Therefore, the area below the curve can be defined as the average precision over the entire interval from r = 0 to r = 1, as shown in (10):

p̄ = ∫₀¹ p(r) dr.   (10)

Fig. 4 shows segmentation results for the test image of a video and the PR curve drawn with respect to these results.

5 Application and Results

In the scope of this research, the known features used with the GMM in the literature are tested against the proposed method. These features are gray level, RGB, HSV, YCbCr, Lab, and Lab2000HL without any smoothing. In order to increase the success rate of a generic background subtraction system, preprocessing such as noise removal and post-processing such as blob analysis and shadow detection are commonly used. However, in this application, in order to isolate the contribution of the proposed method, neither preprocessing nor post-processing is applied, except for the filters required by the method.

Four parameters must be fixed for the background subtraction method introduced in Section 2. The number of mixture components is set to 5, the learning coefficient to 0.001, and the background threshold to 0.6. The default class variance required by the model is set by applying the method with 16 different monotonically increasing values separately; the precision and recall values of each variance are thus obtained. The change of the obtained precision values with respect to the recall values forms the PR graph, and the area under the curve is calculated by numerical integration. These calculations are shown schematically in Fig. 4. The time delay of the temporal filter is set to 10 frames. The matrices representing the signal dynamics of the system model and the initial values of the parameters required for the smoothing process are set as follows:

A = [1  0.1; 0  1],   C = [1  0],   Q = [10  0; 0  3],   R = 10,   x̂_{0|0} = [0; 0],   P_{0|0} = [50  0; 0  50]

5.1 I2R Dataset

There are 9 different videos in the I2R dataset to which the tests are applied [38]: WaterSurface, ShoppingMall, Bootstrap, Campus, Curtain, Escalator, Fountain, Hall, and Lobby. For each video, the dataset contains 20 manually labeled foreground (ground truth) images. In the WaterSurface video, there is one person as a foreground object, holding 8% of the whole image; there is a wavy sea as a non-stationary background, and the tree in the image is also blowing slowly in the wind. Using the 20 ground truth images of this video, the area under the PR curve is 0.901 when the RGB color space is used, 0.979 when the Lab2000HL color space is used without any smoothing, and 0.988 when Lab2000HL is used with the proposed method. The PR graph of the video is shown in Fig. 5.

Another video in the dataset is Campus. In this video, people and vehicles pass in front of trees swaying strongly in the wind, and the foreground objects occupy between 1% and 6% of the whole image. The results for this video are shown in Fig. 6: the area under the PR curve is 0.368 when only the RGB color space is used, 0.750 when Lab2000HL is used, and 0.811 when the proposed method is applied. The complete results for the other videos in the dataset, together with these, are given in Table 3.

5.2 FSM Dataset

The FSM dataset, introduced for the first time in this article, consists of traffic surveillance camera images
1762 M. Balcilar et. al. : Moving Object Detection using Lab2000HL Color... Table 3: Area of under PR curve for all video Methods I2R1 I2R2 I2R3 I2R4 I2R5 I2R6 I2R7 I2R8 I2R9 FSM1 FSM2 FSM3 FSM4 FSM5 Gray Level .906 .666 .643 .343 .853 .589 .605 .783 .222 .763 .772 .698 .641 .630 RGB .901 .708 .700 .368 .861 .600 .614 .796 .237 .792 .810 .711 .668 .726 HSV .926 .764 .757 .638 .650 .340 .551 .651 .397 .820 .840 .483 .684 .665 YCbCr .914 .723 .706 .371 .863 .613 .614 .799 .243 .796 .814 .706 .674 .725 Lab .923 .732 .714 .381 .858 .615 .630 .798 .245 .790 .803 .704 .664 .725 Lab2000HL .979 .842 .744 .750 .797 .584 .584 .824 .376 .852 .837 .673 .680 .712 Proposed .988 .863 .756 .811 .864 .648 .648 .842 .438 .874 .890 .735 .719 .757 including different environments for evaluation of background subtraction methods. This dataset is obtained from cameras placed at Etiler entrance of FSM Bridge in Istanbul within the project in [39]. Including totally 5 different videos, this dataset has 300 images per each video. Ground truth images are obtained from 10 images per each video by labeling them manually. A mask which determines the region to be analyzed (region of interest) is prepared and added to the dataset. The images obtained from the videos are saved in .jpg format and they are in 576x720 resolution. The image saving frequency is 25 fps. There exist camera oscillations, shadows, reflections, sudden and gradual light changes at high rates. The images of FSM1 video are captured in a bright whether at an early morning time. There is not much traffic. The light amount is less and therefore there is less shadow. FSM2 videos on the other hand are captured during a bright day and belong to a lighter environment. Fig. 5: PR Curve of WaterSurface Video in I2R Dataset. Traffic and shadow are a little more than the previous videos. FSM3 is also captured in outdoors but belongs to an environment with so much sun light. 
The vehicle shadows in FSM3 are very long because of the horizontal sunlight, and their area is almost equal to that of the vehicles themselves. The FSM4 video, on the other hand, is captured in the morning on a rainy day. There is heavy vehicle traffic in this video, but since it is raining the image quality is poor, and the wet ground makes the vehicles reflect. Vehicle reflections from the ground are labeled as background when the ground truth values are generated. The problem of reflections is especially crucial for the accuracy of real-time traffic analysis; however, this study offers no solution for it. FSM5, the last video of the dataset, contains images of a rainy evening. There is heavy vehicle traffic, and because of the rain there is some reflection from the ground and some noise, such as disturbance of the camera images. Sample images of the videos in the dataset, the ground truth values, and the segmentation results of the recommended method are shown in Fig. 7. The FSM dataset is open source and available online at http://www.yarbis.yildiz.edu.tr/muhammet-page23717.

Fig. 6: PR Curve of Campus Video in I2R Dataset.

Fig. 7: Sample image, ground truth and segmentation result of FSM dataset.

Tests applied using the FSM dataset indicate that the precision values do not always converge to 1 as the recall values are reduced. In FSM1 the precision value converges to 1 quite early, whereas in FSM3 the convergence happens quite late. This is caused by systematic classification errors due to shadows. Shadow detection is disregarded in this study, since the study investigates only the performance increase in background subtraction obtained by using the Lab2000HL color space together with the defined smoothing processes. In the FSM1 video, the performance values of RGB, only Lab2000HL, and the proposed method are calculated as 0.7299, 0.792, 0.850 and 0.894, respectively. The PR graph obtained from this video is given in Fig. 8.

Fig. 8: PR Curve of FSM1 video in FSM Dataset.

In the FSM3 video, the performance values of RGB, only Lab2000HL, and the proposed method are obtained as 0.711, 0.673, and 0.735, respectively. The PR graph of this video is given in Fig. 9. The areas under the PR curves of the 9 videos of the I2R dataset and the 5 videos of the FSM dataset, under 7 different feature configurations, are presented together in Table 3.

With respect to computational cost, the methods are evaluated on images of four different resolutions. For the gray level, RGB, HSV, YCbCr, and Lab features, the color space conversion functions of the OpenCV library are used; the Lab2000HL color space and the proposed method are coded within the scope of this study. The computational cost tests are performed over a dataset of 1000 images at each resolution, run on a PC with a single-core i5 processor at 2.67 GHz. The results of the tests are given in Table 4.
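A per-frame timing harness of the kind behind Table 4 can be sketched as follows. This is a hypothetical setup: the stand-in conversion function and the frame count are assumptions (the paper times 1000 images per resolution), but the measurement pattern — average wall-clock time over many frames — is the same.

```python
import time
import numpy as np

def ms_per_frame(convert, frames):
    """Average wall-clock processing time in milliseconds per frame."""
    start = time.perf_counter()
    for frame in frames:
        convert(frame)
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / len(frames)

# Stand-in "conversion": a fixed per-channel linear map (NOT Lab2000HL,
# which requires interpolation in lookup tables and is far costlier).
def to_gray(frame):
    return frame @ np.array([0.114, 0.587, 0.299])

# 100 random frames at one of the tested resolutions (256x320, 3 channels).
frames = [np.random.rand(256, 320, 3) for _ in range(100)]
print(f"{ms_per_frame(to_gray, frames):.2f} ms/frame")
```

Averaging over many frames smooths out scheduler noise; timing each resolution separately reproduces the columns of Table 4.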
Fig. 9: PR Curve of FSM3 video in FSM Dataset.

Table 4: Average processing times under different resolutions, in milliseconds per frame

Feature      128x160  144x176  256x320  576x720
Gray Level       2.2      2.5      7.9     46.0
RGB              4.2      5.3     16.8     84.2
HSV              4.5      5.6     17.6     90.1
YCbCr            4.6      5.6     17.5     90.0
Lab              5.4      6.7     20.2    110.5
Lab2000HL       12.4     16.2     49.4    248.8
Proposed        20.6     27.2     83.2    412.0

6 Conclusion

The Lab2000HL color space, whose two color axes are as independent of each other as possible and whose color value differences are almost ideally linear with respect to human perception, has many advantages and has proved its success in various applications [30,31,32,33]. In this study, in order to increase the success rate, spatial and temporal consistency features are used in background subtraction with GMM. For the spatial features, 7 different filters are used for each channel in the AC1C2 space. For the temporal features, Kalman-filter-based smoothing with time delay, which is a different application of the Rauch-Tung-Striebel method, is applied.

Tests using the I2R and FSM datasets show that the recommended method is more successful than classical GMM over the well-known color spaces, and the results indicate that the proposed method increased the performance in almost every sample. When only temporal smoothing is applied, on the other hand, it is more efficient in samples where the objects are large and move slowly, whereas in samples where the foreground objects are small and move quickly it affects the performance adversely, since it blurs the object borders. Besides the performance improvement, computing the Lab2000HL values numerically by interpolation increases the complexity; in addition to the convolution applied during spatial filtering, running a separate temporal smoother for each color channel of every pixel increases the computational cost significantly. According to Table 4, the proposed method needs nearly 0.5 seconds to process a high-resolution image, which means it cannot process high-resolution images in real time on a single-core processor; this will become possible with special hardware such as a GPU or FPGA. Nevertheless, the method improves moving object detection performance at high rates and can therefore be preferred for some applications.

Future studies aim to develop a shadow detection application using the Lab2000HL color space, and to use Lab2000HL together with methods such as Kalman-filter-based background estimation, kernel density background estimation, subspace-learning-based background estimation, and fuzzy background estimation.

Acknowledgement

This work is supported in part by the Scientific and Technological Research Council of Turkey (TUBITAK) under the project "Hybrid models of neural network method for road safety regulations: Safety index calibration and Intelligent Transportation Systems based safety control", no. 108M299.

References

[1] S. Cheung and C. Kamath, Robust techniques for background subtraction in urban traffic video, in Proc. of Video Comm. and Image Proc., SPIE Electronic Imaging, (2004).
[2] T. Bouwmans, F. El Baf, B. Vachon, Background Modeling using Mixture of Gaussians for Foreground Detection - A Survey, Recent Patents on Computer Science, 1, 219-237 (2008).
[3] B. P. L. Lo and S. A. Velastin, Automatic congestion detection system for underground platforms, Proc. ISIMP2001, 158-161 (2001).
[4] R. Cucchiara, C. Grana, M. Piccardi, and A. Prati, Detecting moving objects, ghosts, and shadows in video streams, IEEE Trans. on Pattern Anal. and Machine Intell., 25, 1337-1342 (2003).
[5] C. R. Wren, A. Azarbayejani, T. Darrell, and A. P. Pentland, Pfinder: Real-Time Tracking of the Human Body, IEEE Trans. Pattern Analysis and Machine Intelligence, 19, 780-785 (1997).
[6] C. Stauffer and W. E. L. Grimson, Adaptive Background Mixture Models for Real-Time Tracking, Proc. IEEE CS Conf. Computer Vision and Pattern Recognition, 2, 246-252 (1999).
[7] Z. Zivkovic, Improved adaptive Gaussian mixture model for background subtraction, International Conference on Pattern Recognition, 2, 28-31 (2004).
[8] A. Elgammal, R. Duraiswami, D. Harwood, and L. S. Davis, Background and Foreground Modeling Using Nonparametric Kernel Density Estimation for Visual Surveillance, Proc. IEEE, 90, 1151-1163 (2002).
[9] J. Gu, Z. Liu, Z. Zhang, Novel moving object segmentation algorithm using kernel density estimation and edge information, Journal of Computer-Aided Design and Computer Graphics, 21, 223-228 (2009).
[10] J. Wu, J. Xia, J. Chen, Z. Cui, Adaptive Detection of Moving Vehicle based on On-line Clustering, Journal of Computers, 6, 2045-2052 (2011).
[11] M. Benalia, A. Ait-Aoudia, An improved basic sequential clustering algorithm for background construction and motion detection, International Conference on Image Analysis and Recognition, ICIAR, Aveiro, (2012).
[12] H. Hu, L. Xu, H. Zhao, A Spherical Codebook in YUV Color Space for Moving Object Detection, Sensor Letters, 10, 177-189 (2012).
[13] T. Zhou, D. Tao, GoDec: Randomized low-rank & sparse matrix decomposition in noisy case, Proceedings of the 28th International Conference on Machine Learning (ICML-11), (2011).
[14] H. Jimenez, Background Subtraction Approach based on Independent Component Analysis, Sensors, 10, 6092-6114 (2010).
[15] D. Farcas, C. Marghes, T. Bouwmans, Background Subtraction via Incremental Maximum Margin Criterion: A discriminative approach, Machine Vision and Applications, (2012).
[16] T. Bouwmans, F. El Baf, Modeling of Dynamic Backgrounds by Type-2 Fuzzy Gaussians Mixture Models, MASAUM Journal of Basic and Applied Sciences, 1, 1-10 (2009).
[17] M. Balcilar, A. C. Sonmez, Region Based Fuzzy Background Subtraction Using Choquet Integral, in Adaptive and Natural Computing Algorithms, Springer Berlin Heidelberg, 287-296 (2013).
[18] M. Sivabalakrishnan, D. Manjula, Adaptive Background subtraction in Dynamic Environments Using Fuzzy Logic, International Journal on Computer Science and Engineering, 2, 270-273 (2010).
[19] T. Feldmann, Spatio-temporal optimization for foreground/background segmentation, in Computer Vision - ACCV 2010 Workshops, Springer Berlin Heidelberg, 113-122 (2011).
[20] D. Pokrajac, L. Latecki, Spatiotemporal Blocks-Based Moving Objects Identification and Tracking, VS-PETS 2003, 70-77 (2003).
[21] S. Wang, T. Su, S. Lai, Detecting moving objects from dynamic background with shadow removal, ICASSP 2011, Prague, (2011).
[22] M. Xu, T. Ellis, Illumination-Invariant Motion Detection Using Colour Mixture Models, British Machine Vision Conference BMVC 2001, 163-172 (2001).
[23] Y. Sun, B. Yuan, Z. Miao, C. Wan, Better Foreground Segmentation for Static Cameras via New Energy Form and Dynamic Graph-cut, in ICPR '06, Vol. 4, IEEE Computer Society, Washington, DC, 49-52 (2006).
[24] W. Wang, R. Wu, Fusion of luma and chroma GMMs for HMM-based object detection, Pacific Rim Symposium on Advances in Image and Video Technology, 573-581, Hsinchu, Taiwan, (2006).
[25] S. Yang, C. Hsu, Background Modeling from GMM Likelihood Combined with Spatial and Color Coherency, ICIP, (2006).
[26] N. Setiawan, S. Hong, J. Kim, C. Lee, Gaussian Mixture Model in Improved HLS Color Space for Human Silhouette Extraction, ICAT 2006, Hangzhou, China, 732-741 (2006).
[27] F. Kristensen, P. Nilsson, V. Öwall, Background Segmentation Beyond RGB, ACCV 2006, Hyderabad, India, 602-612 (2006).
[28] V. Jain, B. Kimia, J. Mundy, Background Modeling Based on Subpixel Edges, ICIP, 321-324 (2007).
[29] B. Klare, S. Sarkar, Background Subtraction in Varying Illuminations Using an Ensemble Based on an Enlarged Feature Set, OTCBVS 2009, Miami, Florida, (2009).
[30] M. Balcilar, A. C. Sonmez, The Effect of Color Space and Block Size on Foreground Detection, SIU 2013, Girne, Cyprus, (2013).
[31] M. Balcilar, F. Karabiber, A. C. Sonmez, Performance Analysis of Lab2000HL Color Space for Background Subtraction, INISTA, Albena, Bulgaria, (2013).
[32] I. Lissner, P. Urban, How Perceptually Uniform Can a Hue Linear Color Space Be?, 18th Color Imaging Conference Final Program and Proceedings, 97-102 (2010).
[33] I. Lissner, P. Urban, Toward a Unified Color Space for Perception-Based Image Processing, IEEE Transactions on Image Processing, 21, 1153-1168 (2012).
[34] G. M. Johnson and M. D. Fairchild, A top down description of S-CIELAB and CIEDE2000, Col Res Appl, 28, 425-435 (2003).
[35] R. Chen, J. S. Liu, Mixture Kalman filters, Journal of the Royal Statistical Society: Series B, 62, 493-508 (2000).
[36] H. E. Rauch, F. Tung, C. T. Striebel, Maximum likelihood estimates of linear dynamic systems, AIAA J, 3, 1445-1450 (1965).
[37] J. Davis, M. Goadrich, The relationship between Precision-Recall and ROC curves, in Proceedings of the 23rd International Conference on Machine Learning, 233-240 (2006).
[38] L. Li, W. Huang, I. Y. H. Gu, and Q. Tian, Foreground object detection from videos containing complex background, in ACM International Conference on Multimedia, 210 (2003).
[39] H. B. Celikoglu, H. K. Cigizoglu, G. E. Gurcanli, Hybrid models of neural network method for road safety regulations: Safety index calibration and Intelligent Transportation Systems based safety control (in Turkish), final report of the project with no. 108M299 supported by the Scientific and Technological Research Council of Turkey (TUBITAK), TUBITAK, Ankara, (2010).

Muhammet Balcilar graduated from the Computer Engineering Department of Yildiz Technical University in 2004, and finished his M.Sc. and Ph.D. in the same department in 2007 and 2013, respectively. He is currently working in the same department as a research assistant. He is a member of the Stochastic Robotic Research Group and the Computational Intelligence Research Group. Mathematical Modelling, Signal & Video Processing, Robotics and Optimization are his main research areas.

A. Coskun Sonmez is Professor in the Computer Engineering Department at Istanbul Technical University, Istanbul. He received the PhD degree in Computer Science at Cambridge University (UK). His main research interests are Real Time Computers, Microprocessors, Artificial Intelligence, Expert Systems and Robotics.
M. Fatih Amasyali is Assistant Professor in the Computer Engineering Department at Yildiz Technical University, Istanbul. He received the PhD degree in Computer Science at the same university. He is a member of the Stochastic Robotic Research Group and the Linguistic Research Group. His main research interests are Natural Language Processing, Expert Systems, Artificial Intelligence and Robotics.

© 2014 NSP Natural Sciences Publishing Cor.