A Novel GA-SVM Model For Vehicle And Pedestrian Classification In Videos

The paper presents a novel algorithm for object classification in videos based on an improved support vector machine (SVM) and a genetic algorithm (GA). One of the problems of the support vector machine is the selection of appropriate parameters for the kernel, which has limited the accuracy of the SVM over the years. This research aims at optimizing the parameters of the SVM Radial Basis Function kernel using the genetic algorithm. Moving object classification is a requirement in smart visual surveillance systems, as it allows the system to know the kind of object in the scene and to recognize the actions the object can perform.


Introduction
Object classification in videos is the process of recognizing the classes of objects detected in videos. It is an important requirement in surveillance systems, as it aids understanding of the intentions of an object or the actions it can perform. For instance, human beings can sit, walk, run or fall, while vehicles can move, run, over-speed or crash. Object classification is a challenging task because of varying object poses, illumination and occlusion [2].
Recently, many research works on object classification in videos have been reported in the literature. Heikkila and Silven [7] present a real-time system for monitoring cyclists and pedestrians. The classification algorithm adopted is learning vector quantization; however, the classification accuracy obtained is low. The authors in [5] classified moving object blobs into general classes such as 'humans' and 'vehicles' using neural networks. Each neural network is a standard three-layer network, and learning is accomplished using the back-propagation algorithm. Input features to the network are a mixture of image-based and scene-based object parameters, namely image blob dispersedness (perimeter²/area (pixels)), image blob area (pixels), apparent aspect ratio of the blob bounding box, and camera zoom. There are three output classes, namely human, vehicle and human group. This approach fails to discriminate between objects with similar dispersedness. The authors in [9] used an artificial neural network approach for the classification of human motion from a still camera. It is noted that the task of classifying and identifying objects in video is difficult for a human operator. Objects are detected using a background subtraction technique. The detected moving object is divided into 8x8 non-overlapping blocks, and the mean of each block is calculated. The mean values are then accumulated to form a feature vector. A neural network is trained using the generated feature vectors. The experiments show a good classification rate, but the object detection algorithm used cannot work under a quasi-stationary background. In addition, computing the features is time consuming, which is a problem for surveillance systems. In [2], a neuro-genetic model for moving object classification in videos is presented. A genetic model is used to obtain the optimum weights of a neural network. The optimum weights are then used by the multilayer feed-forward neural network model to classify objects as human or vehicle. The model is compared with a neural network trained using the back-propagation algorithm on a set of objects detected from real-life videos. The neuro-genetic model performs better, with a classification rate of 99.09%, while the back-propagation neural network achieves a classification rate of 98.5%.
In [11], SVM is used to recognize facial expressions in videos. Automatic facial feature trackers are used to locate faces in videos. Features are then extracted and supplied to the SVM to recognize the facial expression. The classification accuracy shows that SVM is capable of recognizing facial expressions. In [12], SVM is used to recognize detected objects in video for tracking purposes. A simple background subtraction technique is used to extract the object. Moment features of the detected objects are calculated and fed into the SVM for classification. In both [11] and [12], higher classification accuracy could have been achieved by optimizing the parameters of the SVM. A hybrid of GA and SVM has been used in [13] for the classification of satellite images. There is a need to improve object classification in video surveillance applications.
Recently, SVM has been reported as an efficient classifier. SVM is a statistical learning method based on structural risk minimization, instead of empirical risk minimization, to improve the generalization ability of a model. It has however been realized that the selection of an appropriate kernel and kernel parameters can have a great influence on the performance of the model [13]. The approach that has been used for optimizing the hyper-parameters of the SVM is the grid search, which is time consuming and does not perform well [13]. In this paper, a genetically optimized support vector machine is presented for human object classification in videos.

2
Support Vector Machines (SVM)
Support Vector Machines are a set of supervised learning methods used in classification and regression. This machine learning technique was proposed by Vapnik in 1995. The classification problem can be restricted to consideration of the two-class problem without loss of generality. In this problem, the goal is to separate the two classes by a function which is induced from available examples [16]. The motivation for SVM is to create a classifier that will work well on unseen examples, that is, one that generalizes well. The objective is to create a classifier that uses the structural risk minimization principle, as against those that use the empirical error minimization principle. Consider the example in Figure 1. There are many possible linear classifiers that can separate the data, but there is only one that maximizes the margin (that is, maximizes the distance between it and the nearest data point of each class). This linear classifier is termed the optimal separating hyper-plane. Intuitively, this boundary is expected to generalize well as opposed to the other possible boundaries. Let the $m$-dimensional training input vectors $x_i$ $(i = 1, \ldots, M)$ belong to class 1 or 2, and let the associated labels be $y_i = 1$ for class 1 and $y_i = -1$ for class 2. If these data are linearly separable, a decision function satisfying Equation 1 can be constructed:
$$D(x) = w \cdot x + b \tag{1}$$
where $w$ is an $m$-dimensional vector and $b$ is a bias term.
If the training data are linearly separable, then no training data satisfy:
$$w \cdot x + b = 0 \tag{2}$$
The optimal separating hyper-plane should have the best generalization capability. As shown in Figure 1, the +1 and the -1 are the training data, which belong to two classes. The planes $H$ are the hyper-planes that separate the two classes. The optimal plane $H$ is found by maximizing the margin value $2/\|w\|$. Hyper-planes $H_1$ and $H_2$ are the planes on the border of each class and are parallel to the optimal hyper-plane $H$. The data located on $H_1$ and $H_2$ are called support vectors [8].
The optimal hyper-plane separates the data into classes such that:
$$w \cdot x_i + b \ge +1 \quad \text{for } y_i = +1 \tag{3}$$
$$w \cdot x_i + b \le -1 \quad \text{for } y_i = -1 \tag{4}$$
These constraints can be expressed in compact form as:
$$y_i\,(w \cdot x_i + b) \ge 1 \tag{5}$$
which can be written as:
$$y_i\,(w \cdot x_i + b) - 1 \ge 0 \tag{6}$$
It has been shown that if no such hyper-plane exists (because the data are not linearly separable), a penalty term $\xi_i \ge 0$ is added to account for misclassifications.
This can be translated to the following minimization problem:
minimize:
$$\frac{1}{2}\|w\|^{2} + C\sum_{i}\xi_i \tag{7}$$
where $C$, the capacity, is a parameter which allows us to specify how strictly the classifier should fit the training data. This can be translated into the following dual problem:
maximize:
$$\sum_{i}\alpha_i - \frac{1}{2}\sum_{i}\sum_{j}\alpha_i\,\alpha_j\,y_i\,y_j\,(x_i \cdot x_j) \tag{8}$$
subject to:
$$0 \le \alpha_i \le C, \qquad \sum_{i}\alpha_i\,y_i = 0 \tag{9}$$
The threshold $b$ of the optimal separating hyper-plane is obtained, for any support vector $x_s$ with $0 < \alpha_s < C$, by:
$$b = y_s - \sum_{i}\alpha_i\,y_i\,(x_i \cdot x_s) \tag{10}$$
The prediction of new patterns is given by:
$$\text{Class}(x) = \text{sign}\Big(\sum_{i}\alpha_i\,y_i\,(x_i \cdot x) + b\Big) \tag{11}$$
The training samples for which the Lagrange multipliers $\alpha_i$ are non-zero are called support vectors. Samples for which the corresponding Lagrange multiplier is zero can be removed from the training set without affecting the position of the final hyper-plane. The classification framework outlined above is limited to linear separating hyper-planes. It is possible, however, to use a non-linear hyper-plane by first mapping the sample points into a higher dimensional space using a non-linear mapping, that is, by:
$$\Phi : \mathbb{R}^{n} \rightarrow \mathcal{H} \tag{12}$$
where the dimension of $\mathcal{H}$ is greater than $n$.
A separating hyper-plane is then found in the higher dimensional space. This is equivalent to a non-linear separating surface in $\mathbb{R}^{n}$. The data only ever appear in the training problem in the form of dot products, so in the higher dimensional space the data appear in the form $\Phi(x_i) \cdot \Phi(x_j)$. If the dimensionality of $\mathcal{H}$ is very large, this product could be difficult or expensive to compute. However, by introducing a kernel function such that:
$$K(x_i, x_j) = \Phi(x_i) \cdot \Phi(x_j) \tag{13}$$
Equation 13 can be used in place of $x_i \cdot x_j$ everywhere in the optimization problem, and there is never a need to know explicitly what $\Phi$ is. The development of an SVM classification model depends on the selection of the kernel function $K$. There are several kernels that can be used in SVM models. These include the linear, polynomial, Radial Basis Function (RBF) and sigmoid kernels. The Radial Basis Kernel is given by:
$$K(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^{2}}{2\sigma^{2}}\right) \tag{14}$$
The RBF is by far the most popular choice of kernel type used in SVM. This is mainly because of its localized and finite response across the entire range of the real x-axis. After solving for $w$ and $b$, the class a test vector $x_t$ belongs to is determined by:
$$\text{Class}(x_t) = \text{sign}\big(w \cdot \Phi(x_t) + b\big) \tag{15}$$
where the mapping to the higher dimensional space has been used. It can be shown that the solution for $w$ is given by:
$$w = \sum_{i}\alpha_i\,y_i\,\Phi(x_i) \tag{16}$$
Thus, the kernel function can be used rather than actually making the transformation to the higher dimensional space, since the data appear only in dot product form. The parameters σ and C are the free hyper-parameters that the user can supply. These two parameters have a great role to play in the performance of the SVM. Several approaches have been used to select optimal values of these parameters. These include cross-validation, particle swarm optimization and genetic optimization. In this paper, the genetic algorithm approach is employed.
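As an illustration of how the kernel replaces the explicit mapping, the following minimal Python sketch evaluates an RBF kernel and a kernelized decision rule of the form sign(Σ α_i y_i K(x_i, x_t) + b). The function names and the toy support vectors, multipliers and bias are assumptions made for this example only; they are not values from the experiments.

```python
import math

def rbf_kernel(x, z, sigma):
    """RBF kernel: exp(-||x - z||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def classify(x_t, support_vectors, labels, alphas, b, sigma):
    """Kernelized decision rule: sign(sum_i alpha_i * y_i * K(x_i, x_t) + b)."""
    score = b
    for sv, y, alpha in zip(support_vectors, labels, alphas):
        score += alpha * y * rbf_kernel(sv, x_t, sigma)
    return 1 if score >= 0 else -1

# Toy, hand-picked values (hypothetical, for illustration only).
svs = [(0.0, 0.0), (2.0, 2.0)]   # one support vector per class
ys = [1, -1]                     # class labels
alphas = [1.0, 1.0]              # Lagrange multipliers
b = 0.0                          # bias term

print(classify((0.1, 0.1), svs, ys, alphas, b, sigma=1.0))
print(classify((1.9, 1.9), svs, ys, alphas, b, sigma=1.0))
```

Only kernel evaluations between the test vector and the support vectors are needed, so the mapping Φ never has to be computed.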

3.1
Genetic Algorithms
The Genetic Algorithm (GA) was developed by John Holland in 1975. The algorithm is based on the mechanics of the natural selection process (biological evolution). The concept of the GA is that the strong tend to adapt and survive while the weak tend to die out [17]. The technique involves generating a random initial population of individuals, each of which represents a potential solution to a problem [14]. The process begins by coding the genes in a form that represents the solution to the particular problem. In Holland's Genetic Algorithm, each feasible solution is encoded as a chromosome (a string of zeros and ones), also called a genotype. The initial population size is then specified, and that number of chromosomes is created. Optimization is based on evolution and the "survival of the fittest" concept. Each chromosome is given a measure of fitness via a fitness (evaluation or objective) function. The fitness of a chromosome determines its ability to survive and produce offspring [17]. As reproduction takes place, the crossover operator exchanges parts of two single chromosomes to produce children, and the mutation operator changes the gene value at some randomly chosen location of the chromosome [10]. The process of evaluation, selection and recombination is iterated until the population converges to an acceptable solution. Genetic Algorithms have been applied in various areas of computer vision, such as weight optimization of artificial neural networks [2;10;14;15] and video segmentation [4].

SVM parameter selection based on Genetic Algorithm
Using the genetic algorithm, the two parameters C and σ of the SVM model can be optimized. The process of optimizing these parameters is shown in Figure 2.
a. Initialization. To start with, the initial population is made up of chromosomes chosen at random or based on heuristically selected strings. Population size affects the efficiency and performance of a GA. The first step in GA implementation is the determination of a genetic encoding scheme, that is, to denote each possible point in the problem's search space as a characteristic string of defined length. The genes represent the values of σ and C and are concatenated to form a chromosome. Several of these chromosomes are randomly generated to form the initial population of the solution space. Figure 3 shows a typical chromosome. In this research work, the C and σ values of the RBF kernel are coded together in one chromosome. Each gene is coded as a randomized binary string of length 5. The minimum value is 0.1 while the maximum value is 100. The number of bits used to represent the value of a gene must satisfy:
$$2^{v} - 1 \ge \frac{\max - \min}{\Delta}$$
where $v$ is the string length, $\min$ is the minimum value of the gene, $\max$ is the maximum value of the gene and $\Delta$ is the error tolerance.
A chromosome represents one RBF kernel. The length of the chromosome is obtained by concatenating the bits representing each gene, as shown in Figure 3.
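The encoding can be sketched in Python as follows. This is only an illustration under stated assumptions: the bit-length rule is taken to be the usual resolution requirement 2^v − 1 ≥ (max − min)/Δ, the gene order (C first, then σ) is assumed, and the helper names `bits_needed`, `decode_gene` and `decode_chromosome` are hypothetical.

```python
def bits_needed(lo, hi, tol):
    """Smallest string length v with 2**v - 1 >= (hi - lo) / tol (assumed rule)."""
    v = 1
    while (2 ** v - 1) < (hi - lo) / tol:
        v += 1
    return v

def decode_gene(bits, lo, hi):
    """Map a binary string linearly onto the interval [lo, hi]."""
    levels = 2 ** len(bits) - 1
    return lo + int(bits, 2) * (hi - lo) / levels

def decode_chromosome(chromosome, gene_len=5, lo=0.1, hi=100.0):
    """Split a chromosome into two genes, assumed to be ordered (C, sigma)."""
    c_bits = chromosome[:gene_len]
    sigma_bits = chromosome[gene_len:2 * gene_len]
    return decode_gene(c_bits, lo, hi), decode_gene(sigma_bits, lo, hi)

# A 10-bit chromosome: the first 5 bits encode C, the last 5 bits encode sigma.
C, sigma = decode_chromosome("0000111111")
print(C, sigma)
```

With 5 bits per gene and the range 0.1 to 100, adjacent code words differ by 99.9/31, so the achievable resolution of each parameter is roughly 3.2.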

b. Fitness Function by Cross-Validation. When GA is applied to solve a problem, the definition of the evaluation function, which measures the problem-solving ability of a chromosome, is important. Since the SVM works by classification, the classification rate is used as the fitness function. The accuracy is computed as:
$$\text{Accuracy} = \frac{\text{number of correctly classified samples}}{\text{total number of samples}} \times 100\%$$
c. Perform Selection. The fitness of the new offspring is calculated and sorted in descending order, so that chromosomes with the highest fitness values are selected for the next generation. In this research work, the roulette wheel method is adopted for selection. The probability of selection is given by:
$$P_i = \frac{f_i}{f_{sum}}$$
in which $f_i$ is the fitness value of individual $i$, $f_{sum}$ is the total fitness value of the population, and $P_i$ is the selection probability of individual $i$. Individuals with high fitness values are therefore more likely to be reproduced in the next generation.
d. Perform Crossover. In this research work, one-point crossover is adopted. The specific operation is to randomly set one crossing point within the individual strings. When crossover is executed, the parts of the two strings before and after the crossing point are exchanged, which gives birth to two new offspring.
e. Mutation. For binary-coded strings, the mutation operation reverses a gene value when a random number generated between zero and one falls below the mutation probability.
f. Termination Criterion. The termination condition is the maximum number of generations.
Genetic control parameters dictate how the algorithm will behave, and changing these parameters can change the computational result. These parameters are the population size, crossover probability, mutation probability and termination condition. In this work, the population size N is 50, the crossover probability $P_c$ is 0.8, the mutation probability $P_m$ is 0.5, and the termination condition is a maximum number of generations (MAXGEN) of 100.
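Steps c to e can be sketched as below. This is a minimal illustration, not the implementation used in the experiments: chromosomes are assumed to be binary strings, mutation is assumed to be applied bit by bit with probability `p_m`, and all function names are hypothetical. The fitness values passed in would come from the classification rate of an SVM trained with each decoded (C, σ) pair.

```python
import random

def roulette_select(population, fitness, rng):
    """Pick one individual with probability f_i / f_sum."""
    pick = rng.uniform(0.0, sum(fitness))
    running = 0.0
    for individual, f in zip(population, fitness):
        running += f
        if pick <= running:
            return individual
    return population[-1]  # guard against floating-point round-off

def one_point_crossover(a, b, rng):
    """Exchange the tails of two equal-length strings at a random point."""
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:], b[:point] + a[point:]

def mutate(chromosome, p_m, rng):
    """Flip each bit independently with probability p_m (assumed scheme)."""
    flipped = {"0": "1", "1": "0"}
    return "".join(flipped[bit] if rng.random() < p_m else bit
                   for bit in chromosome)

def next_generation(population, fitness, p_c, p_m, rng):
    """Build a new population of the same size from the current one."""
    children = []
    while len(children) < len(population):
        p1 = roulette_select(population, fitness, rng)
        p2 = roulette_select(population, fitness, rng)
        if rng.random() < p_c:
            p1, p2 = one_point_crossover(p1, p2, rng)
        children.extend([mutate(p1, p_m, rng), mutate(p2, p_m, rng)])
    return children[:len(population)]

rng = random.Random(0)
population = ["".join(rng.choice("01") for _ in range(10)) for _ in range(6)]
fitness = [0.90, 0.95, 0.80, 0.99, 0.85, 0.70]  # hypothetical accuracies
population = next_generation(population, fitness, p_c=0.8, p_m=0.05, rng=rng)
print(population)
```

Repeating `next_generation` until the maximum number of generations is reached, and keeping the best chromosome seen so far, gives the overall loop described in steps a to f.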

4
Object Classification using the proposed GA-SVM model

Data Acquisition
The data used for this research work are obtained from videos of moving objects taken on Nigerian roads. The moving objects are detected using a background subtraction algorithm, and the features are extracted from the silhouettes of the detected objects.

Object Segmentation
IJTSRD | May-Jun 2017 Available Online @www.ijtsrd.com
Kernel Density Estimation (KDE) is the most widely used and studied nonparametric density estimation algorithm. The model is the reference dataset, containing reference points indexed by natural numbers, and it has been used in [6] for foreground detection. The algorithm assumes that a local kernel function, with its scale parameter (the bandwidth), is centered upon each reference point. Common choices of kernel include the Gaussian and the Epanechnikov kernels. The algorithm is presented as follows. Let $x_1, x_2, \ldots, x_n \in \mathbb{R}$ be a random sample taken from a continuous, univariate density $f$. The KDE is given by:
$$\hat{f}(x) = \frac{1}{nh}\sum_{i=1}^{n} k\left(\frac{x - x_i}{h}\right)$$
where $k(\cdot)$ is a function satisfying:
$$\int k(x)\,dx = 1$$
$k(\cdot)$ is referred to as the kernel, and $h$ is a positive number, usually called the bandwidth or window width. The Gaussian kernel is given by:
$$k(x) = (2\pi)^{-d/2}\exp\left(-\tfrac{1}{2}\|x\|^{2}\right)$$
The Epanechnikov kernel is given by:
$$k(x) = \begin{cases}\tfrac{1}{2}\,c_d^{-1}\,(d + 2)\,(1 - \|x\|^{2}) & \text{if } \|x\| \le 1\\ 0 & \text{otherwise}\end{cases}$$
where $d$ is the dimension of the feature space and $c_d$ is the volume of the unit $d$-dimensional sphere.
KDE for background modeling involves using a number of frames (training frames) to build the probability density of each pixel location. The adaptive threshold of each pixel is found after the construction of the histogram.
For every pixel observation, classification involves determining whether the pixel belongs to the background or the foreground (as shown in Figure 4). The first few frames in the video sequence (called learning frames) are used to build the histogram of the distribution of the pixel color. No classification is done for these learning frames. Subsequent frames are classified depending on whether the obtained value exceeds the threshold or not: if the threshold is exceeded, the pixel is classified as background, otherwise as foreground. Typically, in a video sequence involving moving objects, the majority of the pixel observations at a particular spatial pixel position correspond to the background. Therefore, background clusters typically account for many more observations than foreground clusters. This means that the probability of any background pixel is higher than that of a foreground pixel. The pixels are ordered based on the value of their corresponding histogram bin, which relies on the adaptive threshold from the previous stage. The pixel intensity values for the subsequent frames are estimated, and the histogram bin corresponding to each intensity is evaluated.
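The per-pixel decision can be illustrated with a univariate Gaussian KDE over the learning-frame values of one pixel. This is a simplified sketch under stated assumptions: it uses a fixed, hand-picked threshold rather than the adaptive threshold described above, and the bandwidth, history values and function names are hypothetical.

```python
import math

def gaussian_kernel(u):
    """Univariate Gaussian kernel, which integrates to 1."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(x, samples, h):
    """Kernel density estimate at x: (1 / (n * h)) * sum k((x - x_i) / h)."""
    return sum(gaussian_kernel((x - s) / h) for s in samples) / (len(samples) * h)

def is_background(pixel_value, history, h, threshold):
    """Background if the estimated density of the new value exceeds the threshold."""
    return kde(pixel_value, history, h) > threshold

# Intensities observed at one pixel location during the learning frames.
history = [100, 102, 99, 101, 100]
print(is_background(100, history, h=2.0, threshold=0.01))  # close to the history
print(is_background(200, history, h=2.0, threshold=0.01))  # far from the history
```

A value close to the learning-frame observations receives a high density and is treated as background, while a value far from them receives a density near zero and is treated as foreground.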

Feature Extraction
The blobs extracted from the videos are used to extract the feature vectors, which are then normalized by dividing each vector by the sum of the lengths. These vectors are then used to train the support vector machine.
Instead of segmenting the silhouettes into 8x8 non-overlapping blocks as in [9], radial features are calculated directly from the detected objects, as shown in Figure 4. This captures the shape, and a series of the features encodes the motion information. The distance from the centroid to its nearest edge along a predefined angle is then stored. This is done for a set of angular vectors. The dimension of each vector equals the number of axes projected from the centroid. The vector is then normalized to ensure that it is scale invariant; the largest value in the vector is then 1.0, which corresponds to the longest of the projected axes.
Figures 5a and 5b: 1D distance feature vectors from humans (left) and vehicles (right).
) is the pixel value of the object.
The number of lines in each image containing an object, as well as the number of neurons in the input layer, is $j$, where $j = 1, 2, 3, \ldots, n$ and $n$ is given by $n = 360/\theta$, where $\theta$ is the smallest of the angles. An angular interval of 10 degrees is used to obtain 36 regions beginning with 10°. Thirty-two (32) of these regions were selected as feature vectors. See [1] for more details of the segmentation algorithm.
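The radial feature computation can be sketched as follows, assuming the silhouette is available as a binary mask (a list of rows of 0/1 values) and that the centroid lies inside the silhouette. The ray-marching step of half a pixel and the function name `radial_features` are choices made for this illustration. All 36 directions are returned, since the text does not state which 32 were retained.

```python
import math

def radial_features(mask, step_deg=10):
    """Normalized centroid-to-edge distances at step_deg, 2*step_deg, ..., 360 degrees."""
    points = [(r, c) for r, row in enumerate(mask)
              for c, value in enumerate(row) if value]
    if not points:
        raise ValueError("mask contains no object pixels")
    cy = sum(r for r, _ in points) / len(points)   # centroid row
    cx = sum(c for _, c in points) / len(points)   # centroid column
    rows, cols = len(mask), len(mask[0])

    distances = []
    for angle in range(step_deg, 360 + step_deg, step_deg):
        theta = math.radians(angle)
        dist = 0.0
        while True:
            # Probe half a pixel further along the ray (image rows grow downwards).
            r = int(round(cy - (dist + 0.5) * math.sin(theta)))
            c = int(round(cx + (dist + 0.5) * math.cos(theta)))
            if not (0 <= r < rows and 0 <= c < cols) or not mask[r][c]:
                break
            dist += 0.5
        distances.append(dist)

    longest = max(distances)
    if longest == 0:
        return distances
    return [d / longest for d in distances]  # largest value becomes 1.0

# Toy silhouette: an 11x11 filled square inside a 15x15 frame.
mask = [[1 if 2 <= r <= 12 and 2 <= c <= 12 else 0 for c in range(15)]
        for r in range(15)]
features = radial_features(mask)
print(len(features), max(features))
```

Dividing by the longest distance makes the vector independent of the object's size in the frame, which matches the scale invariance described above.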

GA-SVM Object Classification
In this work, training the SVM consists of providing it with data for vehicles and pedestrians. Data for each class consist of a set of n-dimensional vectors. A Radial Basis Function (RBF) kernel is applied to the SVM, which then attempts to construct a hyper-plane in the n-dimensional space that maximizes the margin between the two input classes. The SVM type used in this work is C-SVM with a non-linear classifier, where C is the cost hyper-parameter [2]. The SVM is trained using 1D radial signals extracted from the silhouettes of the labeled images of humans and vehicles.
Given a training set consisting of multiple labeled images of each of the human and vehicle classes, an SVM classifier is trained as follows. 1D feature data are extracted from the training set images to create the vectors $X = (x_1, x_2, \ldots, x_L)$ for human images and $Y = (y_1, y_2, \ldots, y_K)$ for vehicle images, where $L$ and $K$ are the total numbers of training images for the human and vehicle classes respectively. To train the SVM, $X$ is used as the positively labeled training data and $Y$ as the negatively labeled training data. The SVM is then trained to maximize the hyper-plane margin between the respective classes $(\Phi_1, \Phi_2)$.
To classify an object $F$, it must be assigned to one of the $p$ possible classes $(\Phi_1, \Phi_2)$; in this case $p$ takes two values. For each object class $\Phi_p$, an SVM is used to determine the class that the object $F$ belongs to, given the measurement vector $D$, where in this case $D$ is the set of 1D radial features.

Implementation of the Proposed Model
In order to evaluate the classification performance of the proposed SVM model, video data were obtained from major roads in Akure, Nigeria. The background subtraction algorithm was applied to detect objects in the videos. Features of these objects are extracted and stored in a database. The data are divided randomly into a training dataset of 80 items and a test dataset of 333 items. The training dataset consists of 40 vehicles and 40 pedestrians. The test dataset consists of 110 vehicles and 223 pedestrians. The class vehicle is assigned 1 while the class human is assigned 0. The GA-SVM model is trained using [3]. In order to compare the classification performance of the proposed model with other classifiers, namely the normal SVM, K-Means, K-NN, ANN and GA-ANN, training is done with the same set of data and the same test dataset is used to evaluate each classifier.
In the proposed GA-SVM model, the SVM adopts the Radial Basis Function as the kernel. The parameters C and γ of the SVM are optimized using the genetic algorithm. These optimal parameters are then used to train the SVM model. In the normal SVM model, the SVM also uses the RBF as the kernel function, but the parameters C and γ are randomly selected.
In the K-Nearest Neighbour (K-NN) training, instances of vehicles and pedestrians are provided to the K-NN together with their corresponding class labels. After training, K-NN can be used to classify test instances. To classify a new instance, the instance is supplied together with the number of neighbours k. This k defines the neighbourhood in which the training data are consulted. The test instance is compared with the training data, the k nearest instances are checked, and the majority class among them determines the class to which the instance belongs. In this research, k is set to 5.
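The K-NN decision rule with k = 5 can be sketched in a few lines. The Euclidean distance, the two-dimensional toy vectors and the function name `knn_classify` are assumptions made for this illustration; the labels follow the convention above (vehicle = 1, human = 0).

```python
import math
from collections import Counter

def knn_classify(query, train_vectors, train_labels, k=5):
    """Return the majority label among the k training vectors nearest to query."""
    ranked = sorted(zip(train_vectors, train_labels),
                    key=lambda pair: math.dist(query, pair[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy training data (hypothetical): three humans (0) and three vehicles (1).
train_vectors = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_labels = [0, 0, 0, 1, 1, 1]

print(knn_classify((0.2, 0.2), train_vectors, train_labels))
print(knn_classify((5.5, 5.5), train_vectors, train_labels))
```

With two classes, an odd k such as 5 avoids tied votes.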
For classification using the K-Means algorithm, given a training set of the human and vehicle classes, a K-Means clustering algorithm is trained using the vectors $X = (x_1, x_2, \ldots, x_L)$ for human data and $Y = (y_1, y_2, \ldots, y_L)$ for vehicle data, where $L$ is the total number of training images. To train the K-Means, the number of classes and the mean of each class are provided. K-Means then assigns each data item to one of the classes based on the Euclidean distance of the item to the mean of each cluster; the minimum distance determines the cluster into which the item falls. After all instances are clustered, the means are re-evaluated, and the process continues until a given terminating condition is fulfilled. To cluster a new data item, the distances of the item to each of the evaluated cluster means are calculated, and the cluster with the minimum distance is the class of the new item.
Object classification using the ANN uses 32 input nodes, a hidden layer of 32 neurons and an output layer of one neuron. In this approach, shape features extracted from the detected moving objects are used to train a neural network classifier to recognize the human and vehicle classes. The dimension of the input data is thirty-two. The learning algorithm used is the back-propagation algorithm, a supervised learning algorithm in which the input vectors are supplied together with the desired output. The back-propagation algorithm (BPN) learns during the training epoch, and several epochs are needed before the network can sufficiently learn and provide a satisfactory result. The number of epochs used is 500, with a momentum of 0.5 and a learning rate of 0.3. The initial weights are initialized to small random numbers less than 1 using random number generators.
Object classification using the neuro-genetic model (GA-ANN) applies the genetic algorithm to optimize the weights before the ANN is trained with the optimized weights. In this case, there are (32*32+32) weights, which together form a chromosome. The population size N is 50, the crossover probability $P_c$ is 0.8, the mutation probability $P_m$ is 0.015, and the termination condition is a maximum number of generations (MAXGEN) of 100. This combination gives an excellent empirical performance.

6
Results and discussion
[…] the analysis result for GA-SVM, […] NN.
GA-SVM. For the SVM, 80 data items are used to train the model. The support vectors' values from the training are then […]. The gamma parameter obtained is 19.28 and […]. For testing, 333 data items […] pedestrian data […]. Out of the 110 vehicles, all are classified correctly, while for the 223 human class data items, only 2 are misclassified. The error in classification is 0.61% and […]. For the normal SVM, the error in classification is 0.9% and the accuracy is 99.1%. For […], the error in classification is 0.9% and the […]. For K-MEANS, the error in classification is 1.2% and the accuracy is 98.8%, and […] is 1.5% and accuracy […] classification is 4[…].

Conclusion
In this paper, SVM with GA is applied to object classification in videos. In the proposed model, optimized values of the RBF kernel parameter and of the parameter C of the SVM are obtained. Video data of moving objects have been used to evaluate the model. The experimental results show that the proposed classifier performs better than the SVM in which the parameters are chosen randomly. The comparative analysis of the model against SVM, GA-ANN, K-MEANS, K-NN and ANN shows that the model achieves better performance in terms of classification accuracy. In future, other optimization techniques will be investigated in order to assess their effect on the performance of the SVM model.