RESEARCH ON REGIONAL CLUSTERING AND TWO-STAGE SVM METHOD FOR CONTAINER TRUCK RECOGNITION

Abstract. With the trend toward large-scale, integrated and intelligent ports, many terminals have begun to use intelligent detection systems to make their operations more efficient. Container truck recognition and positioning systems are also being applied at the container quayside to assist joint operations between quay cranes and container trucks. However, traditional vehicle detection based on motion region detection cannot recognize the type of a moving object, and traditional pattern recognition methods cannot meet real-time operating requirements. To solve these problems, an algorithm fusing regional clustering and a two-stage SVM classifier is proposed in this paper. The method consists of two phases, executed independently in two camera systems on quay cranes. In the first stage, a fast motion regional clustering algorithm detects moving image patches as truck candidate sub-windows. In the second stage, container trucks are recognized in these sub-windows by an optimized two-stage SVM classifier. Compared with the existing traditional algorithm, experimental results at a container terminal show that the fusion of regional clustering and a two-stage SVM yields higher efficiency and better truck recognition performance.


1. Introduction. In container terminals, more and more intelligent systems are applied to make operations more efficient [9,17]. Usually, container trucks should park at suitable positions for quay cranes to load and unload containers. As shown in Figure 1, according to the container handling design, the quay crane stops facing the bay of the ship to be loaded and does not move again during the whole period of loading that bay. Therefore, the container trucks should park at suitable positions underneath the hoists for the quay crane to load or unload containers.

Figure 1. A quay crane unloading a container from a truck.
In the past, drivers parked container trucks at the correct positions guided by workers at the quayside. This guiding operation costs much time and degrades handling performance. Therefore, detection and positioning of quayside container trucks has become one of the hot engineering problems in port logistics. In recent years, researchers and engineers have sought to develop a suitable real-time method to detect and position container trucks precisely and quickly [16].
Vehicle recognition and positioning has always been a hot research area. An increasing number of researchers and engineers try different approaches, depending on their requirements, to detect and position vehicles.
Qifan Wei, Hongmei Zhu et al. used magnetic sensors to detect and track vehicles on roads [22,28]. This kind of method shows good stability and recall. However, magnetic sensors alone cannot differentiate vehicle types. At the quayside, not only container trucks but also other vehicles such as buses and pickup trucks pass under the quay cranes, so these methods may produce detection errors when other vehicles are falsely recognized as container trucks. Radar is another option for port engineering. For example, Mi C et al. applied radar to ship identification for an automatic ship loader [11]. B Li et al. proposed a vehicle detection method based on 3D Lidar and a convolutional network [10]. Fernando García, Xiao Wang et al. presented methods that fuse laser radars, millimeter-wave radars, GPS and vision systems to detect and position vehicles [3,21]. These methods are very effective for vehicle detection and positioning, and they can also recognize some vehicle types. However, they are complex and expensive, and port end users cannot accept highly uneconomical and complex solutions.
Recently, with the development of computing hardware, vision systems have become a good choice for port engineering. For example, Mi C et al. designed a human detection system and a container corner casting recognition system for port security [12,13]. In traffic engineering, vision systems are also a good choice for vehicle detection and positioning. Jemej Mrovlje developed a stereoscopy method that clusters the large elements to be detected and removes small objects for truck detection in ports [14], but this method cannot recognize the concrete vehicle type. Morteza Jalalat mixed several algorithms to detect, position and estimate the displacement and speed of vehicles [5]; mixed algorithms may be accurate but are sometimes very slow because several algorithms are combined in the same system. Xue Yuan, Lars Wilko Sommer, Shiva Kamkar, Yunsheng Zhang et al. all proposed methods based on background detection and motion region tracking to achieve vehicle detection, recognition and positioning [4,7,15,19,25,26]. These methods are very fast and can track any motion region in the images; nevertheless, vehicle type recognition may be inaccurate because of the very limited information about the vehicles. To classify vehicle types accurately, pattern recognition methods have been proposed by many researchers and engineers. Y Cai proposed a monocular and binocular vision method based on deep convolutional neural networks that searches for vehicles over the whole image [1,2,18,27]. Jie Yuan, Thanida Tangkocharoen, Zhaojin Zhang, Q Jiang, Hulin Kuang et al. used multi-layer classifiers, deep learning classifiers or neural networks to recognize vehicles [6,8,20,23,24]. These methods use sliding multi-scale windows to scan the whole image to find vehicle targets. Scanning the whole image yields sufficient precision but is slow, so real-time computing performance cannot be satisfied.
In conclusion, vision systems for vehicle recognition, the future of truck recognition in ports, mainly include two kinds of algorithms: motion region detection and pattern recognition. However, traditional vehicle detection based on motion region detection cannot recognize the concrete type of a moving object, and traditional pattern recognition cannot meet the real-time requirements of port operation. Thus, a fast truck recognition method combining regional clustering and a two-stage classifier is proposed in this paper. The method consists of two phases, executed independently in two camera systems on quay cranes. In the pre-search stage, an image of the whole lane is collected by the front camera on the quay crane; a fast motion regional clustering algorithm then detects moving objects, and the whole image is divided into several sub-windows as truck detection candidate windows. In the second stage, if any candidate windows are found, another camera recognizes container trucks in the corresponding window images with an optimized two-stage SVM classifier. Finally, the multi-scale recognition results are fused to obtain the precise center position of the container truck. Compared with other methods, the experimental results show that our method detects, positions and recognizes container trucks more efficiently.
2. Fast detection of truck position.
2.1. Camera installation and framework of algorithm. Two cameras are installed above each lane, on the opposite beams of the quay crane. As shown in Figure 2, the pre-search camera and the truck detection camera monitor an overlapping area, and the detection range of both cameras covers the corresponding lane from truck entrance to exit. Only the pre-search camera works all day; the truck detection camera stays in a standby sleep mode to save computing resources until the pre-search camera sends it an activation message. Figure 3 shows the field installation of the cameras. When a truck moves along the lane, the pre-search camera scans the image quickly to find candidate sub-windows. If moving objects are detected in the video sequences, the pre-search camera sends the information of the candidate windows containing the moving objects. After receiving this information, the truck detection camera changes from sleep to active mode, and the truck position is then detected accurately in the corresponding image sub-windows of the truck detection camera. The detailed process of truck detection is as follows. Our method consists of two components, fast truck pre-search and detection of truck position, and the system includes two cameras. As shown in Figure 4, the purpose of the pre-search camera is to select a few candidate sub-windows. First, the background image is generated by Gaussian mixture modeling from the image sequences. Then the image patches of moving objects are extracted by background subtraction. Next, these image patches are clustered by k-means clustering and the outer rectangular frames of the clusters are obtained. By filtering these rectangular frames according to several parameter conditions, the candidate sub-windows are selected.
After that, the truck detection camera is activated; the following steps are all processed within the corresponding candidate windows. HOG features are extracted to train the two-stage SVM classifier model, trucks with the two kinds of shapes are detected in the candidate sliding windows by the two-stage SVM classifiers, and finally the prediction results of the two-stage SVM classifiers are fused into the final coordinate of the truck position.

2.2. Fast truck pre-search algorithm. Generally, the shape of the truck head is uniform within the same terminal, while the shape of the truck trailer differs greatly between an empty trailer and one carrying a container. Truck detection therefore means truck head detection.
In the actual truck positioning situation, the traditional truck detection method scans the whole image to search for the target truck, which costs too much time and hardware resources. Thus, a fast truck pre-search algorithm is proposed in this paper to find the possible position regions of the truck in the original image, which saves a lot of time for two reasons:
• The pre-search algorithm finds a rough truck position range, avoiding a scan of the whole image.
• The pre-search algorithm picks the adaptive classifier according to the pre-search results, which avoids wasting resources on simultaneous detection by both stages of classifiers.
Therefore, background subtraction is applied as the fast truck pre-search method in this paper. It compares the real-time monitoring image with the background to find the possible truck region. The detailed process of the fast truck pre-search algorithm is as follows:

1. Background extraction and update
First of all, the original RGB image is converted to grayscale. Then the background is extracted by the Gaussian mixture modeling method, as shown in Figure 5. To adapt to multimodal background scenes, the Gaussian mixture model is built as the weighted sum of K Gaussian distributions, so it can handle complex background scenes, as described by Eq. 1:

P(X_t) = Σ_{i=1}^{K} ω_{i,t} · η(X_t, μ_{i,t}, δ²_{i,t})    (1)

where ω_{i,t} is the weight of the i-th Gaussian distribution at time t and K is the number of distributions. η(X_t, μ_{i,t}, δ²_{i,t}) is the probability density function with mean value μ_{i,t} and variance δ²_{i,t}, which can be calculated by Eq. 2:

η(X_t, μ_{i,t}, δ²_{i,t}) = (2π)^{−n/2} |δ²_{i,t}|^{−1/2} exp(−½ (X_t − μ_{i,t})^T (δ²_{i,t})^{−1} (X_t − μ_{i,t}))    (2)
2. Background subtraction
Subtracting the background frame from the current frame, if the pixel value alteration is very small there is no moving object; otherwise there is possibly a truck in the corresponding region. The method can be expressed by Eq. 3:

D(x, y) = 1 if |f_k(x, y) − f_b(x, y)| > T, otherwise D(x, y) = 0    (3)

where D(x, y) is the value of the difference mask at point (x, y), f_k(x, y) and f_b(x, y) are the pixel values of point (x, y) in frame k and in the background frame, and T is the threshold value.
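The thresholded difference of Eq. 3 can be sketched in a few lines of NumPy; this is a minimal illustration (the threshold value and uint8 frame format are assumptions, not the paper's calibrated settings):

```python
import numpy as np

def background_subtract(frame, background, threshold):
    """Binary motion mask per Eq. 3: a pixel is marked moving (1) when
    |f_k(x, y) - f_b(x, y)| exceeds the threshold T, and 0 otherwise."""
    # Widen to int16 so the subtraction of uint8 frames cannot wrap around.
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return (diff > threshold).astype(np.uint8)
```

In practice the background frame f_b would be maintained by the Gaussian mixture model of Eqs. 1 and 2 (for example, the per-pixel mean of the dominant distribution), and the resulting mask feeds the regional clustering step.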
As shown in Figure 6, the result of background subtraction contains many small image patches, some of which do not belong to the target truck. Therefore, the result of this step needs further processing.
3. Regional clustering
Traditional clustering algorithms cannot deal effectively with large image data, because they traverse every pixel of the image. Different from traditional clustering, a regional clustering algorithm is proposed in this paper. First, the white points of one connected region are considered as a single element. Each element carries a series of attribute parameters such as the center point coordinate, the connected region area and the radius of the outer circle. In this way, the large number of image data points is mapped to a low-dimensional space, in which the image patches are clustered into several categories by the k-means algorithm. Using Euclidean weighted distance, the regional clustering is evaluated by the criterion function that minimizes the error sum of squares, calculated by Eq. 4:

E = Σ_{i=1}^{k} Σ_{X ∈ X_i} ‖X − m_i‖²    (4)

where k is the number of categories and X_i is the collection of image patches near the cluster center m_i of cluster i.
As shown in Figure 7, the image patches are clustered into two categories: truck and non-truck objects. For each clustering result, the minimal outer rectangular frame is obtained, as shown in Figure 8.
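The two stages of regional clustering, mapping connected regions to low-dimensional elements and then running k-means on them, can be sketched as follows. This is an illustrative NumPy implementation under stated assumptions: the paper gives no code, and the particular attribute set (centroid and area) and the plain-k-means details here are simplifications:

```python
import numpy as np

def connected_components(mask):
    """Label 4-connected white regions of a binary mask and return one
    low-dimensional element per region (centroid and area)."""
    labels = np.zeros(mask.shape, dtype=int)
    regions, current = [], 0
    for sy, sx in zip(*np.nonzero(mask)):
        if labels[sy, sx]:
            continue                       # pixel already belongs to a region
        current += 1
        stack, pixels = [(sy, sx)], []
        labels[sy, sx] = current
        while stack:                       # iterative flood fill
            y, x = stack.pop()
            pixels.append((y, x))
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if (0 <= ny < mask.shape[0] and 0 <= nx < mask.shape[1]
                        and mask[ny, nx] and not labels[ny, nx]):
                    labels[ny, nx] = current
                    stack.append((ny, nx))
        pts = np.array(pixels, dtype=float)
        regions.append({"center": pts.mean(axis=0), "area": len(pixels)})
    return regions

def kmeans_regions(regions, k, iters=20, seed=0):
    """Cluster region centroids with plain k-means, minimizing the
    error-sum-of-squares criterion of Eq. 4."""
    rng = np.random.default_rng(seed)
    pts = np.array([r["center"] for r in regions])
    centers = pts[rng.choice(len(pts), size=k, replace=False)]
    for _ in range(iters):
        dist = np.linalg.norm(pts[:, None] - centers[None], axis=2)
        assign = dist.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = pts[assign == j].mean(axis=0)
    return assign, centers
```

Because k-means runs on a handful of region elements instead of every pixel, the clustering cost becomes negligible next to the flood fill, which is the point of the low-dimensional mapping.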

4. Candidate windows selection
After the image patches are clustered, some non-truck objects such as workers, cars or other moving objects may still appear among the truck images. Whether a rectangular frame is kept as a candidate sub-window is decided by selection criteria including area, width, height and angle. However, these criteria only locate the candidate sub-windows approximately, without sufficient accuracy, so the next stage detects the truck position more accurately within these candidate sub-windows.
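The selection step above amounts to a simple geometric filter over the outer rectangles. A minimal sketch follows; the thresholds are hypothetical placeholders (the paper does not publish its calibrated values), and an aspect-ratio test stands in for the angle criterion:

```python
def select_candidates(rects, min_area=2000, min_w=40, min_h=30, max_aspect=4.0):
    """Keep (x, y, w, h) rectangles that plausibly contain a truck.
    All threshold values here are illustrative, not the paper's."""
    kept = []
    for (x, y, w, h) in rects:
        # Elongation of the frame, used in place of the angle criterion.
        aspect = max(w, h) / max(1, min(w, h))
        if w * h >= min_area and w >= min_w and h >= min_h and aspect <= max_aspect:
            kept.append((x, y, w, h))
    return kept
```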

2.3. Detection algorithm of truck position.
After the pre-search camera has finished, the detection of the truck position is completed by the truck detection camera, as shown in Figure 1. According to the candidate sub-windows found in the previous step, the corresponding regions in the truck detection camera image are processed by sliding windows. However, the two cameras are installed at opposite positions monitoring the overlapping area, so the images from the two cameras are mirror images of each other. Thus, the image coordinates of the truck detection camera must be converted to be consistent with those of the pre-search camera before the next steps.
After locating the corresponding candidate sub-windows approximately, the truck detection camera detects the truck position in three steps: local feature extraction, classifier model training, and results fusion. First, HOG features are extracted; then the SVM classifier model is trained and the truck position is predicted; finally, the results from all sliding windows are fused into the coordinate of the truck position.

2.3.1. Local features extraction. As shown in Figure 6, feature extraction can be divided into five steps: gamma correction, gradient calculation, HOG (Histogram of Oriented Gradients) descriptor calculation, feature normalization and feature vector combination.
1. Image noise easily appears in truck images because of the harsh environment of container terminals; it can be suppressed by a square-root-compression gamma correction. The standardized pixel values of the three channels, r′_{x,y}, g′_{x,y} and b′_{x,y}, can be calculated by Eq. 6:

r′_{x,y} = √r_{x,y},  g′_{x,y} = √g_{x,y},  b′_{x,y} = √b_{x,y}    (6)

where r_{x,y}, g_{x,y} and b_{x,y} are the original pixel values of the red, green and blue channels, and the subscript x, y denotes the pixel coordinates.
2. The gradient magnitude and gradient direction are calculated with a one-dimensional discrete differential template according to Eq. 7 and Eq. 8:

∇f(x, y) = √(G_x²(x, y) + G_y²(x, y))    (7)
θ(x, y) = tan⁻¹(G_y(x, y) / G_x(x, y))    (8)

where ∇f(x, y) and θ(x, y) are the gradient magnitude and gradient direction of point (x, y), and G_x(x, y), G_y(x, y) are the gradient components along the x and y axes.
3. According to Figure 9, the whole image is scanned by a 16 × 16-pixel sliding block, which is divided into four 8 × 8-pixel cells. The gradient direction of each point in a cell is discretized into 9 bins, and the cell HOG feature vector is extracted with tri-linear interpolation, so a 36-dimensional HOG feature vector is extracted from each block.
4. The HOG feature of each block is normalized by the L2-norm function to suppress the influence of background illumination and edges.
5. Finally, all 63 blocks of the whole image are concatenated into a 2268-dimensional feature vector.
After the steps above are complete, the truck HOG feature is obtained as shown in Figure 10. However, in actual images the shape feature differs greatly depending on whether the truck is located in the upper or the lower half of the image. Thus, two sample sets must be trained for the two truck shapes, as presented in Figure 10.
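The pipeline above can be sketched end to end in NumPy. This is a simplified illustration, not the paper's implementation: the 16 × 16 block and 8 × 8 cell sizes are inferred from the stated arithmetic (63 blocks × 36 dimensions = 2268 for an 80 × 64 window), and hard binning stands in for the tri-linear interpolation:

```python
import numpy as np

def hog_features(img, cell=8, block=2, bins=9):
    """Simplified HOG on a grayscale image (Eqs. 6-8): sqrt-gamma
    compression, [-1, 0, 1] gradients, 9-bin unsigned cell histograms,
    and L2-normalized 2x2-cell blocks with a one-cell stride."""
    g = np.sqrt(img.astype(float))                  # gamma correction (Eq. 6)
    gx = np.zeros_like(g)
    gy = np.zeros_like(g)
    gx[:, 1:-1] = g[:, 2:] - g[:, :-2]              # 1-D differential template
    gy[1:-1, :] = g[2:, :] - g[:-2, :]
    mag = np.hypot(gx, gy)                          # gradient magnitude (Eq. 7)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0    # unsigned direction (Eq. 8)
    cy, cx = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((cy, cx, bins))
    for i in range(cy):                             # per-cell orientation histograms
        for j in range(cx):
            a = ang[i * cell:(i + 1) * cell, j * cell:(j + 1) * cell]
            m = mag[i * cell:(i + 1) * cell, j * cell:(j + 1) * cell]
            idx = np.minimum((a / (180.0 / bins)).astype(int), bins - 1)
            for b in range(bins):
                hist[i, j, b] = m[idx == b].sum()
    feats = []
    for i in range(cy - block + 1):                 # overlapping blocks, stride one cell
        for j in range(cx - block + 1):
            v = hist[i:i + block, j:j + block].ravel()
            feats.append(v / (np.linalg.norm(v) + 1e-6))   # L2 normalization
    return np.concatenate(feats)
```

For an 80 × 64 sample this yields 9 × 7 = 63 blocks of 36 dimensions, i.e. the 2268-dimensional vector described above.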

2.3.2. Classifier model: training and prediction. In this work, we use an SVM classifier to deal with truck position. For truck position detection, the sample image size is 80 × 64 pixels, with a multi-dimensional feature space. In contrast to a traditional single SVM classifier, the different truck positions require the proposed two-stage SVM classifiers for truck detection. As shown in Figure 11, the two-stage SVM classifiers must find two hyper-planes H and H′ that classify the positive and negative samples correctly. The distance between plane H_1 (H′_1) and plane H_2 (H′_2) is called the maximum margin, and the sample points lying on planes H_1 (H′_1) and H_2 (H′_2) are the support vectors. The equation of the hyper-plane is formulated in Eq. 9:

w^T x + b = 0    (9)
During the preprocessing stage, the two-stage SVM classification algorithm first divides the truck samples into three classes: truck in the lower half of the image, truck in the upper half of the image, and non-truck, as shown in Figure 12. Each sample selected from the truck images first receives a binary decision, upper or lower half of the image, as shown in Figure 13. The two-stage classifiers then divide the samples into truck or non-truck in the two cases respectively. Finally, the possible truck area leads the appropriate classifier to detect the truck position with the corresponding parameters.
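The two-stage decision flow can be expressed as a small dispatcher. This is a structural sketch only: the three classifier objects are assumed to expose an sklearn-style predict method, and the class and label names are illustrative, not from the paper:

```python
class TwoStageTruckClassifier:
    """First stage: decide whether a candidate window lies in the upper
    or lower half of the image. Second stage: apply the matching
    truck/non-truck SVM. Classifier objects are assumed interfaces."""

    def __init__(self, half_clf, upper_truck_clf, lower_truck_clf):
        self.half_clf = half_clf
        self.upper_truck_clf = upper_truck_clf
        self.lower_truck_clf = lower_truck_clf

    def predict(self, features):
        half = self.half_clf.predict(features)      # "upper" or "lower"
        clf = self.upper_truck_clf if half == "upper" else self.lower_truck_clf
        return clf.predict(features)                # e.g. +1 truck / -1 non-truck
```

The design point is that each second-stage SVM only ever sees samples of one truck shape, which is why separate training beats a single mixed-sample classifier.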
As a supervised algorithm, SVM requires the sample sets to be marked with corresponding labels. The training steps of the classifiers are presented as follows:
1. Preprocessing
From the images captured in the truck lane, each truck is cropped with a window of 80 × 64 pixels as a positive sample; the upper-half-area trucks and the lower-area trucks form the positive samples of the two-stage classifiers, as shown in Figure 13 (the sample sets of the two kinds of truck shape features). Besides the trucks, the other parts of the images are cut out as negative samples of the two-stage classifiers. Then the HOG features extracted from positive samples are labeled as positive, and those from negative samples are labeled as negative.

2. Initial SVM classifier training
After the HOG features are extracted, the 2268-dimensional features are mapped into a higher-dimensional feature space using the Radial Basis Function (RBF) kernel, as shown in Eq. 10:

K(x, x_c) = exp(−‖x − x_c‖² / (2σ²))    (10)

where x is a feature vector and x_c is the center of the kernel function. The width parameter σ controls the radial scope of influence of the kernel function.
After that, the HOG feature space of the samples can be linearly separated. Through the objective function and constraint conditions, the two hyper-planes H and H′ of the initial SVM classifiers are obtained by Eq. 11:

min ½‖w‖² + c Σ_{i=1}^{l} ζ_i,  subject to  y_i (w^T x_i + b) ≥ 1 − ζ_i,  ζ_i ≥ 0    (11)

where c and ζ_i represent the punishment factor and slack variables, respectively, which increase the fault tolerance of the SVM classifier; l (l′) is the number of samples, and y_i (y′_i) indicates the label value of x_i (x′_i); the same form holds for the primed classifier. w (w′) represents a combination of the feature vectors, which can be calculated by Eq. 12:

w = Σ_{i=1}^{n} α_i y_i x_i    (12)

where α_i are the Lagrange multipliers and n is the size of the sample set.

3. Final SVM classifier training
Some negative samples are hard to classify correctly with the initial classifier; such samples are generally called hard examples. The HOG features of these hard examples are extracted and combined with the initial negative features as new negative samples. Repeating the training step above, the final two-stage SVM classifiers with the required detection accuracy are obtained.
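The hard-example loop reduces to: collect the negatives the initial classifier wrongly scores as trucks, then retrain with them appended. A minimal sketch, where clf, extract and retrain are assumed interfaces (sklearn-style predict, a feature extractor, and a retraining callback) rather than the paper's concrete code:

```python
def mine_hard_negatives(clf, negative_windows, extract, retrain):
    """One round of hard-negative mining: negatives misclassified as
    trucks (+1) by the initial classifier become extra training data."""
    hard = [extract(w) for w in negative_windows
            if clf.predict(extract(w)) == 1]        # false positives only
    # Retrain only if any hard examples were found; else keep the classifier.
    return retrain(clf, hard) if hard else clf
```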

2.3.3. Fusion results. The final purpose of truck detection is to find the position of the target truck in the whole image, so all candidate windows are scanned by a sliding window in a multi-scale way. During this processing, the same truck is detected several times at different scales and positions. To fuse these detections, in the n-dimensional space R^n with sample points x_i, the mean-shift vector M_h(x) is calculated by Eq. 13:

M_h(x) = (1/k) Σ_{x_i ∈ S_h} (x_i − x)    (13)

where k is the number of sample points falling inside S_h.
Here S_h is a high-dimensional ball region of radius h, defined by Eq. 14:

S_h(x) = { y : (y − x)^T (y − x) ≤ h² }    (14)
The green rectangular frames in Figure 14 are the discrete detections of the same truck position. After averaging these positions, the coordinate of the target truck is calculated as an accurate truck position.
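The fusion of Eqs. 13 and 14 can be sketched with a flat-kernel mean shift over the detection centers; this is an illustrative NumPy version (the flat kernel, iteration count and tolerance are assumptions, since the paper specifies only the shift vector and ball region):

```python
import numpy as np

def mean_shift_fuse(points, h, iters=50, tol=1e-3):
    """Shift each detection center to the mean of its neighbours inside
    the radius-h ball S_h (Eq. 14) until the shift M_h(x) (Eq. 13) is
    negligible; coincident modes mark the fused truck position."""
    pts = np.asarray(points, dtype=float)
    modes = pts.copy()
    for i in range(len(modes)):
        x = modes[i]
        for _ in range(iters):
            inside = np.linalg.norm(pts - x, axis=1) <= h   # S_h membership
            shift = pts[inside].mean(axis=0) - x            # M_h(x)
            x = x + shift
            if np.linalg.norm(shift) < tol:                 # converged
                break
        modes[i] = x
    return modes
```

Detections of the same truck converge to a common mode, whose coordinate serves as the fused truck position, while detections farther apart than h stay separate.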
In conclusion, the fast pre-search method works with fewer candidate windows and a reduced search region, decreasing the number of detections; on the other hand, the two-stage SVM classifiers are trained for two different truck shapes with higher accuracy. The combination of the fast pre-search method and the two-stage SVM classifiers meets the demands of real-time operation and detection accuracy.

3. Results and discussion. The fast truck recognition approach has been integrated into a system tested at Taicang Port. A screenshot of the experimental results is shown in Figure 15.
We evaluate our detection approach on a truck dataset collected from Taicang Port. The dataset includes 500 pictures of the two types of positive samples and 3000 negative samples in the training set, and 500 daytime pictures and 500 nighttime pictures (540 × 960 pixels) containing 981 container trucks in the testing set. All of these sample pictures come from the complicated background of the handling process in the container terminal. The detection rate at night is worse than in the day. In fact, the overall detection rate of our fast detection approach is 98% (Table 1), which satisfies the handling requirements, while the traditional approach reaches only 94%. This means that training the two-stage SVM classifiers separately achieves better classifiers, while the traditional way, with mixed sample types, makes it difficult to obtain a high-performance universal classifier. As shown in Figure 16 (comparison of the two algorithms' average processing time), the traditional algorithm always has a higher average processing time (APT) than our fast detection algorithm: the traditional one takes about 400 ms per image, while the fast detection one needs only 152 ms. This is because the fast truck pre-search algorithm greatly reduces the scanning area for the two-stage SVM classifier, whereas the traditional algorithm always scans the whole image. However, the variation of light at night produces a larger search area, so truck detection takes more time at night than in the day.
In conclusion, this paper proposes a novel container truck detection approach composed of a fast truck pre-search algorithm and a two-stage SVM classifier for the different types of truck head. It improves the detection accuracy to 98%, a great improvement over the traditional SVM classifier, and also satisfies the real-time requirement.

4. Conclusions. Addressing the existing problems of traditional vision algorithms, a novel fast truck recognition method based on a fast regional clustering algorithm and a two-stage classifier is proposed in this paper. First, the truck region is pre-searched quickly by regional clustering to obtain the candidate sub-windows; then the truck position is recognized accurately by the two-stage SVM in the corresponding sub-windows. Experimental tests at the container terminal of Taicang Port show that the detection accuracy of the fast truck recognition approach reaches 98%, and that it costs less time than the traditional SVM and satisfies the real-time requirement. In conclusion, the improved classification approach proposed in this paper has higher efficiency and better truck recognition performance.