A FAST MATCHING ALGORITHM FOR THE IMAGES WITH LARGE SCALE DISPARITY

Abstract. With the expansion of the application areas of unmanned aerial vehicles (UAVs), there is a rising demand to realize UAV navigation by means of computer vision. Speeded-Up Robust Features (SURF) is an ideal image matching algorithm for solving the UAV location problem. However, when there is a large scale difference between two images of the same scene taken by a UAV and a satellite respectively, it is difficult to apply SURF directly to complete accurate image matching. In this paper, a fast image matching algorithm that can bridge this huge scale gap is proposed. The fast matching algorithm searches for an optimal scaling ratio based on the ground distance represented by each pixel. Meanwhile, a validity index for evaluating the matching performance is given. The experimental results illustrate that the proposed algorithm performs better in both speed and accuracy. Moreover, the proposed algorithm can also obtain correct matching results on rotated images. Therefore, the proposed algorithm could be applied to UAV location and navigation in the future.

1. Introduction. Object recognition and matching is a basic problem in the field of computer vision [3]. Given two images of the same scene photographed under different conditions, such as changes of illumination or camera viewpoint, matching needs to find the correspondences between the same objects or scenes appearing in the two images. This procedure is also applied in image retrieval, camera calibration, image registration, etc.
Image matching is generally divided into three steps [12]. The first step is detection: regions or structures with special characteristics, such as edges, corners and blobs, are selected. These detecting algorithms, for instance Hessian-Laplace and Harris-Laplace [6], are called detectors. The next step is description: each selected part is represented using the information of its surrounding pixels. Description algorithms such as LIOP [5] and BRIEF [13] are normally called descriptors. The final step is matching. There are many matching algorithms, and the matching criteria may vary according to the type of descriptor and the purpose. A classic and widely accepted matching method is to compare descriptors by Mahalanobis or Euclidean distance [11].
In order to realize object recognition and matching, several classic algorithms have been invented. D. Lowe proposed the Scale Invariant Feature Transform (SIFT), which is famous for its accuracy and robustness. However, SIFT is time-consuming, so it cannot be applied directly in practice. Recently, S. Katta, S. Pabboju et al. introduced PCA into SIFT, which reduces the dimension of the SIFT descriptor in order to accelerate matching [8,9]; however, the SIFT detector remains inefficient. Speeded-Up Robust Features (SURF) was originally proposed by H. Bay, and different versions of SURF have since appeared [14]. The principle of SURF is similar to that of SIFT, but the former uses box filtering on integral images in place of convolution. Therefore SURF achieves great computational efficiency while sacrificing a little accuracy, and thus has become widely used in practice. In this paper, a fast image matching algorithm is proposed in order to solve the matching between UAV images and tiles of Google maps.
This study contributes to the existing literature in two aspects. First, an optimal scaling ratio for UAV images is given, in order to obtain the best matching between UAV images and Google tiles. Second, a fast matching algorithm is proposed that preserves rotation invariance for the UAV.
This paper is organized as follows. In Section 2, the basic ideas of SURF and the matching algorithm are reviewed. In Section 3, the matching difficulties caused by a large scale gap are re-examined and an alternative method proposed by Ao [2] is briefly discussed; then a faster matching algorithm is proposed. In Section 4, in order to verify the performance of the new algorithm, experimental results are compared with those of the alternative method.
2. Related work. SURF (Speeded-Up Robust Features) is famous for its speed. Meanwhile, it claims to be scale invariant and robust to rotation. Generally, SURF is a combination of two parts: a detector and a descriptor. For the detector part, SURF uses the Hessian determinant to find the position and scale of blob-like structures; for the descriptor part, SURF uses Haar wavelets to preserve the neighbourhood information. This kind of feature makes SURF robust even to significant changes of brightness [20], and it can be easily extended to fit different demands [1].

2.1. Detection of interest points. The speed of SURF is obtained by substituting the convolution operation carried out on a normal image with a filtering operation applied on an integral image, which greatly reduces the computation time while sacrificing only a little accuracy [16].
2.1.1. Integral image and Hessian matrix. Let I(X) be a normal grey image; its integral image I_Σ(X) is defined as follows. Given a location X = (x, y), the value at X on I_Σ equals the sum of all pixel values inside the rectangle spanned by the upper-left origin and the point X:

I_Σ(X) = Σ_{i=0}^{x} Σ_{j=0}^{y} I(i, j).    (1)

Thus, for a given rectangular area with corner values A, B, C and D on the integral image, the sum of the pixels inside it can be calculated with only three additions, Σ = A − B − C + D, as shown in Figure 1. The Hessian matrix is used to find blob-like structures, and it can also be used to select the appropriate scale. Given a point X = (x, y) on a grey image I, the Hessian matrix at X with scale σ is

H(X, σ) = [ L_xx(X, σ)  L_xy(X, σ) ;  L_xy(X, σ)  L_yy(X, σ) ],    (2)

where L_xx(X, σ) is the convolution of the Gaussian second order derivative ∂²g(σ)/∂x² with the image I at point X, and L_xy(X, σ) and L_yy(X, σ) have similar meanings.
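The integral image and the three-addition rectangle sum can be illustrated with a minimal NumPy sketch (not from the paper; the function names are ours):

```python
import numpy as np

def integral_image(img):
    """Cumulative sum over rows and columns: entry (y, x) holds the sum
    of all pixels in the rectangle from the origin to (y, x), as in (1)."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, top, left, bottom, right):
    """Sum of pixels in img[top:bottom+1, left:right+1] via the
    four-corner identity A - B - C + D (three additions)."""
    total = ii[bottom, right]
    if top > 0:
        total -= ii[top - 1, right]
    if left > 0:
        total -= ii[bottom, left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]
    return total

img = np.arange(16, dtype=np.float64).reshape(4, 4)
ii = integral_image(img)
print(rect_sum(ii, 1, 1, 2, 2))  # equals img[1:3, 1:3].sum() = 30.0
```

Once the cumulative sums are precomputed, the cost of summing any rectangle is constant, independent of its size, which is what makes box filtering so cheap.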
2.1.2. Box filters and the approximate determinant of the Hessian matrix. Box filters can be applied to the integral image; they are approximations of the discretised and cropped second order Gaussian partial and mixed-partial derivatives, so the convolution can be reduced to filtering the integral image with the box filters displayed in Figure 2. Denoting by D_xx, D_xy and D_yy the filtering results corresponding to L_xx(X, σ), L_xy(X, σ) and L_yy(X, σ), the Hessian matrix (2) can be simplified to

H_approx = [ D_xx  D_xy ;  D_xy  D_yy ],    (3)

and its approximate determinant is

det(H_approx) = D_xx D_yy − (w D_xy)².    (4)

The initial size of the box filters is 9 × 9, corresponding to the scale σ = 1.2. In order to preserve energy conservation and balance the approximation error, the weight w in formula (4) is derived from

w = |L_xy(1.2)|_F |D_yy(9)|_F / ( |L_yy(1.2)|_F |D_xy(9)|_F ) ≈ 0.9,    (5)

where |·|_F denotes the Frobenius norm. Furthermore, the filter responses need to be normalised with respect to the filter size so that responses of different box-filter sizes are comparable.
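Given the three box-filter responses at a point, formula (4) with the size normalisation is a one-liner; the following sketch (our own, not the paper's code) assumes the raw responses have already been computed on the integral image:

```python
def approx_hessian_det(dxx, dyy, dxy, filter_size, w=0.9):
    """Approximate Hessian determinant, formula (4). The responses are
    first normalised by the filter area so that different box-filter
    sizes are comparable; w ~ 0.9 balances the box approximation of
    the mixed second derivative, per formula (5)."""
    area = float(filter_size * filter_size)
    dxx, dyy, dxy = dxx / area, dyy / area, dxy / area
    return dxx * dyy - (w * dxy) ** 2
```

A point is kept as a blob candidate when this determinant exceeds a threshold and is a local extremum in scale space (Section 2.1.3).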
After the above computations on the integral image I_Σ(X), we can work out the approximate Hessian determinant at any point X = (x, y) for a given scale σ. By increasing the template size of the box filters, which is equivalent to increasing the scale σ as shown in Figure 3, we can obtain the Hessian determinant at point X at a higher scale from the same integral image.

Figure 3. Filters D_yy (above) and D_xy (below) at two sizes: 9 × 9 templates (left) and 15 × 15 templates (right).

2.1.3. Scale pyramid and interpolation. According to the filter template size, SURF normally groups the filters into three octaves, each containing four layers. The arrangement is shown in Figure 4. The gradually increasing filter sizes are analogous to increasing the scale while downsampling the image, as shown in Figure 5 [10]. The first octave is composed of the 9 × 9, 15 × 15, 21 × 21 and 27 × 27 filters, whose corresponding scales σ are 1.2, 2.0, 2.8 and 3.6. The second octave starts from 15 × 15 and the third from 27 × 27, which guarantees the continuity of the scale change and the accuracy of SURF.
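The octave layout above follows a simple rule (size step 6 in the first octave, doubled in each subsequent octave, each octave starting from the second filter of the previous one), which can be sketched as follows; this is our illustration, not code from the paper:

```python
def surf_pyramid(octaves=3, layers=4):
    """Box-filter sizes and corresponding scales for the SURF pyramid.
    Each octave doubles the size step (6, 12, 24, ...) and starts from
    the second filter of the previous octave, keeping the scale range
    continuous; sigma scales linearly with filter size (1.2 at 9x9)."""
    pyramid = []
    start, step = 9, 6
    for _ in range(octaves):
        sizes = [start + step * k for k in range(layers)]
        pyramid.append([(s, 1.2 * s / 9.0) for s in sizes])
        start = sizes[1]  # next octave begins at this octave's second filter
        step *= 2
    return pyramid

for octave in surf_pyramid():
    print(octave)
```

The first octave this produces is (9, 1.2), (15, 2.0), (21, 2.8), (27, 3.6), matching the sizes and scales listed in the text.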
The determinant achieves its extreme value (maximum or minimum) when the filter's scale is comparable to the object's scale and the centre of the filter is near the interest point. As shown in Figure 6, we search for extreme points in a 3 × 3 × 3 neighbourhood in the scale pyramid, use non-maximum suppression [7] to select as interest points those whose determinants are extreme among their surrounding points, and then use interpolation to obtain accurate location and scale information [17].

2.2. Image matching algorithm based on SURF. SURF is a scale and rotation invariant detector and descriptor for images; it does not itself contain a matching algorithm. Different matching algorithms can be combined with different detection and description algorithms under specific demands. For example, matching can be based on similarity: we can calculate the distance between description vectors, such as SSD (sum of squared differences, formula (6)), which provides a more precise matching result, and SAD (sum of absolute differences, formula (7)), whose principle is like SSD but which provides a faster matching result:

SSD = Σ_{i=1}^{M} Σ_{j=1}^{N} [S(i, j) − T(i, j)]²,    (6)

SAD = Σ_{i=1}^{M} Σ_{j=1}^{N} |S(i, j) − T(i, j)|.    (7)
In the two formulas, S(i, j) and T(i, j) represent the descriptors of interest points located on the two images, and M and N are the dimensions of the descriptor vector. Matching can also be organised for speed: if the matching database is huge, an "approximate" mode may be preferred to an "exhaustive" mode [4]. In this paper SAD is preferred and, considering that the number of interest points is small, the exhaustive mode is selected [2].
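Exhaustive SAD matching of descriptor sets can be sketched as below. This is our illustration, assuming descriptors are stored as rows of NumPy arrays; the nearest/second-nearest ratio test is a common safeguard we add here, not part of the paper's description:

```python
import numpy as np

def sad_match(desc_a, desc_b, ratio=0.7):
    """Exhaustive matching by sum of absolute differences (formula (7)).
    Each descriptor in desc_a is compared against every descriptor in
    desc_b; a match is kept only when the best SAD clearly beats the
    second best (Lowe-style ratio test, an extra safeguard)."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.abs(desc_b - d).sum(axis=1)  # SAD against all candidates
        order = np.argsort(dists)
        best, second = order[0], order[1]
        if dists[best] < ratio * dists[second]:
            matches.append((i, int(best), float(dists[best])))
    return matches
```

Replacing `np.abs(...).sum(...)` with `((desc_b - d) ** 2).sum(axis=1)` turns this into SSD matching (formula (6)) with no other changes.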
3. Fast matching algorithm for images with large scale disparity. Commonly, the flying height of a UAV is a few hundred metres, but the orbital height of a satellite is much higher, from a few kilometres to thousands of kilometres. This causes a huge difference between UAV aerial images and satellite maps. Therefore, without any image pre-processing, it is not realistic to match UAV aerial images against satellite maps using SURF [15,18].

Ao proposed a method that traverses a range of ratios to search for the optimal scaling ratio [2]. In his algorithm, an initial scale α_0 is given by the ratio of the UAV flying height of 500 metres to the satellite vision height of 9600 metres, i.e., α_0 = H_UAV / H_satellite = 500/9600 ≈ 0.05, and α_n is traversed from α_0 to 0.2 with step 0.01. Gradually zooming out the UAV image is equivalent to technically elevating the UAV up to the height of the satellite in order to find the most suitable scaling ratio. The main disadvantage of Ao's method is that it is time-consuming.

In this paper, a fast matching algorithm for images with large scale disparity is proposed; the details are given in the following. Firstly, because of the large difference of scales between the UAV image and the Google tiles, the UAV image should be reduced in order to match the tiles [19]. Thus a suitable scaling ratio is first defined as

α_best = C · D_UAV / D_Tile,    (8)

where D_UAV is the ground distance represented by each pixel of the UAV image, and D_Tile is the ground distance represented by each pixel of the tile or satellite map. C is a coefficient used for improving the quality of the tile and increasing the number of detected interest points; in this paper we double the size of the tile, so we let C be 2. If the flight altitude of the UAV is relatively stable, α_best can be treated as a constant within a flying area.
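Formula (8) is a single arithmetic step; the sketch below (ours, with the ground sample distances quoted later in the paper) shows how α_best replaces Ao's traversal over candidate ratios:

```python
def best_scaling_ratio(d_uav, d_tile, c=2.0):
    """Formula (8): the optimal scaling ratio is the ratio of ground
    distances per pixel, times the quality coefficient C (C = 2 in the
    paper, since the tile is doubled in size)."""
    return c * d_uav / d_tile

# Ground sample distances quoted in the paper: about 0.21 m/pixel for a
# UAV at 500 m, about 3.51 m/pixel for a level-15 Google tile.
alpha_best = best_scaling_ratio(0.21, 3.51)
print(round(alpha_best, 4))  # ~0.1197, matching the paper's 0.1196
```

Because α_best is computed directly rather than searched for, the per-image cost of Ao's ratio traversal (one SURF matching run per candidate ratio) disappears entirely.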
Now the new UAV image I_scaled can be obtained by reducing the size of I_UAV with α_best. Secondly, I_scaled is matched against each tile of the satellite map by the SURF algorithm. Meanwhile, in order to validate the results between the UAV image and the tiles, a validity index V_i is defined as

V_i = N_i / AD_i,    (9)

where i is the serial number of the tile, N_i is the number of matched pairs, and AD_i is the average matching distance over the matched pairs of that tile. Table 1 and Figure 11 show the details of the proposed algorithm. Notably, the proposed fast matching algorithm is suitable for two scenarios: one in which the UAV image is parallel to the corresponding part of the Google satellite map or tile, and another in which the UAV image is rotated by a certain angle with respect to the corresponding Google map or tile.

4. Experimental results. The tiles are taken from the level-15 Google satellite map (Google.com/maps/documentation/javascript/maptypes). Each pixel of a UAV image captured at 500 metres represents about 0.21 metres, and each pixel of the level-15 Google map represents about 3.51 metres. Here we let C be 2, so the best scaling ratio α_best should be (0.21/3.51) × 2 ≈ 0.1196. In order to verify the effectiveness of the fast matching method, we design the following two experiments.

Experiment 1: the UAV image is parallel to the corresponding Google map or tile. This condition was also verified by Ao's method [2]; it is expected that the correct tile, i.e., the correct scene, is located, while the computation time is reduced compared with Ao's algorithm.

Experiment 2: the UAV images are rotated by a certain angle with respect to the Google map or tile. In order to validate the efficacy of the proposed method, the UAV image is rotated by different angles. This is the common condition, because UAVs are always affected by airflow, so their heading cannot be strictly aligned with the map.
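Selecting the best tile then reduces to maximising the validity index. A small sketch of this step follows; note that formula (9) is lost in the extracted text, so V_i = N_i / AD_i is our reconstruction from the stated definitions (more matched pairs and a smaller average distance both indicate a better tile), and the tile names and numbers are invented for illustration:

```python
def validity_index(n_matches, avg_distance):
    """Validity index V_i, reconstructed as N_i / AD_i: larger is
    better, since it rewards many matched pairs with small average
    matching distance."""
    return n_matches / avg_distance

# Hypothetical per-tile results: (number of matched pairs, average distance).
tiles = {"tile_3": (42, 0.18), "tile_7": (15, 0.21), "tile_9": (40, 0.35)}
best_tile = max(tiles, key=lambda t: validity_index(*tiles[t]))
print(best_tile)  # tile_3
```

Any monotone combination of N_i and AD_i with the same ordering would serve; the ratio form is simply the most direct one consistent with the text.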
This condition was not considered in Ao's paper.

4.1. Results of experiment 1. As shown in Table 2, the average matching time for each UAV image is 23.4 seconds with the proposed fast matching method, compared with an average of 722.3 seconds consumed by Ao's method. The proposed method is approximately thirty times faster than Ao's method, which is one improvement of this paper. Moreover, the overall number of matched pairs obtained by the proposed fast method is apparently superior, especially on image A and image B: as shown in Figure 12 and Figure 13, the numbers of matched pairs are doubled, and the matched pairs are more centralized. Therefore, the proposed fast algorithm not only maintains the matching accuracy but also greatly accelerates the matching speed. It is expected to be applicable in practice.

4.2. Results of experiment 2. In this experiment, the capacity for matching images of the same scene with different orientations is examined. The first task is to verify that the correct tile is still selected by the proposed algorithm when the given UAV image is not parallel to the tile. The second task is to estimate the direction of the given UAV image according to the Google satellite map. Given two points A(a_1, a_2) and B(b_1, b_2) on an image, the straight line through A and B forms an angle with the x-axis of the image, as shown in Figure 16. The angle changes along with the image rotation, and it is easily calculated by the arctangent:

Angle = arctan( (b_2 − a_2) / (b_1 − a_1) ).    (10)

Because the Google satellite map has a fixed orientation (the x axis is parallel to the latitude and the y axis to the longitude), the map can be used as the base orientation to estimate the direction of the UAV. For example, if the angle of two matched points A(a_1, a_2) and B(b_1, b_2) on the UAV aerial image is 60 degrees, while the angle of the corresponding matched points A'(a'_1, a'_2) and B'(b'_1, b'_2) on the best Google tile is 20 degrees, then the difference, 40 degrees, is the direction deviation of the UAV.
Apparently at least two matched pairs are needed to calculate the direction of the UAV image. Furthermore, with more than two matched pairs, a more accurate direction can be computed by averaging: we randomly select two pairs out of the total, calculate one direction, repeat the procedure several times, and finally obtain an averaged direction. In the following experiment, three random groups of matched pairs are used to calculate the direction angle. The result is shown in the third column of Table 3.
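The averaging procedure above can be sketched as follows. This is our illustration: `atan2` is used instead of a plain arctangent so that formula (10) is well defined for vertical lines, and the angle differences are wrapped to avoid branch-cut errors, two details the text does not spell out:

```python
import math
import random

def pair_angle(a, b):
    """Formula (10): angle in degrees between the line through points
    A and B and the x axis of the image (atan2 handles b1 == a1)."""
    return math.degrees(math.atan2(b[1] - a[1], b[0] - a[0]))

def estimate_direction(uav_pts, tile_pts, groups=3, seed=0):
    """Average the UAV-vs-tile angle difference over several randomly
    chosen groups of two matched pairs; each difference is wrapped
    into [-180, 180) before averaging."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(groups):
        i, j = rng.sample(range(len(uav_pts)), 2)
        d = pair_angle(uav_pts[i], uav_pts[j]) - pair_angle(tile_pts[i], tile_pts[j])
        diffs.append((d + 180.0) % 360.0 - 180.0)
    return sum(diffs) / len(diffs)
```

Feeding in matched coordinates from a rotated image and its tile recovers the rotation angle up to floating-point error, which mirrors the deviations of one to two degrees reported in Table 3.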
The second column lists the real rotation angles in degrees, and the last column the corresponding calculated angles. It can be seen that the calculated rotation angle or direction is very close to the real image rotation angle: the average deviation in the image B rotation group is 1.768 degrees, and in the image C rotation group 0.993 degrees. The matching results are shown in Figure 17 and Figure 18. Image B is rotated through a series of angles from 15 to 90 degrees anti-clockwise with a step of 15 degrees; Figure 17 displays only the matching results of image B under 15, 45 and 75 degree rotations. In Figure 17, (a), (c) and (e) are the best matching tiles for (b), (d) and (f), respectively: the top group ((a) and (b)) shows the result for the 15 degree rotation, the middle group ((c) and (d)) for 45 degrees, and the bottom group for 75 degrees. Similarly, image C is rotated from 110 to 160 degrees anti-clockwise with a step of 10 degrees, and Figure 18 shows its matching results under 120, 140 and 160 degree rotations. It is clearly seen that the correct tile is selected and matched at every rotation angle. These matched groups illustrate that the fast matching method is invariant to rotation, and it is expected to be applicable to UAV autonomous location and navigation.

5. Conclusions. This paper proposes a fast matching algorithm that can be applied to images with a huge scale difference. The proposed method first introduces a scaling factor to reduce the scale of the image, which increases the matching accuracy. Furthermore, a validity index is presented in order to evaluate the matching performance. The results show that the fast matching algorithm not only retains matching accuracy and invariance to rotation, but also greatly accelerates the matching speed. It is expected to be applied in UAV autonomous navigation.