# American Institute of Mathematical Sciences

February 2011, 5(1): 115-136. doi: 10.3934/ipi.2011.5.115

## Is SIFT scale invariant?

1 CMLA, ENS Cachan, 61 avenue du Président Wilson, 94235 Cachan Cedex, France

2 CMAP, Ecole Polytechnique, 91128 Palaiseau Cedex, France

Received October 2010; Revised November 2010; Published February 2011

This note is a mathematical exploration of whether Lowe's Scale-Invariant Feature Transform (SIFT) [21], a very successful image matching method, is similarity invariant as claimed. It is proved that the method is scale invariant only if the initial image blurs are guessed exactly. Yet even a large error in the initial blur is quickly attenuated by this multiscale method as the scale of analysis increases. Consequently, its scale invariance is almost perfect. The mathematical arguments are given under the assumption that the Gaussian smoothing performed by SIFT gives an aliasing-free sampling of the image evolution. The validity of this main assumption is confirmed by a rigorous experimental procedure and by a mathematical proof. These results explain why SIFT outperforms all other image feature extraction methods when it comes to scale invariance.
Citation: Jean-Michel Morel, Guoshen Yu. Is SIFT scale invariant? Inverse Problems and Imaging, 2011, 5 (1): 115-136. doi: 10.3934/ipi.2011.5.115
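
The attenuation claim in the abstract can be illustrated with a short Python sketch. It relies only on the standard fact that convolving Gaussians of standard deviations a and b yields a Gaussian of standard deviation sqrt(a² + b²). The function names and the blur values 0.5 and 1.0 are illustrative assumptions, not taken from the paper, and this simplified model is not the paper's proof.

```python
import math


def effective_blur(sigma, initial_blur):
    """Total Gaussian std after smoothing an image whose blur is
    `initial_blur` with a Gaussian of std `sigma` (semigroup property)."""
    return math.sqrt(sigma ** 2 + initial_blur ** 2)


def relative_scale_error(sigma, assumed_blur, true_blur):
    """Relative gap between the scale the method believes it analyses
    (built from the assumed blur) and the scale actually reached."""
    assumed = effective_blur(sigma, assumed_blur)
    actual = effective_blur(sigma, true_blur)
    return abs(actual - assumed) / assumed


if __name__ == "__main__":
    # Hypothetical values: the method assumes blur 0.5, the image has blur 1.0.
    assumed_blur, true_blur = 0.5, 1.0
    for sigma in (1, 2, 4, 8, 16):
        err = relative_scale_error(sigma, assumed_blur, true_blur)
        print(f"sigma = {sigma:2d}  relative scale error = {err:.4f}")
```

In this toy model the error vanishes only when the assumed and true blurs coincide, and for a wrong guess it shrinks steadily as the analysis scale `sigma` grows, which matches the qualitative statement of the abstract.
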
##### References:
[1] A. Agarwala, M. Agrawala, M. Cohen, D. Salesin and R. Szeliski, Photographing long scenes with multi-viewpoint panoramas, International Conference on Computer Graphics and Interactive Techniques, (2006), 853-861.
[2] A. Baumberg, Reliable feature matching across widely separated views, Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, 1 (2000), 774-781.
[3] M. Bennewitz, C. Stachniss, W. Burgard and S. Behnke, Metric localization with scale-invariant visual features using a single perspective camera, European Robotics Symposium, (2006), 195. doi: 10.1007/11681120_16.
[4] M. Brown and D. Lowe, Recognising panorama, in Proc. the 9th Int. Conf. Computer Vision, October, (2003), 1218-1225.
[5] E. Y. Chang, EXTENT: Fusing context, content, and semantic ontology for photo annotation, Proceedings of the 2nd International Workshop on Computer Vision Meets Databases, (2005), 5-11. doi: 10.1145/1160939.1160945.
[6] Q. Fan, K. Barnard, A. Amir, A. Efrat and M. Lin, Matching slides to presentation videos using SIFT and scene background matching, Proceedings of the 8th ACM International Workshop on Multimedia Information Retrieval, (2006), 239-248.
[7] L. Février, "A Wide-baseline Matching Library for Zeno," Internship report, ENS, Paris, France, 2007, www.di.ens.fr/~fevrier/papers/2007-InternsipReportILM.pdf.
[8] J. J. Foo and R. Sinha, Pruning SIFT for scalable near-duplicate image matching, Proceedings of the Eighteenth Conference on Australasian Database, 63 (2007), 63-71.
[9] G. Fritz, C. Seifert, M. Kumar and L. Paletta, Building detection from mobile imagery using informative SIFT descriptors, Lecture Notes in Computer Science, (2005), 629-638. doi: 10.1007/11499145_64.
[10] C. Gasquet and P. Witomski, "Fourier Analysis and Applications: Filtering, Numerical Computation, Wavelets," Springer Verlag, 1999.
[11] I. Gordon and D. G. Lowe, What and where: 3D object recognition with accurate pose, Lecture Notes in Computer Science, 4170 (2006), 67. doi: 10.1007/11957959_4.
[12] J. S. Hare and P. H. Lewis, Salient regions for query by image content, Image and Video Retrieval: Third International Conference, CIVR, (2004), 317-325.
[13] C. Harris and M. Stephens, A combined corner and edge detector, Alvey Vision Conference, 15 (1988), 50.
[14] T. Kadir, A. Zisserman and M. Brady, An affine invariant salient region detector, in European Conference on Computer Vision, (2004), 228-241.
[15] Y. Ke and R. Sukthankar, PCA-SIFT: A more distinctive representation for local image descriptors, Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, 2 (2004), 506-513.
[16] J. Kim, S. M. Seitz and M. Agrawala, Video-based document tracking: Unifying your physical and electronic desktops, Proc. of the 17th Annual ACM Symposium on User interface Software and Technology, 24 (2004), 99-107.
[17] B. N. Lee, W. Y. Chen and E. Y. Chang, Fotofiti: Web service for photo management, Proceedings of the 14th Annual ACM International Conference on Multimedia, (2006), 485-486. doi: 10.1145/1180639.1180737.
[18] H. Lejsek, F. H. Ásmundsson, B. T. Jónsson and L. Amsaleg, Scalability of local image descriptors: A comparative study, Proceedings of the 14th Annual ACM International Conference on Multimedia, (2006), 589-598. doi: 10.1145/1180639.1180760.
[19] T. Lindeberg, Scale-space theory: A basic tool for analyzing structures at different scales, Journal of Applied Statistics, 21 (1994), 225-270. doi: 10.1080/757582976.
[20] T. Lindeberg and J. Garding, Shape-adapted smoothing in estimation of 3-D depth cues from affine distortions of local 2-D brightness structure, Proc. ECCV, (1994), 389-400.
[21] D. G. Lowe, Distinctive image features from scale-invariant key points, International Journal of Computer Vision, 60 (2004), 91-110. doi: 10.1023/B:VISI.0000029664.99615.94.
[22] J. Matas, O. Chum, M. Urban and T. Pajdla, Robust wide-baseline stereo from maximally stable extremal regions, Image and Vision Computing, 22 (2004), 761-767. doi: 10.1016/j.imavis.2004.02.006.
[23] K. Mikolajczyk and C. Schmid, Indexing based on scale invariant interest points, Proc. ICCV, 1 (2001), 525-531.
[24] K. Mikolajczyk and C. Schmid, An affine invariant interest point detector, Proc. ECCV, 1 (2002), 128-142.
[25] K. Mikolajczyk and C. Schmid, A performance evaluation of local descriptors, in "International Conference on Computer Vision and Pattern Recognition," volume 2, (2003), 257-263.
[26] K. Mikolajczyk and C. Schmid, Scale and affine invariant interest point detectors, International Journal of Computer Vision, 60 (2004), 63-86. doi: 10.1023/B:VISI.0000027790.02288.f2.
[27] K. Mikolajczyk and C. Schmid, A performance evaluation of local descriptors, IEEE Trans. PAMI, (2005), 1615-1630.
[28] K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas, F. Schaffalitzky, T. Kadir and L. V. Gool, A comparison of affine region detectors, International Journal of Computer Vision, 65 (2005), 43-72. doi: 10.1007/s11263-005-3848-x.
[29] P. Monasse, Contrast invariant image registration, Proc. of the International Conf. on Acoustics, Speech and Signal Processing, Phoenix, Arizona, 6 (1999), 3221-3224.
[30] P. Moreels and P. Perona, Common-frame model for object recognition, Neural Information Processing Systems, (2004), 953-960.
[31] J. M. Morel and G. Yu, ASIFT: A new framework for fully affine invariant image comparison, SIAM Journal on Imaging Sciences, 2 (2009), 438-469. doi: 10.1137/080732730.
[32] A. Murarka, J. Modayil and B. Kuipers, Building local safety maps for a wheelchair robot using vision and lasers, in "Proceedings of the 3rd Canadian Conference on Computer and Robot Vision," IEEE Computer Society Washington, DC, USA, 2006.
[33] P. Musé, F. Sur, F. Cao and Y. Gousseau, Unsupervised thresholds for shape matching, Proc. of the International Conference on Image Processing, 2 (2003), 647-650.
[34] P. Musé, F. Sur, F. Cao, Y. Gousseau and J. M. Morel, An a contrario decision method for shape element recognition, International Journal of Computer Vision, 69 (2006), 295-315. doi: 10.1007/s11263-006-7546-0.
[35] A. Negre, H. Tran, N. Gourier, D. Hall, A. Lux and J. L. Crowley, Comparative study of people detection in surveillance scenes, Structural, Syntactic and Statistical Pattern Recognition, Proceedings Lecture Notes in Computer Science, 4109 (2006), 100-108. doi: 10.1007/11815921_10.
[36] D. Nister and H. Stewenius, Scalable recognition with a vocabulary tree, Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, (2006), 2161-2168.
[37] J. Rabin, Y. Gousseau and J. Delon, A statistical approach to the matching of local features, SIAM Journal on Imaging Sciences, 2 (2009), 931-958. doi: 10.1137/090751359.
[38] F. Riggi, M. Toews and T. Arbel, Fundamental matrix estimation via TIP-transfer of invariant parameters, Proceedings of the 18th International Conference on Pattern Recognition (ICPR'06), 2 (2006), 21-24.
[39] J. Ruiz-del Solar, P. Loncomilla and C. Devia, A new approach for fingerprint verification based on wide baseline matching using local interest points and descriptors, Lecture Notes in Computer Science, 4872 (2007), 586-599. doi: 10.1007/978-3-540-77129-6_51.
[40] P. Scovanner, S. Ali and M. Shah, A 3-dimensional SIFT descriptor and its application to action recognition, Proceedings of the 15th International Conference on Multimedia, (2007), 357-360. doi: 10.1145/1291233.1291311.
[41] C. E. Shannon, A mathematical theory of communication, The Bell System Technical Journal, 27 (1948), 623-656.
[42] T. Tuytelaars and L. Van Gool, Matching widely separated views based on affine invariant regions, International Journal of Computer Vision, 59 (2004), 61-85. doi: 10.1023/B:VISI.0000020671.28016.e8.
[43] L. Vacchetti, V. Lepetit and P. Fua, Stable real-time 3D tracking using online and offline information, IEEE Trans PAMI, (2004), 1385-1391.
[44] M. Veloso, F. von Hundelshausen and P. E. Rybski, Learning visual object definitions by observing human activities, in "Proc. of the IEEE-RAS Int. Conf. on Humanoid Robots," (2005), 148-153. doi: 10.1109/ICHR.2005.1573560.
[45] M. Vergauwen and L. Van Gool, Web-based 3D reconstruction service, Machine Vision and Applications, 17 (2005), 411-426. doi: 10.1007/s00138-006-0027-1.
[46] K. Yanai, Image collector III: a web image-gathering system with bag-of-keypoints, Proc. of the 16th Int. Conf. on World Wide Web, (2007), 1295-1296. doi: 10.1145/1242572.1242816.

