# American Institute of Mathematical Sciences

May  2020, 3(2): 125-140. doi: 10.3934/mfc.2020014

## Homography estimation along short videos by recurrent convolutional regression network

 University of South Carolina, Columbia, 29208, USA

* Corresponding author: Song Wang

Received December 2019; Revised March 2020; Published May 2020

Many moving-camera video processing and analysis tasks require accurate estimation of homography across frames. Estimating the homography between non-adjacent frames can be very challenging when their camera view angles differ substantially. In this paper, we propose a new deep-learning-based method for homography estimation along videos that exploits temporal dynamics across frames. More specifically, we develop a recurrent convolutional regression network consisting of a convolutional neural network (CNN) and a recurrent neural network (RNN) with long short-term memory (LSTM) cells, followed by a regression layer that estimates the homography parameters. In the experiments, we evaluate the proposed method on both synthesized and real-world short videos. The experimental results show that the proposed method estimates the homographies along short videos more accurately than several existing methods.
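
For reference, and as standard background rather than a description of the proposed network, a homography between frames $i$ and $j$ can be written as a $3 \times 3$ matrix $H_{i \to j}$, defined up to scale, that acts on homogeneous pixel coordinates. The symbols below are notation assumed for illustration and do not come from the paper:

$$
\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} \sim H_{i \to j} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix},
\qquad
H_{1 \to t} = H_{t-1 \to t} \cdots H_{2 \to 3}\, H_{1 \to 2}.
$$

The second identity hints at why non-adjacent frames are harder: chaining adjacent estimates can accumulate their errors, while estimating $H_{1 \to t}$ directly must cope with a large change in view.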

Citation: Yang Mi, Kang Zheng, Song Wang. Homography estimation along short videos by recurrent convolutional regression network. Mathematical Foundations of Computing, 2020, 3 (2) : 125-140. doi: 10.3934/mfc.2020014

Architecture of the proposed recurrent convolutional regression network.
Configuration of the CNN used in the proposed method.
The architecture of an LSTM cell.
Sample frames of the 4 points in a recorded video, with computed homographies.
An illustration of constructing a video sequence with ground-truth homographies.
Sample videos generated using MS-COCO images with ground-truth homographies. Each row shows frames of a sample video.
Comparison of the proposed method to the existing homography estimation methods on the synthesized video dataset.
Corner errors of the proposed method and the comparison methods over time on the synthesized video dataset.
Sample real-world videos with ground-truth homographies. Each row shows a video with an observed challenge.
Performance of the proposed method and the comparison methods on the real-world video dataset.
Corner errors of the proposed method and the comparison methods over time on the real-world video dataset.
Average corner error of the proposed method by using different numbers of LSTM cells.

| Method | Number of LSTM memory cells | Corner error | Error reduction |
| --- | --- | --- | --- |
| Proposed | 256 | 2.34 | - |
| Proposed | 512 | 1.44 | 38.5% |
| Proposed | 1024 | 1.36 | 41.9% |
| Proposed | 2048 | 1.37 | 41.4% |
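
The error-reduction column in the table above appears to be the relative decrease in corner error with respect to the 256-cell configuration. That reading is an assumption, inferred from the dash in the 256-cell row. A minimal sketch that recomputes the column from the rounded corner errors listed in the table (the function and variable names are hypothetical):

```python
# Corner errors copied from the table above, keyed by number of LSTM memory cells.
corner_error = {256: 2.34, 512: 1.44, 1024: 1.36, 2048: 1.37}

# Assumption: the 256-cell row is the reference for the "Error reduction" column.
baseline = corner_error[256]


def error_reduction(error, reference):
    """Relative decrease of `error` with respect to `reference`, in percent."""
    return 100.0 * (reference - error) / reference


reductions = {
    cells: error_reduction(err, baseline)
    for cells, err in corner_error.items()
    if cells != 256
}

for cells, reduction in reductions.items():
    print(f"{cells:>5} cells: {reduction:.1f}% lower corner error than 256 cells")
```

Under this assumption the recomputed values agree with the table to within 0.1 percentage point. Small differences are to be expected because the listed corner errors are themselves rounded to two decimals.
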
Complexity analysis, where #Param. denotes the number of parameters and FLOPs denotes the number of floating-point operations.
 Method #Param. FLOPs HomographyNet Proposed 3.4M 68.4M 5.1M 85.2M
Performance of the proposed method trained on the "Original" dataset and on the "Both" dataset, tested on the "Original", "Color variations", "Gaussian noise" and "Both" datasets.

| Model | Original | Color variations | Gaussian noise | Both |
| --- | --- | --- | --- | --- |
| $\mathrm{model}_{\mathrm{orig}}$ | 1.36 | 1.41 | 2.18 | 2.42 |
| $\mathrm{model}_{\mathrm{both}}$ | 1.79 | 1.88 | 1.65 | 1.69 |
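
The tables above report corner error, but this page does not define the metric. The sketch below assumes the definition commonly used in the deep homography literature: the mean Euclidean distance, in pixels, between the four image corners warped by the estimated homography and by the ground-truth homography. The function names, the 128 x 128 image size, and the example matrices are all hypothetical.

```python
import math


def warp_point(H, point):
    """Map a 2-D point through a 3x3 homography given as nested lists."""
    x, y = point
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)  # divide by w to return to pixel coordinates


def mean_corner_error(H_est, H_gt, width, height):
    """Average distance between the image corners warped by H_est and by H_gt."""
    corners = [
        (0.0, 0.0),
        (width - 1.0, 0.0),
        (width - 1.0, height - 1.0),
        (0.0, height - 1.0),
    ]
    distances = []
    for corner in corners:
        ex, ey = warp_point(H_est, corner)
        gx, gy = warp_point(H_gt, corner)
        distances.append(math.hypot(ex - gx, ey - gy))
    return sum(distances) / len(distances)


# Illustrative input: the ground truth is the identity and the estimate is off
# by a pure translation of (3, 4) pixels, so every corner is displaced by 5 pixels.
H_gt = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
H_est = [[1.0, 0.0, 3.0], [0.0, 1.0, 4.0], [0.0, 0.0, 1.0]]
error = mean_corner_error(H_est, H_gt, width=128, height=128)
```

A metric of this kind is expressed in pixels, so values are only comparable across methods when they are evaluated at the same image resolution.
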
