May 2020, 3(2): 125-140. doi: 10.3934/mfc.2020014

Homography estimation along short videos by recurrent convolutional regression network

University of South Carolina, Columbia, 29208, USA

* Corresponding author: Song Wang

Received December 2019; Revised March 2020; Published May 2020

Many moving-camera video processing and analysis tasks require accurate estimation of homography across frames. Estimating the homography between non-adjacent frames can be very challenging when their camera view angles differ substantially. In this paper, we propose a new deep-learning-based method for homography estimation along videos that exploits temporal dynamics across frames. More specifically, we develop a recurrent convolutional regression network consisting of a convolutional neural network (CNN) and a recurrent neural network (RNN) with long short-term memory (LSTM) cells, followed by a regression layer for estimating the homography parameters. In the experiments, we evaluate the proposed method on both synthesized and real-world short videos. The experimental results verify that the proposed method estimates homographies along short videos more accurately than several existing methods.
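For readers unfamiliar with the quantity being estimated, the following is a minimal sketch in plain Python, not the authors' implementation. It treats a homography as a 3x3 matrix acting on image points, chains two adjacent-frame homographies to relate non-adjacent frames (a generic baseline, not the proposed network), and computes an average corner error. The function names, the image size, and the corner-error definition (mean Euclidean distance between the four image corners warped by the estimated and ground-truth homographies) are assumptions made for illustration.

```python
import math

def apply_homography(H, point):
    """Map a 2D point through a 3x3 homography given as nested lists."""
    x, y = point
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)  # divide by the homogeneous coordinate

def compose(H_b, H_a):
    """Return H_b * H_a, i.e. apply H_a first and H_b second."""
    return [[sum(H_b[i][k] * H_a[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def corner_error(H_est, H_gt, width, height):
    """Assumed metric: mean distance between the four warped image corners."""
    corners = [(0, 0), (width - 1, 0), (width - 1, height - 1), (0, height - 1)]
    total = 0.0
    for corner in corners:
        ex, ey = apply_homography(H_est, corner)
        gx, gy = apply_homography(H_gt, corner)
        total += math.hypot(ex - gx, ey - gy)
    return total / len(corners)

# Hypothetical example: two pure translations between adjacent frames.
H_01 = [[1, 0, 1], [0, 1, 2], [0, 0, 1]]   # frame 0 -> frame 1
H_12 = [[1, 0, 2], [0, 1, 2], [0, 0, 1]]   # frame 1 -> frame 2
H_02 = compose(H_12, H_01)                 # frame 0 -> frame 2
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(corner_error(identity, H_02, 640, 480))
```

In this toy case the chained homography is a translation by (3, 4), so predicting the identity leaves every corner 5 pixels off. With real estimates, errors in each adjacent-frame homography accumulate under chaining, which is one reason non-adjacent frames are harder.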

Citation: Yang Mi, Kang Zheng, Song Wang. Homography estimation along short videos by recurrent convolutional regression network. Mathematical Foundations of Computing, 2020, 3 (2) : 125-140. doi: 10.3934/mfc.2020014

Figure 1.  Architecture of the proposed recurrent convolutional regression network.
Figure 2.  Configuration of the CNN used in the proposed method.
Figure 3.  The architecture of an LSTM cell.
Figure 4.  Sample frames of the 4 points in a recorded video, with computed homographies.
Figure 5.  An illustration of constructing a video sequence with ground-truth homographies.
Figure 6.  Sample videos generated using MS-COCO images with ground-truth homographies. Each row shows frames of a sample video.
Figure 7.  Comparison of the proposed method to the existing homography estimation methods on the synthesized video dataset.
Figure 8.  Corner errors of the proposed method and the comparison methods over time on the synthesized video dataset.
Figure 9.  Sample real-world videos with ground-truth homographies. Each row shows a video with an observed challenge.
Figure 10.  Performance of the proposed method and the comparison methods on the real-world video dataset.
Figure 11.  Corner errors of the proposed method and the comparison methods over time on the real-world video dataset.
Table 1.  Average corner error of the proposed method by using different numbers of LSTM cells.
Method | Number of LSTM memory cells | Corner error | Error reduction
Proposed | 256 | 2.34 | -
Proposed | 512 | 1.44 | 38.5%
Proposed | 1024 | 1.36 | 41.9%
Proposed | 2048 | 1.37 | 41.4%
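The "Error reduction" column in Table 1 can be reproduced approximately with a short calculation. The sketch below assumes the reduction is measured relative to the 256-cell configuration (the row marked "-"); the function name is hypothetical.

```python
def error_reduction(baseline, error):
    """Relative reduction in corner error, in percent, against a baseline."""
    return 100.0 * (baseline - error) / baseline

# Corner errors from Table 1, keyed by number of LSTM memory cells.
corner_errors = {256: 2.34, 512: 1.44, 1024: 1.36, 2048: 1.37}
baseline = corner_errors[256]  # assumed reference configuration

for cells, err in corner_errors.items():
    print(cells, round(error_reduction(baseline, err), 1))
```

From the rounded table entries, this gives 38.5 and 41.9 for 512 and 1024 cells, and about 41.5 for 2048 cells. The published 41.4% for the last row presumably reflects unrounded corner errors.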
Table 2.  Complexity analysis, where #Param. denotes the number of parameters and FLOPs denotes the number of floating-point operations.
Method | #Param. | FLOPs
HomographyNet | 3.4M | 68.4M
Proposed | 5.1M | 85.2M
Table 3.  Performance of the proposed method trained on the "Original" dataset and on the "Both" dataset, tested on the "Original", "Color variations", "Gaussian noise" and "Both" datasets.
Model | Original | Color variations | Gaussian noise | Both
model_orig | 1.36 | 1.41 | 2.18 | 2.42
model_both | 1.79 | 1.88 | 1.65 | 1.69