doi: 10.3934/dcds.2021039

PDEs on graphs for semi-supervised learning applied to first-person activity recognition in body-worn video

Hao Li, Honglin Chen, Matt Haberland, Andrea L. Bertozzi and P. Jeffrey Brantingham

1. University of California, Los Angeles, Department of Mathematics, 520 Portola Plaza, Box 951555, Los Angeles, CA 90095-1555, USA
2. California Polytechnic State University, BioResource and Agricultural Engineering Department, BRAE 8-101, 1 Grand Ave, San Luis Obispo, CA 93407, USA
3. University of California, Los Angeles, Department of Mathematics and Mechanical and Aerospace Engineering, 420 Westwood Plaza, Box 951555, Los Angeles, CA 90095-1555, USA
4. University of California, Los Angeles, Department of Anthropology, 375 Portola Plaza, 341 Haines Hall, Box 951553, Los Angeles, CA 90095-1555, USA

* Corresponding author: Andrea L. Bertozzi

Received: March 2020. Revised: November 2020. Published: March 2021.

Fund Project: This work was supported by NIJ grant 2014-R2-CX-0101, NSF grant DMS-1737770, and NSF grant DMS-1952339. The first two authors contributed equally to this work.

This paper showcases the use of PDE-based graph methods in modern machine learning applications. We consider body-worn video classification as a case study because the volume of data is large and training data are scarce owing to the sensitivity of the footage. Many modern artificial intelligence methods rely on deep learning, which typically requires large amounts of training data to be effective. Such methods can also raise issues of trust, because heavy use of training data can inadvertently expose details of the training images and compromise privacy. Our alternative is a physics-based machine learning approach that pairs classical tools such as optical flow for video analysis with linear mixture models such as non-negative matrix factorization, together with PDE-based graph classification methods that parallel geometric evolution equations such as motion by mean curvature. The result is a methodology that works well on video with modest amounts of training data and that can also compress the information about the video scene so that no personal information remains in the compressed data, making it possible to give a larger group of people access to the compressed data without compromising privacy. The compressed data retain information about the wearer of the camera while discarding information about people, objects, and places in the scene.
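As a minimal sketch of the linear-mixture step mentioned above (not the authors' code), the motion features can be compressed with non-negative matrix factorization and then smoothed over time with a moving-window average; the rank `k_hat`, window length, and use of scikit-learn are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import NMF

def compress_and_smooth(features, k_hat=50, window=5):
    """features: (n_segments, n_raw_features) non-negative motion histograms."""
    # NMF: features ~= W H, with each row of W a k_hat-dimensional code for one segment.
    model = NMF(n_components=k_hat, init="nnsvda" if False else "nndsvda", max_iter=500)
    W = model.fit_transform(features)                      # (n_segments, k_hat)
    # Moving-window average along the time axis to suppress segment-to-segment jitter.
    kernel = np.ones(window) / window
    smoothed = np.vstack([np.convolve(W[:, j], kernel, mode="same")
                          for j in range(W.shape[1])]).T   # (n_segments, k_hat)
    return smoothed
```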

Citation: Hao Li, Honglin Chen, Matt Haberland, Andrea L. Bertozzi, P. Jeffrey Brantingham. PDEs on graphs for semi-supervised learning applied to first-person activity recognition in body-worn video. Discrete & Continuous Dynamical Systems, doi: 10.3934/dcds.2021039
References:
[1] G. Abebe and A. Cavallaro, A long short-term memory convolutional neural network for first-person vision activity recognition, in Proceedings of the IEEE International Conference on Computer Vision, 2017, 1339–1346. doi: 10.1109/ICCVW.2017.159.
[2] K. Aizawa, K. Ishijima and M. Shiina, Summarizing wearable video, in Proceedings of the 2001 International Conference on Image Processing, vol. 3, IEEE, 2001, 398–401. doi: 10.1109/ICIP.2001.958135.
[3] J. L. Barron, D. J. Fleet and S. S. Beauchemin, Performance of optical flow techniques, International Journal of Computer Vision, 12 (1994), 43–77.
[4] A. L. Bertozzi and A. Flenner, Diffuse interface models on graphs for classification of high dimensional data, SIAM Review, 58 (2016), 293–328. doi: 10.1137/16M1070426.
[5] A. L. Bertozzi, X. Luo, A. M. Stuart and K. C. Zygalakis, Uncertainty quantification in graph-based classification of high dimensional data, SIAM/ASA Journal on Uncertainty Quantification, 6 (2018), 568–595. doi: 10.1137/17M1134214.
[6] B. L. Bhatnagar, S. Singh, C. Arora and C. Jawahar, Unsupervised learning of deep feature representation for clustering egocentric actions, in IJCAI, 2017, 1447–1453. doi: 10.24963/ijcai.2017/200.
[7] J. Budd and Y. van Gennip, Graph Merriman–Bence–Osher as a semi-discrete implicit Euler scheme for graph Allen–Cahn flow, SIAM Journal on Mathematical Analysis, 52 (2020), 4101–4139. doi: 10.1137/19M1277394.
[8] T. F. Chan and L. A. Vese, Active contours without edges, IEEE Transactions on Image Processing, 10 (2001), 266–277. doi: 10.1109/83.902291.
[9] A. G. del Molino, C. Tan, J.-H. Lim and A.-H. Tan, Summarization of egocentric videos: A comprehensive survey, IEEE Transactions on Human-Machine Systems, 47 (2017), 65–76. doi: 10.1109/THMS.2016.2623480.
[10] G. Farnebäck, Two-frame motion estimation based on polynomial expansion, in Scandinavian Conference on Image Analysis, Springer, 2003, 363–370.
[11] A. Fathi, J. K. Hodgins and J. M. Rehg, Social interactions: A first-person perspective, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012, 1226–1233. doi: 10.1109/CVPR.2012.6247805.
[12] A. Fathi, A. Farhadi and J. M. Rehg, Understanding egocentric activities, in 2011 IEEE International Conference on Computer Vision (ICCV), IEEE, 2011, 407–414. doi: 10.1109/ICCV.2011.6126269.
[13] A. Fathi, Y. Li and J. M. Rehg, Learning to recognize daily actions using gaze, in European Conference on Computer Vision, Springer, 2012, 314–327. doi: 10.1007/978-3-642-33718-5_23.
[14] D. Fortun, P. Bouthemy and C. Kervrann, Optical flow modeling and computation: A survey, Computer Vision and Image Understanding, 134 (2015), 1–21. doi: 10.1016/j.cviu.2015.02.008.
[15] C. Fowlkes, S. Belongie, F. Chung and J. Malik, Spectral grouping using the Nyström method, IEEE Transactions on Pattern Analysis and Machine Intelligence, 26 (2004), 214–225. doi: 10.1109/TPAMI.2004.1262185.
[16] C. Garcia-Cardona, E. Merkurjev, A. L. Bertozzi, A. Flenner and A. G. Percus, Multiclass data segmentation using diffuse interface methods on graphs, IEEE Transactions on Pattern Analysis and Machine Intelligence, 36 (2014), 1600–1613. doi: 10.1109/TPAMI.2014.2300478.
[17] G. Gilboa and S. Osher, Nonlocal operators with applications to image processing, Multiscale Modeling & Simulation, 7 (2008), 1005–1028. doi: 10.1137/070698592.
[18] B. K. Horn and B. G. Schunck, Determining optical flow, Artificial Intelligence, 17 (1981), 185–203. doi: 10.1016/0004-3702(81)90024-2.
[19] G. Iyer, J. Chanussot and A. L. Bertozzi, A graph-based approach for feature extraction and segmentation of multimodal images, in 2017 IEEE International Conference on Image Processing (ICIP), IEEE, 2017, 3320–3324. doi: 10.1109/ICIP.2017.8296897.
[20] M. Jacobs, E. Merkurjev and S. Esedoglu, Auction dynamics: A volume constrained MBO scheme, Journal of Computational Physics, 354 (2018), 288–310. doi: 10.1016/j.jcp.2017.10.036.
[21] K. M. Kitani, T. Okabe, Y. Sato and A. Sugimoto, Fast unsupervised ego-action learning for first-person sports videos, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011, 3241–3248. doi: 10.1109/CVPR.2011.5995406.
[22] C. L. Lawson and R. J. Hanson, Solving Least Squares Problems, SIAM, Philadelphia, PA, 1995. doi: 10.1137/1.9781611971217.
[23] D. D. Lee and H. S. Seung, Algorithms for non-negative matrix factorization, in Advances in Neural Information Processing Systems, 2001, 556–562.
[24] Y. Li, Z. Ye and J. M. Rehg, Delving into egocentric actions, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, 287–295. doi: 10.1109/CVPR.2015.7298625.
[25] B. D. Lucas and T. Kanade, An iterative image registration technique with an application to stereo vision, in Proceedings of the 1981 DARPA Image Understanding Workshop, 1981, 121–130.
[26] X. Luo and A. L. Bertozzi, Convergence of the graph Allen–Cahn scheme, Journal of Statistical Physics, 167 (2017), 934–958. doi: 10.1007/s10955-017-1772-4.
[27] M. Ma, H. Fan and K. M. Kitani, Going deeper into first-person activity recognition, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, 1894–1903. doi: 10.1109/CVPR.2016.209.
[28] Z. Meng, A. Koniges, Y. H. He, S. Williams, T. Kurth, B. Cook, J. Deslippe and A. L. Bertozzi, OpenMP parallelization and optimization of graph-based machine learning algorithms, in International Workshop on OpenMP, Springer, 2016, 17–31. doi: 10.1007/978-3-319-45550-1_2.
[29] Z. Meng, E. Merkurjev, A. Koniges and A. L. Bertozzi, Hyperspectral image classification using graph clustering methods, Image Processing On Line, 7 (2017), 218–245. doi: 10.5201/ipol.2017.204.
[30] Z. Meng, J. Sánchez, J.-M. Morel, A. L. Bertozzi and P. J. Brantingham, Ego-motion classification for body-worn videos, in Imaging, Vision and Learning Based on Optimization and PDEs (eds. X.-C. Tai, E. Bae and M. Lysaker), Springer International Publishing, Cham, 2018, 221–239. doi: 10.1007/978-3-319-91274-5_10.
[31] E. Merkurjev, C. Garcia-Cardona, A. L. Bertozzi, A. Flenner and A. G. Percus, Diffuse interface methods for multiclass segmentation of high-dimensional data, Applied Mathematics Letters, 33 (2014), 29–34. doi: 10.1016/j.aml.2014.02.008.
[32] E. Merkurjev, T. Kostic and A. L. Bertozzi, An MBO scheme on graphs for classification and image processing, SIAM Journal on Imaging Sciences, 6 (2013), 1903–1930. doi: 10.1137/120886935.
[33] E. Merkurjev, J. Sunu and A. L. Bertozzi, Graph MBO method for multiclass segmentation of hyperspectral stand-off detection video, in 2014 IEEE International Conference on Image Processing (ICIP), IEEE, 2014, 689–693. doi: 10.1109/ICIP.2014.7025138.
[34] F. Özkan, M. A. Arabaci, E. Surer and A. Temizel, Boosted multiple kernel learning for first-person activity recognition, in 2017 25th European Signal Processing Conference (EUSIPCO), IEEE, 2017, 1050–1054.
[35] H. Pirsiavash and D. Ramanan, Detecting activities of daily living in first-person camera views, in 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2012, 2847–2854. doi: 10.1109/CVPR.2012.6248010.
[36] Y. Poleg, C. Arora and S. Peleg, Temporal segmentation of egocentric videos, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014, 2537–2544. doi: 10.1109/CVPR.2014.325.
[37] Y. Poleg, A. Ephrat, S. Peleg and C. Arora, Compact CNN for indexing egocentric videos, in 2016 IEEE Winter Conference on Applications of Computer Vision (WACV), IEEE, 2016, 1–9.
[38] L. I. Rudin, S. Osher and E. Fatemi, Nonlinear total variation based noise removal algorithms, Physica D: Nonlinear Phenomena, 60 (1992), 259–268. doi: 10.1016/0167-2789(92)90242-F.
[39] M. S. Ryoo and L. Matthies, First-person activity recognition: What are they doing to me?, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013, 2730–2737.
[40] M. S. Ryoo, B. Rothrock and L. Matthies, Pooled motion features for first-person videos, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, 896–904. doi: 10.1109/CVPR.2015.7298691.
[41] S. Singh, C. Arora and C. Jawahar, Trajectory aligned features for first person action recognition, Pattern Recognition, 62 (2017), 45–55. doi: 10.1016/j.patcog.2016.07.031.
[42] E. H. Spriggs, F. De La Torre and M. Hebert, Temporal segmentation and activity classification from first-person sensing, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009, 17–24. doi: 10.1109/CVPRW.2009.5204354.
[43] D. Tran, L. Bourdev, R. Fergus, L. Torresani and M. Paluri, Learning spatiotemporal features with 3D convolutional networks, in 2015 IEEE International Conference on Computer Vision (ICCV), IEEE, 2015, 4489–4497.
[44] Y. van Gennip and A. L. Bertozzi, $\Gamma$-convergence of graph Ginzburg-Landau functionals, Advances in Differential Equations, 17 (2012), 1115–1180.
[45] Y. van Gennip, N. Guillen, B. Osting and A. L. Bertozzi, Mean curvature, threshold dynamics, and phase field theory on finite graphs, Milan Journal of Mathematics, 82 (2014), 3–65. doi: 10.1007/s00032-014-0216-8.
[46] X. Wang, L. Gao, J. Song, X. Zhen, N. Sebe and H. T. Shen, Deep appearance and motion learning for egocentric activity recognition, Neurocomputing, 275 (2018), 438–447. doi: 10.1016/j.neucom.2017.08.063.
[47] L. Zelnik-Manor and P. Perona, Self-tuning spectral clustering, in Advances in Neural Information Processing Systems, 2005, 1601–1608.
[48] W. Zhu, V. Chayes, A. Tiard, S. Sanchez, D. Dahlberg, A. L. Bertozzi, S. Osher, D. Zosso and D. Kuang, Unsupervised classification in hyperspectral imagery with nonlocal total variation and primal-dual hybrid gradient algorithm, IEEE Transactions on Geoscience and Remote Sensing, 55 (2017), 2786–2798. doi: 10.1109/TGRS.2017.2654486.


Figure 1.  A summary of the proposed method. First, we compute a dense optical flow field for each pair of consecutive frames. We then divide each optical flow field into $ s_x\times s_y $ spatial regions, where each region consists of $ dx\times dy $ pixels, and divide the video into $ s_t $ temporal segments, where each segment consists of $ dt $ frames. For each $ dx \times dy \times dt $ cuboid, we count the number of flow vectors whose direction lies in each octant, yielding an $ s_x\times s_y $ array of eight-bin histograms for each segment of video. We reshape and concatenate these histograms into a single feature vector of dimension $ s_x \times s_y \times 8 $ that describes the motion occurring within the video segment. The dimension of the feature vectors is reduced with NMF, and the reduced features are smoothed with a moving-window average operator. Finally, we classify the smoothed features with a semi-supervised MBO scheme
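The following is a minimal sketch of the motion-histogram features described in Figure 1, assuming OpenCV's Farnebäck optical flow [10]; the grid size and flow parameters are illustrative, not the paper's exact values.

```python
import cv2
import numpy as np

def segment_feature(frames, sx=8, sy=8):
    """frames: consecutive grayscale frames (uint8 arrays) forming one temporal segment.
    Returns a feature vector of length sx * sy * 8."""
    hist = np.zeros((sy, sx, 8))
    for prev, curr in zip(frames[:-1], frames[1:]):
        # Dense optical flow between a pair of consecutive frames.
        flow = cv2.calcOpticalFlowFarneback(prev, curr, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        ang = np.arctan2(flow[..., 1], flow[..., 0])            # flow direction in [-pi, pi]
        octant = ((ang + np.pi) / (2 * np.pi) * 8).astype(int) % 8
        h, w = octant.shape
        dy, dx = h // sy, w // sx
        for i in range(sy):
            for j in range(sx):
                # Count flow vectors per direction octant within each spatial region.
                block = octant[i * dy:(i + 1) * dy, j * dx:(j + 1) * dx]
                hist[i, j] += np.bincount(block.ravel(), minlength=8)[:8]
    return hist.ravel()   # concatenate the sx-by-sy grid of 8-bin histograms
```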
Figure 2.  Classification results on a contiguous sample of 4000 segments (approximately 13 minutes) from the LAPD body-worn video data set. The results are obtained by running both methods with the parameters described in section 4.2
Figure 3.  Confusion matrices for the LAPD body-worn video data set. The background intensity in cell $ (k, \ell) $ corresponds to the number of data points in class $ k $ that are classified as class $ \ell $ by the algorithm
Figure 4.  Confusion matrix for the HUJI EgoSeg data set. The background intensity in cell $ (k, \ell) $ corresponds to the number of data points in class $ k $ that are classified as class $ \ell $ by the algorithm
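As an assumed sketch (not the paper's code), the confusion matrices in Figures 3 and 4 and the per-class precision and recall reported in the tables below can be computed from predicted and true labels as follows.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    C = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        C[t, p] += 1          # cell (k, l): points of true class k predicted as class l
    return C

def precision_recall(C):
    precision = np.diag(C) / np.maximum(C.sum(axis=0), 1)   # correct / predicted per class
    recall = np.diag(C) / np.maximum(C.sum(axis=1), 1)      # correct / actual per class
    return precision, recall
```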
Figure 5.  Classification results on a contiguous sample of 4000 segments (approximately 4 hours) from the testing set of the HUJI EgoSeg data set. The recall for the same experiment is reported in Table 4
Figure 6.  Confusion matrices for the LAPD body-worn video data set. The background intensity of cell $ (k, \ell) $ corresponds to the number of data points in class $ k $ that are classified as class $ \ell $ by the algorithm
Table 1.  Experimental Setup
Data set  | $\Delta T$ (sec) | FPS | Number of segments | Window size (segments) | $\hat{k}$ | $N_\mathrm{eig}$ | $\tau_{ij}$ | $N_{sample}$ | Batch size (segments) | $\eta$ | $dt$ | $N_{step}$
QUAD      | $1/60$ | 60 | 14,399  | -  | 50 | 500  | $\tau = 1$ | 1000 | -     | 300 | 0.1 | 10
LAPD      | $1/5$  | 30 | 274,443 | 5  | 50 | 2000 | $K = 100$  | 2000 | 30000 | 400 | 0.1 | 10
LAPD [30] | $1/5$  | 30 | 274,443 | -  | -  | 2000 | $K = 100$  | 2000 | 30000 | 400 | 0.1 | 10
HUJI      | 4      | 15 | 36,421  | 20 | 50 | 400  | $K = 40$   | 400  | -     | 300 | 0.1 | 10
(Column groups: motion feature: $\Delta T$, FPS, number of segments. NMF: window size, $\hat{k}$. Spectrum of the graph Laplacian: $N_\mathrm{eig}$, $\tau_{ij}$, $N_{sample}$. MBO: batch size, $\eta$, $dt$, $N_{step}$.)
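The following is a minimal sketch of one semi-supervised graph MBO iteration in the spirit of [32], using the quantities listed in Table 1 ($N_\mathrm{eig}$ eigenpairs of the graph Laplacian, fidelity strength $\eta$, time step $dt$, and $N_{step}$ diffusion substeps). The spectral truncation and the form of the fidelity term are assumptions; see [16, 32] for the exact scheme.

```python
import numpy as np

def mbo_step(U, Phi, lam, labels, mask, eta, dt, n_step):
    """U: (n, K) class indicator matrix; Phi: (n, N_eig) graph Laplacian eigenvectors;
    lam: (N_eig,) eigenvalues; labels: (n, K) one-hot ground truth, used only on the
    labeled nodes selected by the boolean mask."""
    for _ in range(n_step):
        # Diffusion: implicit heat step in the truncated eigenbasis of the Laplacian.
        A = Phi.T @ U                          # spectral coefficients, (N_eig, K)
        A /= (1.0 + dt * lam)[:, None]
        U = Phi @ A
        # Fidelity: pull labeled nodes toward their known classes with strength eta.
        U[mask] -= dt * eta * (U[mask] - labels[mask])
    # Threshold: project each node onto the nearest pure class (one-hot).
    out = np.zeros_like(U)
    out[np.arange(U.shape[0]), U.argmax(axis=1)] = 1.0
    return out
```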
Table 2.  Class proportion and precision of the QUAD data set
Class | Proportion | Precision [21] | Precision [30] | Precision (Ours)
Jump 14.54% - 92.51% 99.07%
Stand 13.74% - 87.90% 87.11%
Walk 12.75% - 84.52% 98.37%
Step 12.65% - 93.98% 98.54%
Turn Left 11.25% - 89.43% 96.96%
Turn Right 10.16% - 92.80% 96.21%
Run 9.00% - 92.38% 96.17%
Look Up 8.85% - 80.36% 90.02%
Look Down 7.06% - 84.59% 89.00%
Mean 11.11% 95% 88.74% 94.49%
Table 3.  Class proportion, precision, and recall of the selected nine classes in the LAPD body-worn video data set
Class | Proportion | Precision [30] | Precision (Ours) | Recall [30] | Recall (Ours)
Stand still 62.57% 73.10% 89.44% 85.42% 95.24%
In stationary car 16.84% 41.83% 93.69% 43.18% 89.73%
Walk 9.04% 38.36% 70.53% 19.54% 59.41%
In moving car 5.76% 70.71% 91.03% 25.08% 84.40%
At car window 0.64% 17.23% 71.45% 10.94% 45.28%
At car trunk 0.58% 73.78% 71.79% 11.09% 51.78%
Run 0.33% 96.15% 75.94% 11.03% 53.35%
Bike 0.33% 85.71% 86.49% 14.37% 75.44%
Motorcycle 0.08% 100% 92.49% 10.76% 71.75%
Mean 10.68% 66.32% 82.54% 25.71% 69.60%
Table 4.  Class proportion and recall of the HUJI EgoSeg data set
Class | Proportion | Recall [36] | Recall [40] | Recall [43] | Recall [37] | Recall (Ours)
Walking 34% 83% 91% 79% 89% 91%
Sitting 25% 62% 70% 62% 84% 71%
Standing 21% 47% 44% 62% 79% 47%
Biking 8% 86% 34% 36% 91% 88%
Driving 5% 74% 82% 92% 100% 95%
Static 4% 97% 61% 100% 98% 96%
Riding Bus 4% 43% 37% 58% 82% 84%
Mean 14% 70% 60% 70% 89% 82%
Training $ \sim $60% $ \sim $60% $ \sim $60% $ \sim $60% 6%
Table 5.  Class proportion, precision, recall, and accuracy on the LAPD body-worn video data set
Class | Proportion | Precision [30] | Precision (Ours) | Recall [30] | Recall (Ours)
Stand still 62.57% 73.10% 89.44% 85.42 % 95.24%
In stationary car 16.84% 41.83% 93.69% 43.18% 89.73%
Walk 9.04% 38.36% 70.53% 19.54% 59.41%
In moving car 5.76% 70.71% 91.03% 25.08% 84.40%
Obscured camera 2.80% 51.65% 80.82% 15.93% 70.46%
At car window 0.64% 17.23% 71.45% 10.94% 45.28%
At car trunk 0.58% 73.78% 71.79% 11.09% 51.78%
Exit driver 0.35% 6.68% 50.25% 11.82% 21.12%
Exit passenger 0.34% 79.69% 48.08% 11.59% 26.29%
Run 0.33% 96.15% 75.94% 11.03% 53.35%
Bike 0.33% 85.71% 86.49% 14.37% 75.44%
Enter passenger 0.20% 5.97% 45.82% 13.27% 24.51%
Enter driver 0.12% 5.72% 34.33% 12.3% 20.91%
Motorcycle 0.08% 100% 92.49% 10.76% 71.75%
Mean 7.14% 53.33% 71.58% 21.17% 56.41%
Accuracy 65.03% 88.15%