January  2016, 1(1): 111-127. doi: 10.3934/bdia.2016.1.111

Why curriculum learning & self-paced learning work in big/noisy data: A theoretical perspective

1. Institute for Information and System Sciences and Ministry of Education Key Lab of Intelligent Networks and Network Security, Xi'an Jiaotong University, Xi'an, Shaanxi, China

Received May 2015, Revised August 2015, Published September 2015

Since they were recently proposed, curriculum learning (CL) and self-paced learning (SPL) have attracted increasing attention due to their many successful applications. While the rationale of this learning regime is currently motivated heuristically by the cognitive principles of human learning, there is still no sound theory explaining the intrinsic mechanism behind its effectiveness, especially in its successful applications to big/noisy data. To address this issue, this paper presents theoretical results that reveal the insights underlying this learning scheme. Specifically, we first formulate a new learning problem that aims to learn a proper classifier from samples generated by a training distribution deviated from the target distribution. We then show that the CL/SPL regime provides a feasible strategy for solving this learning problem. In particular, by first introducing high-confidence/easy samples and gradually involving low-confidence/complex ones into learning, the CL/SPL process latently minimizes an upper bound of the expected risk under the target distribution, using only data from the deviated training distribution. We further construct a new SPL algorithm based on random sampling, which better complies with our theory, and substantiate its effectiveness through experiments on synthetic and real data.
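As a concrete illustration of the learning scheme described above, below is a minimal sketch of a self-paced learning loop in the classic binary-weight style of Kumar et al. [12]: the learner is first fit on all samples, then alternates between selecting the currently easy (low-loss, high-confidence) samples and refitting on that subset, while a growing age parameter gradually admits harder samples. The sketch assumes scikit-learn's LogisticRegression as the base classifier; the function and parameter names (self_paced_learning, lam, mu) are illustrative rather than taken from the paper, and it uses the standard hard-threshold selection rule rather than the random-sampling variant proposed here.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def self_paced_learning(X, y, lam=0.5, mu=1.3, n_rounds=10):
        # Warm start: fit an initial classifier on all samples (y holds 0/1 labels).
        model = LogisticRegression()
        model.fit(X, y)
        for _ in range(n_rounds):
            # Per-sample logistic loss of the true class under the current model.
            proba = model.predict_proba(X)[np.arange(len(y)), y]
            losses = -np.log(np.clip(proba, 1e-12, 1.0))
            # Keep only the "easy" samples whose loss falls below the age parameter lam.
            v = losses < lam
            # Refit only when the selected subset still contains both classes.
            if v.sum() >= 2 and len(np.unique(y[v])) == 2:
                model.fit(X[v], y[v])
            # Grow the model age so harder samples are admitted in later rounds.
            lam *= mu
        return model

    # Example usage on a synthetic two-class problem with 10% label noise:
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(-1.0, 1.0, (200, 2)), rng.normal(1.0, 1.0, (200, 2))])
    y = np.repeat([0, 1], 200)
    flip = rng.random(400) < 0.1
    y[flip] = 1 - y[flip]
    clf = self_paced_learning(X, y)

Because the early rounds fit only high-confidence samples, the noisy labels injected above tend to be excluded until the classifier is already well shaped, which mirrors the mechanism analyzed in the paper.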
Citation: Tieliang Gong, Qian Zhao, Deyu Meng, Zongben Xu. Why curriculum learning & self-paced learning work in big/noisy data: A theoretical perspective. Big Data & Information Analytics, 2016, 1 (1) : 111-127. doi: 10.3934/bdia.2016.1.111
References:
[1]

S. Basu and J. Christensen, Teaching Classification Boundaries to Humans, Proceedings of the 27th AAAI Conference on Artificial Intelligence, 2013.

[2]

Y. Bengio, J. Louradour, R. Collobert and J. Weston, Curriculum Learning, Proceedings of the 26th International Conference on Machine Learning, (2009), 41-48. doi: 10.1145/1553374.1553380.

[3]

C.-C. Chang and C.-J. Lin, LIBSVM: A library for support vector machines, ACM Transactions on Intelligent Systems and Technology, 2 (2011), 1-27. Software available from: http://www.csie.ntu.edu.tw/~cjlin/libsvm.

[4]

X. Chen, A. Shrivastava and A. Gupta, NEIL: Extracting visual knowledge from web data, Proceedings of the IEEE International Conference on Computer Vision, (2013), 1409-1416. doi: 10.1109/ICCV.2013.178.

[5]

F. Cucker and S. Smale, On the mathematical foundations of learning, Bull. Amer. Math. Soc., 39 (2002), 1-49. doi: 10.1090/S0273-0979-01-00923-5.

[6]

F. Cucker and D. X. Zhou, Learning Theory: An Approximation Theory Viewpoint, Cambridge University Press, New York, NY, USA, 2007. doi: 10.1017/CBO9780511618796.

[7]

Y. Freund and R. E. Schapire, Experiments with a new boosting algorithm, Proceedings of the 13th International Conference on Machine Learning, 1996.

[8]

L. Jiang, D. Y. Meng, T. Mitamura and A. Hauptmann, Easy samples first: Self-paced reranking for multimedia search, Proceedings of the ACM International Conference on Multimedia, (2014), 547-556. doi: 10.1145/2647868.2654918.

[9]

L. Jiang, D. Y. Meng, S. Yu, Z. Z. Lan, S. G. Shan and A. Hauptmann, Self-paced Learning with Diversity, Advances in Neural Information Processing Systems 27, 2014.

[10]

L. Jiang, D. Y. Meng, Q. Zhao, S. G. Shan and A. Hauptmann, Self-paced Curriculum Learning, Proceedings of the 29th AAAI Conference on Artificial Intelligence, 2015.

[11]

F. Khan, X. Zhu and B. Mutlu, How do Humans Teach: On Curriculum Learning and Teaching Dimension, Advances in Neural Information Processing Systems 24, 2011.

[12]

M. Kumar, B. Packer and D. Koller, Self-paced Learning for Latent Variable Models, Advances in Neural Information Processing Systems 23, 2010.

[13]

M. Kumar, H. Turki, D. Preston and D. Koller, Learning specific-class segmentation from diverse data, Proceedings of the IEEE International Conference on Computer Vision, 2011.

[14]

Y. Lee and K. Grauman, Learning the easy things first: Self-paced visual category discovery, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (2011), 1721-1728. doi: 10.1109/CVPR.2011.5995523.

[15]

T. Mitchell, W. Cohen, E. Hruschka, P. Talukdar, J. Betteridge, A. Carlson, B. Dalvi, M. Gardner, B. Kisiel, J. Krishnamurthy, N. Lao, K. Mazaitis, T. Mohamed, N. Nakashole, E. Platanios, A. Ritter, M. Samadi, B. Settles, R. Wang, D. Wijaya, A. Gupta, X. Chen, A. Saparov, M. Greaves and J. Welling, Never-Ending Learning, Proceedings of the 29th AAAI Conference on Artificial Intelligence, 2015.

[16]

M. Mohri, A. Rostamizadeh and A. Talwalkar, Foundations of Machine Learning, The MIT Press, Cambridge, Massachusetts, London, England, 2012.

[17]

E. Ni and C. Ling, Supervised learning with minimal effort, Advances in Knowledge Discovery and Data Mining, 6119 (2010), 476-487. doi: 10.1007/978-3-642-13672-6_45.

[18]

J. Supancic and D. Ramanan, Self-paced learning for long-term tracking, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2013.

[19]

Y. Tang, Y. B. Yang and Y. Gao, Self-paced Dictionary Learning for Image Classification, Proceedings of the ACM International Conference on Multimedia, (2012), 833-836. doi: 10.1145/2393347.2396324.

[20]

K. Tang, V. Ramanathan, F. Li and D. Koller, Shifting weights: Adapting object detectors from image to video, Advances in Neural Information Processing Systems 25, 2012.

[21]

V. Vapnik, Statistical Learning Theory, Wiley-Interscience, New York, 1998.

[22]

S. Yu, L. Jiang, Z. Mao, X. J. Chang, X. Z. Du, C. Gan, Z. Z. Lan, Z. W. Xu, X. C. Li, Y. Cai, A. Kumar, Y. Miao, L. Martin, N. Wolfe, S. C. Xu, H. Li, M. Lin, Z. G. Ma, Y. Yang, D. Y. Meng, S. G. Shan, P. D. Sahin, S. Burger, F. Metze, R. Singh, B. Raj, T. Mitamura, R. Stern and A. Hauptmann, CMU-Informedia@TRECVID 2014 Multimedia Event Detection (MED), TRECVID Video Retrieval Evaluation Workshop, 2014.

[23]

Q. Zhao, D. Y. Meng, L. Jiang, Q. Xie, Z. B. Xu and A. Hauptmann, Self-paced Matrix Factorization, Proceedings of the 29th AAAI Conference on Artificial Intelligence, 2015.
