# Why curriculum learning & self-paced learning work in big/noisy data: A theoretical perspective

Since they were first proposed, curriculum learning (CL) and self-paced learning (SPL) have attracted increasing attention owing to their many successful applications. Although this learning regime is heuristically motivated by principles of human cognition, no sound theory yet explains the intrinsic mechanism behind its effectiveness, especially in successful applications to big/noisy data. To address this issue, this paper presents theoretical results that shed light on this learning scheme. Specifically, we first formulate a new learning problem: learning a proper classifier from samples drawn from a training distribution that deviates from the target distribution. We then show that the CL/SPL regime provides a feasible strategy for solving this problem. In particular, by first introducing high-confidence/easy samples and gradually incorporating low-confidence/complex ones into learning, the CL/SPL process implicitly minimizes an upper bound of the expected risk under the target distribution, using only data from the deviated training distribution. We further construct a new SPL algorithm based on random sampling, which complies better with our theory, and substantiate its effectiveness with experiments on synthetic and real data.
Mathematics Subject Classification: Primary: 68Q32, 68T05; Secondary: 68T15.
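
The easy-to-hard regime described in the abstract can be illustrated with a minimal sketch. It is not the algorithm proposed in the paper: it assumes the common hard-threshold form of self-paced learning (a sample is used only if its current loss is at most a pace parameter), it uses a toy one-dimensional regression problem instead of a classifier to stay short, and the pace schedule, the data generator, and all function names are hypothetical choices made for illustration.

```python
import numpy as np

def fit_weighted_slope(x, y, v):
    """Weighted least-squares slope for the model y ~ w * x (no intercept)."""
    return float(np.sum(v * x * y) / np.sum(v * x * x))

def self_paced_fit(x, y, fractions=(0.5, 0.6, 0.7, 0.8), inner_iters=5):
    """Alternate between selecting easy samples and refitting the model.

    `fractions` is an assumed pace schedule: at each stage the pace
    parameter is set so that roughly this share of the samples counts
    as "easy" (loss at or below the threshold).
    """
    v = np.ones_like(x)                    # start with every sample included
    w = fit_weighted_slope(x, y, v)
    for frac in fractions:                 # gradually admit harder samples
        for _ in range(inner_iters):
            losses = (y - w * x) ** 2      # per-sample loss under current model
            lam = np.quantile(losses, frac)
            v = (losses <= lam).astype(float)  # hard 0/1 sample weights
            w = fit_weighted_slope(x, y, v)
    return w, v

def make_toy_data(n=200, n_noisy=30, seed=0):
    """Clean samples follow y = 2x + small noise; the first n_noisy are corrupted."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=n)
    y = 2.0 * x + 0.1 * rng.standard_normal(n)
    y[:n_noisy] = -5.0 * x[:n_noisy]       # deviated part of the training data
    return x, y

x, y = make_toy_data()
w_all = fit_weighted_slope(x, y, np.ones_like(x))  # plain fit on all samples
w_spl, v = self_paced_fit(x, y)                    # self-paced fit
print(f"slope using all samples: {w_all:.3f}")
print(f"slope using self-paced selection: {w_spl:.3f}")
print(f"samples kept at the final stage: {int(v.sum())} of {len(x)}")
```

In this toy setting the corrupted samples incur large losses under any model close to the clean trend, so they tend to be excluded at every stage, and the fit is driven mostly by data that resemble the target relationship. The schedule here stops at 80% of the samples; how far to extend it is a design choice that this sketch does not settle.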
