Theory of curriculum learning, with convex loss functions

Research output: Contribution to journal › Article › peer-review

17 Scopus citations


Curriculum Learning is motivated by human cognition, where teaching often involves gradually exposing the learner to examples in a meaningful order, from easy to hard. Although methods based on this concept have been empirically shown to improve the performance of several machine learning algorithms, no theoretical analysis has been provided even for simple cases. To address this shortfall, we start by formulating an ideal definition of difficulty score: the loss of the optimal hypothesis at a given datapoint. We analyze the possible contribution of curriculum learning based on this score in two convex problems: linear regression, and binary classification by hinge loss minimization. We show that in both cases, the convergence rate of SGD optimization decreases monotonically with the difficulty score, in accordance with earlier empirical results. We also prove that when the difficulty score is fixed, the convergence rate of SGD optimization is monotonically increasing with respect to the loss of the current hypothesis at each point. We discuss how these results settle some confusion in the literature, where two apparently opposing heuristics are reported to improve performance: curriculum learning, in which easier points are given priority, vs. hard data mining, where the more difficult points are sought out.
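The abstract's ideal difficulty score is concrete enough to sketch for the linear-regression case it analyzes: score each datapoint by the loss of the optimal (least-squares) hypothesis at that point, then run SGD presenting points in order of increasing difficulty. The following is a minimal illustrative sketch, not the paper's method; the synthetic data, learning rate, and epoch count are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear regression data (assumed for illustration):
# y = X @ w_true + noise.
n, d = 200, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

# Ideal difficulty score per the abstract: the loss of the optimal
# hypothesis at each datapoint. For squared loss, the optimal
# hypothesis is the least-squares solution.
w_opt, *_ = np.linalg.lstsq(X, y, rcond=None)
difficulty = (X @ w_opt - y) ** 2

# Curriculum ordering: present points from easy to hard.
order = np.argsort(difficulty)

def sgd(X, y, order, lr=0.01, epochs=5):
    """Plain SGD on squared loss, visiting points in the given order."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in order:
            grad = 2.0 * (X[i] @ w - y[i]) * X[i]
            w -= lr * grad
    return w

w_curr = sgd(X, y, order)
train_mse = np.mean((X @ w_curr - y) ** 2)
print(train_mse)
```

A hard-data-mining variant, for contrast, would simply reverse `order`; the paper's analysis concerns how each ordering affects the convergence rate of the SGD iterates.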

Original language: American English
Article number: 222
Number of pages: 19
Journal: Journal of Machine Learning Research
State: Published - Nov 2020

Bibliographical note

Funding Information:
This work was supported in part by a grant from the Israeli Science Foundation (ISF) and by the Gatsby Charitable Foundations.

Publisher Copyright:
© 2020 Daphna Weinshall and Dan Amir. License: CC-BY 4.0; attribution requirements are provided at


Keywords

  • Curriculum learning
  • Hinge loss minimization
  • Linear regression


