The amount of training-data is one of the key factors which determines the generalization capacity of learning algorithms. Intuitively, one expects the error rate to decrease as the amount of training-data increases. Perhaps surprisingly, natural attempts to formalize this intuition give rise to interesting and challenging mathematical questions. For example, in their classical book on pattern recognition, Devroye, Györfi, and Lugosi (1996) ask whether there exists a monotone Bayes-consistent algorithm. This question remained open for over 25 years, until recently Pestov (2021) resolved it for binary classification, using an intricate construction of a monotone Bayes-consistent algorithm. We derive a general result in multiclass classification, showing that every learning algorithm A can be transformed to a monotone one with similar performance. Further, the transformation is efficient and only uses a black-box oracle access to A. This demonstrates that one can provably avoid non-monotonic behaviour without compromising performance, thus answering questions asked by Devroye, Györfi, and Lugosi (1996), Viering, Mey, and Loog (2019), Viering and Loog (2021), and by Mhammedi (2021). Our general transformation readily implies monotone learners in a variety of contexts: for example, Pestov’s result follows by applying it on any Bayes-consistent algorithm (e.g., k-NearestNeighbours). In fact, our transformation extends Pestov’s result to classification tasks with an arbitrary number of labels. This is in contrast with Pestov’s work which is tailored to binary classification. In addition, we provide uniform bounds on the error of the monotone algorithm. This makes our transformation applicable in distribution-free settings. For example, in PAC learning it implies that every learnable class admits a monotone PAC learner. This resolves questions asked by Viering, Mey, and Loog (2019); Viering and Loog (2021); Mhammedi (2021).
Bibliographical noteFunding Information:
US is supported by the Israel Science Foundation (grant 1871/19) and by Len Blavatnik and the Blavatnik Family foundation.
AD received funding from the European Research Council (ERC) under the European Union’s Horizon 2022 research and innovation program (grant agreement No. 101041711), and the Israel Science Foundation (grant number 2258/19), and
HK is supported by the Israel Science Foundation grant no. 1595-19, and the Blavatnik Family Foundation.
SM is a Robert J. Shillman Fellow, his research is supported in part by the Israel Science Foundation (grant No. 1225/20), by a grant from the United States - Israel Binational Science Foundation (BSF), by an Azrieli Faculty Fellowship, by Israel PBC-VATAT, and by the Technion Center for Machine Learning and Intelligent Systems (MLIS).
YM received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (grant agreement No. 882396), the Israel Science Foundation (grant number 993/17), Tel Aviv University Center for AI and Data Science (TAD), and the Yandex Initiative for Machine Learning at Tel Aviv University.
© 2022 O. Bousquet, A. Daniely, H. Kaplan, Y. Mansour, S. Moran & U. Stemmer.
- Bayes consistency
- Learning curve
- PAC learning