Optimal learners for multiclass problems

Research output: Contribution to journal › Conference article › peer-review


Abstract

The fundamental theorem of statistical learning states that for binary classification problems, any Empirical Risk Minimization (ERM) learning rule has close-to-optimal sample complexity. In this paper we seek a generic optimal learner for multiclass prediction. We start by proving a surprising result: a generic optimal multiclass learner must be improper, namely, it must be able to output hypotheses that do not belong to the hypothesis class, even though it knows that all the labels are generated by some hypothesis from the class. In particular, no ERM learner is optimal. This brings back the fundamental question of "how to learn?" We give a complete answer to this question via a new analysis of the one-inclusion multiclass learner of Rubinstein et al. (2006), showing that its sample complexity is essentially optimal. We then turn to the popular hypothesis class of generalized linear classifiers and derive optimal learners that, unlike the one-inclusion algorithm, are computationally efficient. Furthermore, we show that the sample complexity of these learners is better than that of the ERM rule, thus settling in the negative an open question due to Collins (2005).
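
For context, the ERM rule referenced in the abstract can be written in its standard form (this formulation is not quoted from the paper itself): given a hypothesis class H and a training sample S = ((x_1, y_1), ..., (x_m, y_m)),

    \mathrm{ERM}_{\mathcal{H}}(S) \in \operatorname*{argmin}_{h \in \mathcal{H}} \; \frac{1}{m} \sum_{i=1}^{m} \mathbf{1}\!\left[h(x_i) \neq y_i\right],

that is, the learner returns a hypothesis from the class itself with minimal training error. An improper learner, by contrast, may output a predictor outside of H; the abstract's first result asserts that for multiclass problems this improperness is necessary to achieve optimal sample complexity.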

Original language: English
Pages (from-to): 287-316
Number of pages: 30
Journal: Proceedings of Machine Learning Research
Volume: 35
State: Published - 2014
Event: 27th Conference on Learning Theory, COLT 2014 - Barcelona, Spain
Duration: 13 Jun 2014 – 15 Jun 2014

Bibliographical note

Publisher Copyright:
© 2014 A. Daniely & S. Shalev-Shwartz.
