The complexity of learning halfspaces using generalized linear methods

Research output: Contribution to journal › Conference article › peer-review

Abstract

Many popular learning algorithms (e.g., kernel SVM, logistic regression, Lasso, and Fourier-transform based algorithms) operate by reducing the problem to a convex optimization problem over a set of functions. These methods offer the currently best approach to several central problems, such as learning halfspaces and learning DNFs. In addition, they are widely used in numerous application domains. Despite their importance, there are still very few proof techniques for showing limits on the power of these algorithms.

We study the performance of this approach on the problem of (agnostically and improperly) learning halfspaces with margin γ. Let D be a distribution over labeled examples. The γ-margin error of a hyperplane h is the probability that an example falls on the wrong side of h or at a distance ≤ γ from it. The γ-margin error of the best h is denoted Errγ(D). An α(γ)-approximation algorithm receives γ and ε as input and, using i.i.d. samples of D, outputs a classifier with error rate ≤ α(γ)·Errγ(D) + ε. Such an algorithm is efficient if it uses poly(1/γ, 1/ε) samples and runs in time polynomial in the sample size. The best approximation ratio achievable by an efficient algorithm is O((1/γ)/√(log(1/γ))) and is achieved using an algorithm from the above class. Our main result shows that the approximation ratio of every efficient algorithm from this family must be ≥ Ω((1/γ)/poly(log(1/γ))), essentially matching the best known upper bound.
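To make the central quantity concrete, here is a minimal Python sketch (not from the paper; the data, the planted hyperplane w_star, and the helper margin_error are illustrative assumptions) that estimates the γ-margin error of a fixed unit-norm halfspace from an i.i.d. sample:

```python
import numpy as np

def margin_error(w, X, y, gamma):
    """Empirical gamma-margin error of the halfspace x -> sign(<w, x>).

    An example (x, y) counts as a margin mistake if it falls on the
    wrong side of the hyperplane or within distance gamma of it,
    i.e. if y * <w, x> <= gamma. Since w is normalized to unit norm,
    <w, x> is the signed distance of x from the hyperplane.
    """
    w = w / np.linalg.norm(w)          # normalize so margins are distances
    margins = y * (X @ w)              # signed margins y_i * <w, x_i>
    return np.mean(margins <= gamma)   # fraction of gamma-margin mistakes

# Usage: 200 points in the unit ball of R^5 labeled by a planted halfspace,
# with 5% of the labels flipped to simulate agnostic noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X /= np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1.0)
w_star = rng.normal(size=5)
y = np.sign(X @ w_star)
flip = rng.random(200) < 0.05
y[flip] = -y[flip]
print(margin_error(w_star, X, y, gamma=0.1))
```

In these terms, an α(γ)-approximation learner must output a classifier whose zero-one error is at most α(γ) times the smallest such margin error achievable by any hyperplane, plus ε.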

Original language: English
Pages (from-to): 244-286
Number of pages: 43
Journal: Proceedings of Machine Learning Research
Volume: 35
State: Published - 2014
Event: 27th Conference on Learning Theory, COLT 2014 - Barcelona, Spain
Duration: 13 Jun 2014 – 15 Jun 2014

Bibliographical note

Publisher Copyright:
© 2014 A. Daniely & S. Shalev-Shwartz.
