Access to unlabeled data can speed up prediction time

Ruth Urner*, Shai Ben-David, Shai Shalev-Shwartz

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

36 Scopus citations

Abstract

Semi-supervised learning (SSL) addresses the problem of training a classifier using a small number of labeled examples and many unlabeled examples. Most previous work on SSL has focused on how the availability of unlabeled data can improve the accuracy of the learned classifiers. In this work we study how unlabeled data can be beneficial for constructing faster classifiers. We propose an SSL algorithmic framework which can utilize unlabeled examples for learning classifiers from a predefined set of fast classifiers. We formally analyze conditions under which our algorithmic paradigm obtains significant improvements from the use of unlabeled data. As a side benefit of our analysis we propose a novel quantitative measure of the so-called cluster assumption. We demonstrate the potential merits of our approach by conducting experiments on the MNIST data set, showing that, when a sufficiently large unlabeled sample is available, a fast classifier can be learned from far fewer labeled examples than without such a sample.

Original language: English
Title of host publication: Proceedings of the 28th International Conference on Machine Learning, ICML 2011
Pages: 641-648
Number of pages: 8
State: Published - 2011
Event: 28th International Conference on Machine Learning, ICML 2011 - Bellevue, WA, United States
Duration: 28 Jun 2011 to 2 Jul 2011

Publication series

Name: Proceedings of the 28th International Conference on Machine Learning, ICML 2011

Conference

Conference: 28th International Conference on Machine Learning, ICML 2011
Country/Territory: United States
City: Bellevue, WA
Period: 28/06/11 to 2/07/11
