Abstract
Investigating active learning, we focus on the relation between the number of labeled examples (budget size) and suitable querying strategies. Our theoretical analysis shows a behavior reminiscent of a phase transition: typical examples are best queried when the budget is low, while unrepresentative examples are best queried when the budget is large. Combined evidence shows that a similar phenomenon occurs in common classification models. Accordingly, we propose TypiClust, a deep active learning strategy suited for low budgets. In a comparative empirical investigation of supervised learning, using a variety of architectures and image datasets, TypiClust outperforms all other active learning strategies in the low-budget regime. Using TypiClust in the semi-supervised framework gives an even more significant performance boost. In particular, state-of-the-art semi-supervised methods trained on CIFAR-10 with 10 labeled examples selected by TypiClust reach 93.2% accuracy, an improvement of 39.4% over random selection. Code is available at https://github.com/avihu111/TypiClust.
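The low-budget idea in the abstract, querying typical rather than unrepresentative examples, can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that). It assumes that typicality is measured as the inverse of the mean distance to the `k` nearest neighbours in some feature space, that cluster assignments are already available from a separate clustering step, and that one example is queried per cluster, largest clusters first. The function names `typicality` and `select_typical`, the parameter `k`, and the toy data are all hypothetical.

```python
import numpy as np


def typicality(features, k=5):
    """Assumed typicality score: inverse of the mean Euclidean distance
    to the k nearest neighbours (higher means denser neighbourhood)."""
    diffs = features[:, None, :] - features[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)        # a point is not its own neighbour
    knn = np.sort(dists, axis=1)[:, :k]    # k smallest distances per point
    return 1.0 / (knn.mean(axis=1) + 1e-12)


def select_typical(features, cluster_ids, budget, k=5):
    """Return the index of the most typical point in each of the
    `budget` largest clusters (an assumed selection rule)."""
    scores = typicality(features, k)
    labels, counts = np.unique(cluster_ids, return_counts=True)
    largest_first = labels[np.argsort(-counts, kind="stable")][:budget]
    picks = []
    for c in largest_first:
        members = np.flatnonzero(cluster_ids == c)
        picks.append(int(members[np.argmax(scores[members])]))
    return picks


# Toy data: two tight groups, plus one outlier (index 4) assigned to cluster 0.
features = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.05, 0.05], [3.0, 3.0],
    [10.0, 10.0], [10.1, 10.0], [10.0, 10.1], [10.05, 10.05],
])
cluster_ids = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1])

picks = select_typical(features, cluster_ids, budget=2, k=2)
print(picks)  # [3, 8]
```

In this toy example the outlier at index 4 is never selected, because its nearest neighbours are far away and its score is low. An uncertainty-style strategy, by contrast, would tend to favour such unrepresentative points, which the abstract argues pays off only at larger budgets.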
| Original language | English |
|---|---|
| Pages (from-to) | 8175-8195 |
| Number of pages | 21 |
| Journal | Proceedings of Machine Learning Research |
| Volume | 162 |
| State | Published - 2022 |
| Event | 39th International Conference on Machine Learning, ICML 2022, Baltimore, United States. Duration: 17 Jul 2022 → 23 Jul 2022. https://proceedings.mlr.press/v162/ |
Bibliographical note
Publisher Copyright: Copyright © 2022 by the author(s)
Title
Active Learning on a Budget: Opposite Strategies Suit High and Low Budgets