Abstract
Large neural networks trained in the overparameterized regime are able to fit noise to zero train error. Recent work of Nakkiran and Bansal [20] has empirically observed that such networks behave as “conditional samplers” from the noisy distribution. That is, they replicate the noise in the train data to unseen examples. We give a theoretical framework for studying this conditional sampling behavior in the context of learning theory. We relate the notion of such samplers to knowledge distillation, where a student network imitates the outputs of a teacher on unlabeled data. We show that samplers, while being bad classifiers, can be good teachers. Concretely, we prove that distillation from samplers is guaranteed to produce a student which approximates the Bayes optimal classifier. Finally, we show that some common learning algorithms (e.g., Nearest-Neighbours and Kernel Machines) can often generate samplers when applied in the overparameterized regime.
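The claim that a sampler can be a poor classifier yet a good teacher can be illustrated with a small simulation. The sketch below is a toy under stated assumptions, not the paper's construction or proof: inputs are uniform on [0, 1], the clean label is `x > 0.5`, labels are flipped with probability 0.2, the teacher is a 1-nearest-neighbour rule fit on the noisy data, and the student is a hypothetical one-parameter threshold classifier fit to the teacher's labels on fresh unlabeled points. All function names and parameter values are chosen for illustration only.

```python
import bisect
import random


def clean_label(x):
    # Assumed noiseless target (also the Bayes optimal rule here, since flip_prob < 0.5).
    return int(x > 0.5)


def make_noisy_train(n, flip_prob, rng):
    # Sorted inputs so the 1-NN lookup can use binary search.
    xs = sorted(rng.random() for _ in range(n))
    # Each training label is flipped independently with probability flip_prob.
    ys = [clean_label(x) ^ int(rng.random() < flip_prob) for x in xs]
    return xs, ys


def one_nn_teacher(xs, ys):
    # 1-NN interpolates the noisy training set and copies its noise to new points.
    def predict(x):
        i = bisect.bisect_left(xs, x)
        candidates = [j for j in (i - 1, i) if 0 <= j < len(xs)]
        nearest = min(candidates, key=lambda k: abs(xs[k] - x))
        return ys[nearest]
    return predict


def fit_threshold_student(xs, labels):
    # Choose t minimizing disagreement between the rule "x > t" and the labels.
    pairs = sorted(zip(xs, labels))
    errors = sum(1 for _, y in pairs if y == 0)  # t = -inf predicts 1 everywhere
    best_err, best_t = errors, float("-inf")
    for x, y in pairs:
        # Raising t to x switches the prediction at x from 1 to 0.
        errors += 1 if y == 1 else -1
        if errors < best_err:
            best_err, best_t = errors, x
    return best_t


def accuracy(predict, xs):
    return sum(int(predict(x)) == clean_label(x) for x in xs) / len(xs)


rng = random.Random(0)
train_x, train_y = make_noisy_train(2000, 0.2, rng)
teacher = one_nn_teacher(train_x, train_y)

# Distillation: the student only sees teacher labels on unlabeled inputs.
unlabeled = [rng.random() for _ in range(5000)]
threshold = fit_threshold_student(unlabeled, [teacher(x) for x in unlabeled])

test_x = [rng.random() for _ in range(5000)]
teacher_acc = accuracy(teacher, test_x)
student_acc = accuracy(lambda x: x > threshold, test_x)
print(f"teacher accuracy vs clean labels: {teacher_acc:.3f}")
print(f"student threshold: {threshold:.3f}, accuracy: {student_acc:.3f}")
```

In this setup the teacher's accuracy against clean labels stays close to one minus the flip probability, because it reproduces the label noise on unseen points, while the student's fitted threshold lands near 0.5. The student recovers the clean rule only because its hypothesis class is restricted; a student as flexible as the teacher would simply copy the noise.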
| Original language | English |
|---|---|
| Title of host publication | Advances in Neural Information Processing Systems 35 - 36th Conference on Neural Information Processing Systems, NeurIPS 2022 |
| Editors | S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, A. Oh |
| Publisher | Neural information processing systems foundation |
| ISBN (Electronic) | 9781713871088 |
| State | Published - 2022 |
| Event | 36th Conference on Neural Information Processing Systems, NeurIPS 2022 - New Orleans, United States. Duration: 28 Nov 2022 → 9 Dec 2022 |
Publication series
| Name | Advances in Neural Information Processing Systems |
|---|---|
| Volume | 35 |
| ISSN (Print) | 1049-5258 |
Conference
| Conference | 36th Conference on Neural Information Processing Systems, NeurIPS 2022 |
|---|---|
| Country/Territory | United States |
| City | New Orleans |
| Period | 28/11/22 → 9/12/22 |
Bibliographical note
Publisher Copyright: © 2022 Neural information processing systems foundation. All rights reserved.
Fingerprint
Dive into the research topics of 'Knowledge Distillation: Bad Models Can Be Good Role Models'. Together they form a unique fingerprint.