A Bayes-optimal view on adversarial examples

Eitan Richardson, Yair Weiss

Research output: Contribution to journal › Article › peer-review


Abstract

Since the discovery of adversarial examples (the ability to fool modern CNN classifiers with tiny perturbations of the input), there has been much discussion about whether they are a "bug" that is specific to current neural architectures and training methods or an inevitable "feature" of high-dimensional geometry. In this paper, we argue for examining adversarial examples from the perspective of Bayes-optimal classification. We construct realistic image datasets for which the Bayes-optimal classifier can be efficiently computed and derive analytic conditions on the distributions under which these classifiers are provably robust against any adversarial attack, even in high dimensions. Our results show that even when these "gold standard" optimal classifiers are robust, CNNs trained on the same datasets consistently learn a vulnerable classifier, indicating that adversarial examples are often an avoidable "bug". We further show that RBF SVMs trained on the same data consistently learn a robust classifier. The same trend is observed in experiments with real images in different datasets.
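The paper's approach rests on datasets whose generative distribution is known, so that the Bayes-optimal classifier and its robustness can be computed exactly. As a rough illustration only (not the authors' construction), the sketch below uses a hypothetical two-class setup with isotropic Gaussian classes, for which the Bayes-optimal rule is a linear log-likelihood-ratio test; the L2 distance of each point to that decision boundary lower-bounds the perturbation needed to flip the optimal classifier's prediction, which is one way to quantify its robustness.

```python
import numpy as np

# Illustrative only: two isotropic Gaussian classes in d dimensions.
# With equal covariances and equal priors, the Bayes-optimal classifier
# reduces to a linear rule sign(w.x + b).
rng = np.random.default_rng(0)
d, n, sigma = 100, 1000, 1.0
mu0, mu1 = np.zeros(d), np.full(d, 0.3)   # hypothetical class means

# Sample a balanced test set from the true distribution.
y = rng.integers(0, 2, size=n)
X = np.where(y[:, None] == 1, mu1, mu0) + sigma * rng.standard_normal((n, d))

# Bayes-optimal rule from the log-likelihood ratio:
#   predict class 1 iff w.x + b > 0,
#   w = (mu1 - mu0) / sigma^2,  b = (||mu0||^2 - ||mu1||^2) / (2 sigma^2).
w = (mu1 - mu0) / sigma**2
b = (mu0 @ mu0 - mu1 @ mu1) / (2 * sigma**2)
scores = X @ w + b
pred = (scores > 0).astype(int)
print("Bayes-optimal accuracy:", (pred == y).mean())

# For a linear classifier, the smallest L2 perturbation that flips a
# prediction is |w.x + b| / ||w||; large values indicate robustness.
margins = np.abs(scores) / np.linalg.norm(w)
print("median L2 distance to decision boundary:", np.median(margins))
```

In this toy setting the margin distribution makes the robustness of the optimal classifier explicit; the paper's point is that a CNN trained on the same data can achieve high accuracy while ending up with a far smaller margin than the Bayes-optimal rule allows.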

Original language: English
Article number: 221
Number of pages: 28
Journal: Journal of Machine Learning Research
Volume: 22
State: Published - 2021

Bibliographical note

Funding Information:
This work was supported by the Israeli Science Foundation, the Ministry of Science and Technology, the Gatsby Foundation and the Center for Interdisciplinary Data Science Research (CIDR).

Publisher Copyright:
© 2021 Eitan Richardson and Yair Weiss.

Keywords

  • Adversarial examples
  • Bayes optimal
  • CNN
  • Generative models
  • SVM
