Why do deep convolutional networks generalize so poorly to small image transformations?

Aharon Azulay, Yair Weiss

Research output: Contribution to journal › Article › Peer-review

170 Scopus citations

Abstract

Convolutional Neural Networks (CNNs) are commonly assumed to be invariant to small image transformations, either because of the convolutional architecture or because they were trained using data augmentation. Recently, several authors have shown that this is not the case: small translations or rescalings of the input image can drastically change the network’s prediction. In this paper, we quantify this phenomenon and ask why neither the convolutional architecture nor data augmentation is sufficient to achieve the desired invariance. Specifically, we show that the convolutional architecture does not give invariance since these architectures ignore the classical sampling theorem, and data augmentation does not give invariance because the CNNs learn to be invariant to transformations only for images that are very similar to typical images from the training set. We discuss two possible solutions to this problem, (1) antialiasing the intermediate representations and (2) increasing data augmentation, and show that they provide only a partial solution at best. Taken together, our results indicate that the problem of ensuring invariance to small image transformations in neural networks while preserving high accuracy remains unsolved.
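The sampling-theorem argument can be illustrated with a toy 1-D sketch. This is not the authors' code: the one-layer "network", the binomial [1, 2, 1]/4 blur, and every function name below are hypothetical choices made for illustration. A stride-2 subsampling step followed by global average pooling gives different outputs for an input and its one-sample shift whenever the input contains frequencies above the Nyquist limit of the subsampled grid; blurring before subsampling removes that frequency in this particular case.

```python
"""Toy 1-D illustration (hypothetical, not from the paper): stride-2 subsampling
breaks shift invariance through aliasing, and blurring before subsampling
(antialiasing) restores it for this specific input."""


def circular_shift(x, s):
    """Shift a signal circularly to the right by s samples."""
    n = len(x)
    return [x[(i - s) % n] for i in range(n)]


def subsample(x, stride=2):
    """Keep every `stride`-th sample, as a strided conv or pooling layer does."""
    return x[::stride]


def blur(x):
    """Circular low-pass filter with the binomial kernel [1, 2, 1] / 4 (an assumed choice)."""
    n = len(x)
    return [(x[(i - 1) % n] + 2 * x[i] + x[(i + 1) % n]) / 4 for i in range(n)]


def global_average(x):
    """Global average pooling over all positions."""
    return sum(x) / len(x)


def feature(x, antialias=False):
    """Hypothetical one-layer 'network': optional blur, stride-2 subsampling, global average pool."""
    if antialias:
        x = blur(x)
    return global_average(subsample(x))


# High-frequency input: alternating 1, 0 pattern (period 2), which aliases on a stride-2 grid.
signal = [1.0, 0.0] * 8
shifted = circular_shift(signal, 1)

naive = (feature(signal), feature(shifted))
smooth = (feature(signal, antialias=True), feature(shifted, antialias=True))
print("no antialiasing:", naive)   # (1.0, 0.0): a one-sample shift flips the output
print("with blur:", smooth)        # (0.5, 0.5): the output no longer depends on the shift
```

Shifts by a multiple of the stride (here 2) leave even the naive feature unchanged; only the remaining shifts expose the aliasing. The [1, 2, 1]/4 kernel happens to cancel the period-2 pattern exactly, so this toy case overstates what antialiasing achieves; the abstract reports that in real networks it is only a partial solution.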

Original language: American English
Article number: 184
Number of pages: 25
Journal: Journal of Machine Learning Research
Volume: 20
Status: Published, 1 Nov 2019

Bibliographical note

Funding Information:
We thank Tal Arkushin for the helpful comments. Support by the ISF and the Gatsby Foundation is gratefully acknowledged.

Publisher Copyright:
© 2019 Aharon Azulay, Yair Weiss.

Keywords

  • Deep Convolutional Neural Networks
  • Generalization
  • Machine Learning
