Principal Components Bias in Over-parameterized Linear Models, and its Manifestation in Deep Neural Networks

Guy Hacohen, Daphna Weinshall

Research output: Contribution to journal › Article › peer-review

7 Scopus citations

Abstract

Recent work suggests that convolutional neural networks of different architectures learn to classify images in the same order. To understand this phenomenon, we revisit the over-parametrized deep linear network model. Our analysis reveals that, when the hidden layers are wide enough, the convergence rate of this model’s parameters is exponentially faster along the directions of the larger principal components of the data, at a rate governed by the corresponding singular values. We term this convergence pattern the Principal Components bias (PC-bias). Empirically, we show how the PC-bias streamlines the order of learning of both linear and non-linear networks, more prominently at earlier stages of learning. We then compare our results to the simplicity bias, showing that both biases can be seen independently, and affect the order of learning in different ways. Finally, we discuss how the PC-bias may explain some benefits of early stopping and its connection to PCA, and why deep networks converge more slowly with random labels.
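
The convergence pattern described in the abstract can be illustrated with a minimal sketch. It uses the simplest related setting, a single-layer linear regression trained by full-batch gradient descent, and not the deep linear network the paper analyzes. In this simplified case the error along each principal direction shrinks by a factor of (1 - lr * λ_i) per step, where λ_i is the corresponding eigenvalue of the sample covariance, so directions with larger variance converge faster. The function name, the data scales, the target weights, and the learning rate are all illustrative assumptions, not taken from the paper.

```python
import numpy as np


def pc_convergence_demo(n_samples=500, dim=5, steps=20, lr=0.05, seed=0):
    """Return covariance eigenvalues (descending) and the fraction of the
    initial error remaining along each principal direction after training."""
    rng = np.random.default_rng(seed)

    # Assumed toy data: independent axes with decreasing standard deviation,
    # so the principal components have clearly separated variances.
    scales = np.linspace(3.0, 0.5, dim)
    X = rng.standard_normal((n_samples, dim)) * scales

    # Assumed target weights (arbitrary choice); labels are noiseless.
    w_true = np.ones(dim)
    y = X @ w_true

    # Principal directions = eigenvectors of the sample covariance.
    cov = X.T @ X / n_samples
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]           # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Full-batch gradient descent on mean squared error, starting from zero.
    w = np.zeros(dim)
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / n_samples
        w -= lr * grad

    # Project the initial and final parameter errors onto each principal direction.
    err_start = np.abs(eigvecs.T @ (np.zeros(dim) - w_true))
    err_end = np.abs(eigvecs.T @ (w - w_true))
    return eigvals, err_end / err_start


if __name__ == "__main__":
    eigvals, remaining = pc_convergence_demo()
    for lam, frac in zip(eigvals, remaining):
        print(f"eigenvalue {lam:6.3f}  remaining error fraction {frac:.3e}")
```

Running the script prints one line per principal direction. The remaining error fraction should be smallest for the largest eigenvalue and grow as the eigenvalue shrinks, which is the single-layer analogue of the ordering the abstract describes.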

Original language: English
Article number: 155
Number of pages: 46
Journal: Journal of Machine Learning Research
Volume: 23
State: Published - 1 May 2022

Bibliographical note

Funding Information:
We thank our two reviewers for the elaborated and insightful suggestions, which contributed to this work. This work was supported in part by a grant from the Israeli Ministry of Science and Technology, and by the Gatsby Charitable Foundations.

Publisher Copyright:
©2022 Guy Hacohen and Daphna Weinshall.

Keywords

  • Deep linear networks
  • Learning dynamics
  • Learning order
  • PC-bias
  • Simplicity bias
