Training (overparametrized) neural networks in near-linear time

Jan van den Brand, Binghui Peng, Zhao Song, Omri Weinstein

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review



The slow convergence rate and pathological curvature issues of first-order gradient methods for training deep neural networks initiated an ongoing effort to develop faster second-order optimization algorithms beyond SGD, without compromising the generalization error. Despite their remarkable convergence rate (independent of the training batch size n), second-order algorithms incur a daunting slowdown in the cost per iteration (inverting the Hessian matrix of the loss function), which renders them impractical. Very recently, this computational overhead was mitigated by the works of [79, 23], yielding an O(mn²)-time second-order algorithm for training two-layer overparametrized neural networks of polynomial width m. We show how to speed up the algorithm of [23], achieving an Õ(mn)-time backpropagation algorithm for training (mildly overparametrized) ReLU networks, which is near-linear in the dimension (mn) of the full gradient (Jacobian) matrix. The centerpiece of our algorithm is to reformulate the Gauss-Newton iteration as an ℓ₂-regression problem, and then use a Fast-JL type dimension reduction to precondition the underlying Gram matrix in time independent of m, allowing us to find a sufficiently good approximate solution via first-order conjugate gradient. Our result provides a proof-of-concept that advanced machinery from randomized linear algebra – which led to recent breakthroughs in convex optimization (ERM, LPs, regression) – can be carried over to the realm of deep learning as well.
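The sketch-and-precondition pipeline outlined in the abstract (reduce the second-order step to an ℓ₂-regression, precondition with a random projection, then solve with conjugate gradient) can be illustrated on a generic overdetermined least-squares problem. The snippet below is a minimal sketch, not the paper's algorithm: it substitutes a dense Gaussian sketch for the Fast-JL transform and a synthetic ill-conditioned matrix for the network's Jacobian.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic overdetermined regression standing in for the Gauss-Newton
# l2-regression step: minimize ||A x - b||_2 with N >> d.
N, d = 4000, 50
A = rng.standard_normal((N, d)) * np.geomspace(1.0, 1e3, d)  # ill-conditioned columns
x_true = rng.standard_normal(d)
b = A @ x_true

# Sketch-and-precondition: a small random projection S (a dense Gaussian
# sketch here, used as a stand-in for the Fast-JL transform) preserves the
# geometry of A's column space, so the R factor of S @ A is a good
# preconditioner: A @ inv(R) has O(1) condition number.
s = 4 * d                       # sketch size O(d), independent of N
S = rng.standard_normal((s, N)) / np.sqrt(s)
_, R = np.linalg.qr(S @ A)      # R is d x d, upper triangular

def matvec(y):
    # Apply M = R^{-T} A^T A R^{-1}, the preconditioned normal-equations operator.
    z = A @ np.linalg.solve(R, y)
    return np.linalg.solve(R.T, A.T @ z)

# Plain conjugate gradient on M y = R^{-T} A^T b; converges in few
# iterations because M is well-conditioned after preconditioning.
c = np.linalg.solve(R.T, A.T @ b)
y = np.zeros(d)
r = c - matvec(y)
p = r.copy()
rs = r @ r
for _ in range(50):
    Mp = matvec(p)
    alpha = rs / (p @ Mp)
    y += alpha * p
    r -= alpha * Mp
    rs_new = r @ r
    if np.sqrt(rs_new) < 1e-12:
        break
    p = r + (rs_new / rs) * p
    rs = rs_new

x = np.linalg.solve(R, y)  # undo the change of variables
rel_err = np.linalg.norm(x - x_true) / np.linalg.norm(x_true)
```

Without the preconditioner, CG on the raw normal equations would face a squared condition number from the ill-scaled columns of A; with it, a handful of iterations recovers x to high accuracy.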

Original language: American English
Title of host publication: 12th Innovations in Theoretical Computer Science Conference, ITCS 2021
Editors: James R. Lee
Publisher: Schloss Dagstuhl – Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing
ISBN (Electronic): 9783959771771
State: Published - 1 Feb 2021
Externally published: Yes
Event: 12th Innovations in Theoretical Computer Science Conference, ITCS 2021 - Virtual, Online
Duration: 6 Jan 2021 – 8 Jan 2021

Publication series

Name: Leibniz International Proceedings in Informatics, LIPIcs
ISSN (Print): 1868-8969


Conference: 12th Innovations in Theoretical Computer Science Conference, ITCS 2021
City: Virtual, Online

Bibliographical note

Publisher Copyright:
© Jan van den Brand, Binghui Peng, Zhao Song, and Omri Weinstein.


Keywords

  • Deep learning theory
  • Nonconvex optimization


