SGD learns the Conjugate Kernel class of the network

Amit Daniely*

*Corresponding author for this work

Research output: Contribution to journal › Conference article › peer-review

83 Scopus citations

Abstract

We show that the standard stochastic gradient descent (SGD) algorithm is guaranteed to learn, in polynomial time, a function that is competitive with the best function in the conjugate kernel space of the network, as defined in Daniely et al. [2016]. The result holds for log-depth networks from a rich family of architectures. To the best of our knowledge, it is the first polynomial-time guarantee for the standard neural network learning algorithm for networks of depth more than two. As corollaries, it follows that for neural networks of any depth between 2 and log(n), SGD is guaranteed to learn, in polynomial time, constant-degree polynomials with polynomially bounded coefficients. Likewise, it follows that SGD on large enough networks can learn any continuous function (not in polynomial time), complementing classical expressivity results.
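As an illustration of the learning setting the abstract describes (not the paper's construction or parameters), the sketch below runs plain online SGD on a small randomly initialized ReLU network against a hypothetical low-degree polynomial target. The width, step size, initialization scheme, and target polynomial are all illustrative assumptions chosen here for the example.

# Minimal sketch: standard SGD on a small ReLU network, trained online on a
# hypothetical degree-2 polynomial target. All hyperparameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def target(x):
    # Hypothetical constant-degree polynomial with bounded coefficients.
    return 0.5 * x[0] * x[1] - 0.3 * x[2] ** 2 + 0.2 * x[3]

d, width, lr, steps = 10, 512, 1e-2, 20000

# One hidden layer with a standard random (He-style) initialization.
W1 = rng.normal(0.0, np.sqrt(2.0 / d), size=(width, d))
b1 = np.zeros(width)
w2 = rng.normal(0.0, np.sqrt(1.0 / width), size=width)

for _ in range(steps):
    x = rng.normal(size=d)            # fresh example each step (online SGD)
    y = target(x)
    h = relu(W1 @ x + b1)             # hidden activations
    err = w2 @ h - y                  # gradient of 0.5 * (prediction - y)^2

    # Backpropagate and take one SGD step on all weights.
    grad_w2 = err * h
    grad_h = err * w2 * (h > 0)
    W1 -= lr * np.outer(grad_h, x)
    b1 -= lr * grad_h
    w2 -= lr * grad_w2

x_test = rng.normal(size=d)
print("squared error on a fresh sample:",
      (w2 @ relu(W1 @ x_test + b1) - target(x_test)) ** 2)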

Original language: English
Pages (from-to): 2423-2431
Number of pages: 9
Journal: Advances in Neural Information Processing Systems
Volume: 2017-December
State: Published - 2017
Event: 31st Annual Conference on Neural Information Processing Systems, NIPS 2017 - Long Beach, United States
Duration: 4 Dec 2017 - 9 Dec 2017

Bibliographical note

Publisher Copyright:
© 2017 Neural information processing systems foundation. All rights reserved.
