Generalization bounds for neural networks via approximate description length

Amit Daniely, Elad Granot

Research output: Contribution to journal › Conference article › peer-review

13 Scopus citations

Abstract

We investigate the sample complexity of networks with bounds on the magnitude of their weights. In particular, we consider the class N = {W_t ∘ ρ ∘ W_{t−1} ∘ ρ ∘ ... ∘ ρ ∘ W_1 : W_1, ..., W_{t−1} ∈ M_{d×d}, W_t ∈ M_{1,d}}, where the spectral norm of each W_i is bounded by O(1), the Frobenius norm is bounded by R, and ρ is the sigmoid function e^x/(1+e^x) or the smoothened ReLU function ln(1+e^x). We show that for any depth t, if the inputs are in [−1, 1]^d, the sample complexity of N is Õ(dR²/ε²). This bound is optimal up to log-factors, and substantially improves over the previous state of the art of Õ(d²R²/ε²), which was established in a recent line of work [9, 4, 7, 5, 2, 8]. We furthermore show that this bound remains valid if, instead of considering the magnitude of the W_i's, we consider the magnitude of W_i − W_i^0, where the W_i^0 are some reference matrices with spectral norm of O(1). By taking the W_i^0 to be the matrices at the onset of the training process, we get sample complexity bounds that are sub-linear in the number of parameters, in many typical regimes of parameters. To establish our results we develop a new technique to analyze the sample complexity of families H of predictors. We start by defining a new notion of a randomized approximate description of functions f: X → R^d. We then show that if there is a way to approximately describe functions in a class H using d bits, then d/ε² examples suffice to guarantee uniform convergence, namely, that the empirical loss of all the functions in the class is ε-close to the true loss. Finally, we develop a set of tools for calculating the approximate description length of classes of functions that can be presented as a composition of linear function classes and non-linear functions.
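To make the class N concrete, below is a minimal NumPy sketch (not from the paper) of one member of N: a depth-t network whose hidden layers use the smoothened ReLU ρ(x) = ln(1 + e^x), with each weight matrix rescaled so that its spectral norm is at most 1 and its Frobenius norm is at most R, evaluated on an input in [−1, 1]^d. The helper names (softplus, project_norms, network) and the specific values of d, t, and R are illustrative assumptions, not part of the paper.

```python
import numpy as np

def softplus(x):
    # The "smoothened ReLU" activation from the abstract: rho(x) = ln(1 + e^x).
    return np.log1p(np.exp(x))

def project_norms(W, spectral_bound=1.0, frobenius_bound=None):
    # Hypothetical helper: rescale W so that its spectral norm is at most
    # `spectral_bound` and (optionally) its Frobenius norm is at most
    # `frobenius_bound`, mimicking the constraints on each W_i in the class N.
    s = np.linalg.norm(W, ord=2)            # spectral norm (largest singular value)
    if s > spectral_bound:
        W = W * (spectral_bound / s)
    if frobenius_bound is not None:
        f = np.linalg.norm(W, ord='fro')    # Frobenius norm
        if f > frobenius_bound:
            W = W * (frobenius_bound / f)
    return W

def network(Ws, x):
    # A member of N: x -> W_t(rho(W_{t-1}(... rho(W_1 x) ...))),
    # with W_1, ..., W_{t-1} in M_{d x d} and W_t in M_{1,d}.
    h = x
    for W in Ws[:-1]:
        h = softplus(W @ h)
    return Ws[-1] @ h

# Illustrative example: depth t = 3, input dimension d = 100, inputs in [-1, 1]^d.
rng = np.random.default_rng(0)
d, t, R = 100, 3, 5.0
Ws = [project_norms(rng.standard_normal((d, d)) / np.sqrt(d), 1.0, R) for _ in range(t - 1)]
Ws.append(project_norms(rng.standard_normal((1, d)) / np.sqrt(d), 1.0, R))
x = rng.uniform(-1.0, 1.0, size=d)
print(network(Ws, x))
```

Under the abstract's result, learning this class to accuracy ε would require on the order of dR²/ε² examples (up to log-factors), rather than d²R²/ε² as in prior work.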

Original language: English
Journal: Advances in Neural Information Processing Systems
Volume: 32
State: Published - 2019
Event: 33rd Annual Conference on Neural Information Processing Systems, NeurIPS 2019 - Vancouver, Canada
Duration: 8 Dec 2019 – 14 Dec 2019

Bibliographical note

Publisher Copyright:
© 2019 Neural information processing systems foundation. All rights reserved.

