## Abstract

We investigate the sample complexity of networks with bounds on the magnitude of their weights. In particular, we consider the class $\mathcal{N} = \{W_t \circ \rho \circ W_{t-1} \circ \rho \circ \cdots \circ \rho \circ W_1 : W_1, \ldots, W_{t-1} \in M_{d \times d},\ W_t \in M_{1,d}\}$, where the spectral norm of each $W_i$ is bounded by $O(1)$, the Frobenius norm is bounded by $R$, and $\rho$ is the sigmoid function $\frac{e^x}{1+e^x}$ or the smoothened ReLU function $\ln(1 + e^x)$. We show that for any depth $t$, if the inputs are in $[-1, 1]^d$, the sample complexity of $\mathcal{N}$ is $\tilde{O}\left(\frac{dR^2}{\epsilon^2}\right)$. This bound is optimal up to log-factors, and substantially improves over the previous state of the art of $\tilde{O}\left(\frac{d^2R^2}{\epsilon^2}\right)$, which was established in a recent line of work [9, 4, 7, 5, 2, 8]. We furthermore show that this bound remains valid if, instead of considering the magnitude of the $W_i$'s, we consider the magnitude of $W_i - W_i^0$, where the $W_i^0$ are some reference matrices with spectral norm of $O(1)$. By taking the $W_i^0$ to be the matrices at the onset of the training process, we get sample complexity bounds that are sub-linear in the number of parameters in many typical regimes of parameters. To establish our results, we develop a new technique to analyze the sample complexity of families $\mathcal{H}$ of predictors. We start by defining a new notion of a randomized approximate description of functions $f: \mathcal{X} \to \mathbb{R}^d$. We then show that if there is a way to approximately describe functions in a class $\mathcal{H}$ using $d$ bits, then $\frac{d}{\epsilon^2}$ examples suffice to guarantee uniform convergence, namely, that the empirical loss of all the functions in the class is $\epsilon$-close to the true loss. Finally, we develop a set of tools for calculating the approximate description length of classes of functions that can be presented as a composition of linear function classes and non-linear functions.
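To make the definition of the class concrete, the following is a minimal sketch in plain Python of a single member of the class and of the two norms that constrain it. All function names here are illustrative assumptions and do not come from the paper. The two bound helpers drop constants and logarithmic factors, so they only show how the bounds scale, not actual sample sizes.

```python
import math
import random

def sigmoid(x):
    # e^x / (1 + e^x), written in a numerically stable form
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def softplus(x):
    # Smoothened ReLU ln(1 + e^x), stable for large |x|
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def matvec(W, v):
    # Matrix-vector product for a matrix stored as a list of rows
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def frobenius_norm(W):
    return math.sqrt(sum(w * w for row in W for w in row))

def spectral_norm(W, iters=200):
    # Power iteration on W^T W; approximates the largest singular value
    rows, cols = len(W), len(W[0])
    v = [1.0 / math.sqrt(cols)] * cols
    for _ in range(iters):
        u = matvec(W, v)
        wt_u = [sum(W[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        norm = math.sqrt(sum(x * x for x in wt_u))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in wt_u]
    return math.sqrt(sum(x * x for x in matvec(W, v)))

def network(weights, x, rho=sigmoid):
    # Evaluates W_t ∘ rho ∘ W_{t-1} ∘ ... ∘ rho ∘ W_1 at x
    h = x
    for W in weights[:-1]:
        h = [rho(a) for a in matvec(W, h)]
    return matvec(weights[-1], h)[0]

def bound_new(d, R, eps):
    # Scaling d R^2 / eps^2 (constants and log factors omitted)
    return d * R ** 2 / eps ** 2

def bound_old(d, R, eps):
    # Scaling d^2 R^2 / eps^2 (constants and log factors omitted)
    return d ** 2 * R ** 2 / eps ** 2

if __name__ == "__main__":
    random.seed(0)
    d, t = 4, 3
    shapes = [(d, d)] * (t - 1) + [(1, d)]
    weights = []
    for r, c in shapes:
        W = [[random.gauss(0.0, 1.0) for _ in range(c)] for _ in range(r)]
        s = spectral_norm(W)
        # Rescale so that the spectral norm is approximately 1
        weights.append([[w / s for w in row] for row in W])
    x = [random.uniform(-1.0, 1.0) for _ in range(d)]  # input in [-1, 1]^d
    print("output:", network(weights, x))
    print("Frobenius norms:", [frobenius_norm(W) for W in weights])
    print("old/new bound ratio:", bound_old(d, 2.0, 0.1) / bound_new(d, 2.0, 0.1))
```

In this sketch the ratio of the two scalings is exactly $d$, which is the factor by which the stated bound improves on the earlier one when log-factors are ignored.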

Original language | American English |
---|---|
Journal | Advances in Neural Information Processing Systems |
Volume | 32 |
State | Published - 2019 |
Event | 33rd Annual Conference on Neural Information Processing Systems, NeurIPS 2019 - Vancouver, Canada. Duration: 8 Dec 2019 → 14 Dec 2019 |

### Bibliographical note

Publisher Copyright: © 2019 Neural Information Processing Systems Foundation. All rights reserved.