Abstract
We lower bound the complexity of finding ϵ-stationary points (points with gradient norm at most ϵ) using stochastic first-order methods. In a well-studied model where algorithms access smooth, potentially non-convex functions through queries to an unbiased stochastic gradient oracle with bounded variance, we prove that (in the worst case) any algorithm requires at least ϵ⁻⁴ queries to find an ϵ-stationary point. The lower bound is tight and establishes that stochastic gradient descent is minimax optimal in this model. In a more restrictive model where the noisy gradient estimates satisfy a mean-squared smoothness property, we prove a lower bound of ϵ⁻³ queries, establishing the optimality of recently proposed variance reduction techniques.
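For reference, the setting summarized in the abstract can be written out explicitly. The following is a sketch of the standard oracle model it describes; the symbols σ² (variance bound) and L̄ (mean-squared smoothness constant) are assumed notation, not taken from this page.

```latex
% Stochastic first-order oracle: at a query point x, a sample z yields an
% unbiased gradient estimate g(x, z) with bounded variance (assumed notation):
\[
  \mathbb{E}_z\big[g(x,z)\big] = \nabla f(x),
  \qquad
  \mathbb{E}_z\big\|g(x,z) - \nabla f(x)\big\|^2 \le \sigma^2 .
\]
% Goal: find an \epsilon-stationary point, i.e., a point x with
\[
  \mathbb{E}\,\big\|\nabla f(x)\big\| \le \epsilon ,
\]
% for which the abstract states a tight \epsilon^{-4} query lower bound,
% matched by stochastic gradient descent. The more restrictive
% mean-squared smoothness condition on the estimates reads
\[
  \mathbb{E}_z\big\|g(x,z) - g(y,z)\big\|^2 \le \bar{L}^2\,\|x - y\|^2
  \quad \text{for all } x, y ,
\]
% under which the abstract states an \epsilon^{-3} query lower bound,
% matched by variance reduction methods.
```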
Original language | American English |
---|---|
Pages (from-to) | 165-214 |
Number of pages | 50 |
Journal | Mathematical Programming |
Volume | 199 |
Issue number | 1-2 |
DOIs | |
State | Published - 2023 |
Bibliographical note
Funding Information: Part of this work was completed while the authors were visiting the Simons Institute for the Foundations of Deep Learning program. We thank Ayush Sekhari, Ohad Shamir, Aaron Sidford and Karthik Sridharan for several helpful discussions. YC was supported by the Stanford Graduate Fellowship. JCD acknowledges support from NSF CAREER award 1553086, the Sloan Foundation, and ONR-YIP N00014-19-1-2288. DF was supported by NSF TRIPODS award #1740751. BW was supported by the Google PhD Fellowship program.
Publisher Copyright:
© 2022, Springer-Verlag GmbH Germany, part of Springer Nature and Mathematical Optimization Society.