Lower bounds for non-convex stochastic optimization

Yossi Arjevani, Yair Carmon*, John C. Duchi, Dylan J. Foster, Nathan Srebro, Blake Woodworth

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

We lower bound the complexity of finding ϵ-stationary points (with gradient norm at most ϵ) using stochastic first-order methods. In a well-studied model where algorithms access smooth, potentially non-convex functions through queries to an unbiased stochastic gradient oracle with bounded variance, we prove that (in the worst case) any algorithm requires at least ϵ⁻⁴ queries to find an ϵ-stationary point. The lower bound is tight, and establishes that stochastic gradient descent is minimax optimal in this model. In a more restrictive model where the noisy gradient estimates satisfy a mean-squared smoothness property, we prove a lower bound of ϵ⁻³ queries, establishing the optimality of recently proposed variance reduction techniques.
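
The oracle model in the abstract can be illustrated with a minimal sketch, which is not taken from the paper. It runs plain stochastic gradient descent against an unbiased gradient oracle with bounded variance and records the smallest true gradient norm seen. The test function f(x) = Σ xᵢ²/(1 + xᵢ²), the Gaussian noise model, the step-size rule, and all function and parameter names are assumptions chosen for illustration. The true gradient is consulted only for diagnostics; the algorithm itself sees noisy queries only.

```python
import math
import random


def grad_f(x):
    """Exact gradient of the assumed test function f(x) = sum_i x_i^2 / (1 + x_i^2).

    f is smooth and non-convex; its gradient is 2-Lipschitz.
    """
    return [2.0 * xi / (1.0 + xi * xi) ** 2 for xi in x]


def norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(vi * vi for vi in v))


def stochastic_gradient(x, sigma, rng):
    """Unbiased oracle: true gradient plus zero-mean Gaussian noise.

    Each coordinate gets noise of variance sigma^2 / d, so the total
    noise satisfies E||noise||^2 = sigma^2 (bounded variance).
    """
    scale = sigma / math.sqrt(len(x))
    return [gi + scale * rng.gauss(0.0, 1.0) for gi in grad_f(x)]


def sgd(x0, eps, sigma, smoothness, num_queries, seed=0):
    """Run SGD for num_queries oracle calls.

    Returns the final iterate and the smallest true gradient norm observed.
    The step size below is an illustrative choice, not a tuned constant.
    """
    rng = random.Random(seed)
    step = min(1.0 / smoothness, eps ** 2 / (smoothness * sigma ** 2))
    x = list(x0)
    best = norm(grad_f(x))
    for _ in range(num_queries):
        g = stochastic_gradient(x, sigma, rng)
        x = [xi - step * gi for xi, gi in zip(x, g)]
        best = min(best, norm(grad_f(x)))  # diagnostic only
    return x, best


if __name__ == "__main__":
    x, best = sgd([1.0] * 5, eps=0.1, sigma=1.0, smoothness=2.0, num_queries=5000)
    print("smallest true gradient norm observed:", best)
```

With a step size proportional to ϵ², the number of oracle queries needed grows on the order of ϵ⁻⁴ as ϵ shrinks, which is the rate the abstract states cannot be improved in the worst case.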

Original language: American English
Pages (from-to): 165-214
Number of pages: 50
Journal: Mathematical Programming
Volume: 199
Issue number: 1-2
State: Published - 2023

Bibliographical note

Funding Information:
Part of this work was completed while the authors were visiting the Simons Institute for the Foundations of Deep Learning program. We thank Ayush Sekhari, Ohad Shamir, Aaron Sidford and Karthik Sridharan for several helpful discussions. YC was supported by the Stanford Graduate Fellowship. JCD acknowledges support from NSF CAREER award 1553086, the Sloan Foundation, and ONR-YIP N00014-19-1-2288. DF was supported by NSF TRIPODS award #1740751. BW was supported by the Google PhD Fellowship program. Division of Computing and Communication Foundations (Grant Number 1553086).

Publisher Copyright:
© 2022, Springer-Verlag GmbH Germany, part of Springer Nature and Mathematical Optimization Society.
