Abstract
We design an algorithm which finds an ε-approximate stationary point (with ‖∇F(x)‖ ≤ ε) using O(ε⁻³) stochastic gradient and Hessian-vector products, matching guarantees that were previously available only under a stronger assumption of access to multiple queries with the same random seed. We prove a lower bound which establishes that this rate is optimal and, surprisingly, that it cannot be improved using stochastic pth-order methods for any p ≥ 2, even when the first p derivatives of the objective are Lipschitz. Together, these results characterize the complexity of non-convex stochastic optimization with second-order methods and beyond. Expanding our scope to the oracle complexity of finding (ε, γ)-approximate second-order stationary points, we establish nearly matching upper and lower bounds for stochastic second-order methods. Our lower bounds here are novel even in the noiseless case.
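For reference, the two stationarity notions named in the abstract can be written out as below. This is a sketch using the standard definitions: the abstract itself states only the first-order condition, so the second-order condition and the symbol $\lambda_{\min}$ (smallest eigenvalue of the Hessian) are assumptions added here for illustration.

```latex
% epsilon-approximate (first-order) stationary point, as stated in the abstract
\|\nabla F(x)\| \le \varepsilon

% (epsilon, gamma)-approximate second-order stationary point
% (assumed standard form; not spelled out in the abstract)
\|\nabla F(x)\| \le \varepsilon
\quad \text{and} \quad
\lambda_{\min}\!\left(\nabla^2 F(x)\right) \ge -\gamma
```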
| Original language | English |
|---|---|
| Pages (from-to) | 242-299 |
| Number of pages | 58 |
| Journal | Proceedings of Machine Learning Research |
| Volume | 125 |
| State | Published - 2020 |
| Externally published | Yes |
| Event | 33rd Conference on Learning Theory, COLT 2020 - Virtual, Online, Austria. Duration: 9 Jul 2020 → 12 Jul 2020 |
Bibliographical note
Funding Information: We thank Blake Woodworth and Nati Srebro for helpful discussions. YA acknowledges partial support from the Sloan Foundation and Samsung Research. JCD acknowledges support from the NSF CAREER award CCF-1553086, ONR YIP N00014-19-2288, Sloan Foundation, NSF HDR 1934578 (Stanford Data Science Collaboratory), and the DAWN Consortium. DF acknowledges the support of TRIPODS award 1740751. KS acknowledges support from NSF CAREER Award 1750575 and a Sloan Research Fellowship.
Publisher Copyright:
© 2020 Y. Arjevani, Y. Carmon, J. C. Duchi, D. J. Foster, A. Sekhari & K. Sridharan.
Keywords
- Hessian-vector products
- Stochastic optimization
- Non-convex optimization
- Second-order methods
- Variance reduction