Beyond convexity: Stochastic quasi-convex optimization

Elad Hazan, Kfir Y. Levy, Shai Shalev-Shwartz

Research output: Contribution to journal › Conference article › peer-review

Abstract

Stochastic convex optimization is a basic and well-studied primitive in machine learning. It is well known that convex and Lipschitz functions can be minimized efficiently using Stochastic Gradient Descent (SGD). The Normalized Gradient Descent (NGD) algorithm is an adaptation of Gradient Descent that updates according to the direction of the gradients rather than the gradients themselves. In this paper we analyze a stochastic version of NGD and prove its convergence to a global minimum for a wider class of functions: we require the functions to be quasi-convex and locally Lipschitz. Quasi-convexity broadens the concept of unimodality to multiple dimensions and allows for certain types of saddle points, a known hurdle for first-order optimization methods such as gradient descent. Locally Lipschitz functions are required to be Lipschitz only in a small region around the optimum. This assumption circumvents gradient explosion, another known hurdle for gradient descent variants. Interestingly, unlike the vanilla SGD algorithm, the stochastic normalized gradient descent algorithm provably requires a minimal minibatch size.
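
Since the abstract turns on a single algorithmic idea, the normalized-gradient update, a minimal sketch may help make it concrete. The Python below is an illustration only: the names sngd and noisy_grad, the toy objective, the fixed step length lr, and the minibatch default are hypothetical choices, and the paper's actual step-size schedule, minibatch lower bound, and convergence guarantees are not reproduced here.

    import numpy as np

    def sngd(grad_estimate, x0, lr=0.1, minibatch=32, steps=2000):
        # Sketch of Stochastic Normalized Gradient Descent: each step
        # averages a minibatch of stochastic gradients and moves a fixed
        # distance lr along the *direction* of that average, so the
        # gradient's magnitude never enters the update.
        x = np.asarray(x0, dtype=float)
        for _ in range(steps):
            g = grad_estimate(x, minibatch)
            norm = np.linalg.norm(g)
            if norm == 0.0:            # degenerate case: no direction to follow
                break
            x = x - lr * g / norm      # normalized (unit-length) step
        return x

    # Toy usage: f(x) = exp(|x - 3|) is quasi-convex (every sublevel set is
    # an interval) and Lipschitz near its minimum at x = 3, but its gradient
    # explodes far from the minimum, exactly the regime where a plain SGD
    # step of size lr * gradient would overshoot wildly.
    rng = np.random.default_rng(0)

    def noisy_grad(x, m):
        g = np.sign(x - 3.0) * np.exp(np.abs(x - 3.0))
        # Averaging m independent noisy gradients shrinks the noise by sqrt(m).
        return g + rng.normal(scale=0.1, size=x.shape) / np.sqrt(m)

    print(sngd(noisy_grad, np.array([10.0])))  # ends near the minimum at 3

The point of the sketch is the design choice the abstract describes: every update has the same length lr regardless of how large the gradient is, which is what lets the method tolerate exploding gradients away from the minimum.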

Original language: English
Pages (from-to): 1594-1602
Number of pages: 9
Journal: Advances in Neural Information Processing Systems
Volume: 2015-January
State: Published - 2015
Event: 29th Annual Conference on Neural Information Processing Systems, NIPS 2015 - Montreal, Canada
Duration: 7 Dec 2015 - 12 Dec 2015

Bibliographical note

Funding Information:
The research leading to these results has received funding from the European Union's Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 336078-ERC-SUBLRN. Shai Shalev-Shwartz is supported by ISF grant no. 1673/14 and by Intel's ICRI-CI.
