Efficient algorithms for linear polyhedral bandits

Manjesh K. Hanawal, Amir Leshem, Venkatesh Saligrama

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

3 Scopus citations

Abstract

We study the stochastic linear optimization problem with bandit feedback. The set of arms takes values in an N-dimensional space and belongs to a bounded polyhedron described by finitely many linear inequalities. We present an algorithm, SEE, that has O(N log^{1+ϵ}(T)) expected regret for any ϵ > 0 in T rounds. The algorithm alternates between exploration and exploitation phases: it plays a deterministic set of arms in the exploration phases and a greedily selected arm in the exploitation phases. The regret bound of SEE compares well to the lower bound of Ω(N log T) that can be derived by a direct adaptation of the Lai-Robbins lower bound proof [1]. Our key insight is that for a polyhedron the optimal arm is robust to small perturbations in the reward function. Consequently, a greedily selected arm is guaranteed to be optimal when the estimation error falls below a suitable threshold. Our solution resolves a question posed by [2] that left open the possibility of efficient algorithms with logarithmic regret bounds. The simplicity of our approach allows us to derive probability-one bounds on the regret, in contrast to the weak convergence results of other papers. This ensures that, with probability one, only finitely many errors occur in the exploitation phases. Numerical investigations show that, although the theoretical results are asymptotic, the performance of our algorithms compares favorably to state-of-the-art algorithms in finite time as well.
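
The alternating scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' SEE algorithm, and every specific in it is an assumption made for illustration rather than a detail from the paper: the polyhedron is taken to be the box [-1, 1]^N, exploration plays the standard basis vectors, exploitation phases double in length, exploitation rewards are not used for estimation, and the names `greedy_arm` and `explore_exploit` are hypothetical. The regret tracked is the expected (pseudo-)regret against the best vertex.

```python
import random


def greedy_arm(theta_hat):
    """Maximize <theta_hat, x> over the box [-1, 1]^N (an assumed polyhedron).

    A linear objective on a box is maximized at a vertex: take the sign of
    each coordinate of the estimate.
    """
    return [1.0 if t >= 0 else -1.0 for t in theta_hat]


def explore_exploit(theta, horizon, noise_std=0.1, seed=0):
    """Alternate exploration and exploitation phases; return (last arm, regret)."""
    rng = random.Random(seed)
    n = len(theta)
    best = sum(abs(t) for t in theta)  # expected reward of the optimal vertex
    sums = [0.0] * n                   # running sums of exploration rewards
    counts = [0] * n                   # number of exploration plays per coordinate
    arm = greedy_arm([0.0] * n)
    regret, t, phase = 0.0, 0, 0
    while t < horizon:
        phase += 1
        # Exploration: play each basis vector e_i once (a fixed, deterministic set).
        for i in range(n):
            if t >= horizon:
                break
            sums[i] += theta[i] + rng.gauss(0.0, noise_std)  # noisy reward of e_i
            counts[i] += 1
            regret += best - theta[i]
            t += 1
        # Exploitation: play the greedy arm for the current estimate.
        theta_hat = [s / c if c else 0.0 for s, c in zip(sums, counts)]
        arm = greedy_arm(theta_hat)
        gap = best - sum(a * b for a, b in zip(arm, theta))
        steps = min(2 ** phase, horizon - t)  # assumed doubling schedule
        regret += gap * steps
        t += steps
    return arm, regret
```

For example, `explore_exploit([0.8, -0.6, 0.4], horizon=5000)` returns the final greedy arm together with the accumulated expected regret. Once the estimation error drops below the smallest |theta_i|, the greedy vertex stops changing, which mirrors the robustness argument above: only the exploration rounds keep contributing regret.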

Original language: English
Title of host publication: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 4796-4800
Number of pages: 5
ISBN (Electronic): 9781479999880
State: Published - 18 May 2016
Externally published: Yes
Event: 41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Shanghai, China
Duration: 20 Mar 2016 to 25 Mar 2016

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 2016-May
ISSN (Print): 1520-6149

Conference

Conference: 41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016
Country/Territory: China
City: Shanghai
Period: 20/03/16 to 25/03/16

Bibliographical note

Publisher Copyright:
© 2016 IEEE.
