Restless Hidden Markov Bandit with Linear Rewards

Michal Yemini, Amir Leshem, Anelia Somekh-Baruch

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Scopus citations

Abstract

This paper presents an algorithm and regret analysis for the restless hidden Markov bandit problem with linear rewards. In this problem, the reward received by the decision maker is a random linear function that depends on the selected arm and a hidden state. In contrast to previous works on Markovian bandits, we do not assume that the decision maker receives information regarding the state of the system; it can only infer or estimate the state from its actions and the received rewards. Additionally, it is assumed that the decision maker knows in advance that the reward is a random linear function that depends on the selected arm, the action, and the hidden states. However, the decision maker does not know in advance the probability distributions of these hidden states; we therefore call this side information structural side information. Surprisingly, we can still maintain logarithmic regret in the case of a polyhedral action set. Furthermore, we show that the structural side information leads to an expected regret that does not depend on the number of extreme points in the action space.
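The setting described in the abstract can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the algorithm from the paper: the names `simulate_rewards`, `best_vertex`, `transition`, and `theta` are hypothetical, the additive Gaussian noise model is an assumption, and the per-state reward vectors are invented for illustration. It shows only two things: a hidden chain that evolves regardless of the chosen action (the "restless" property) while the decision maker sees just the reward, and the standard fact that a linear function over a polytope attains its maximum at an extreme point.

```python
import random


def simulate_rewards(transition, theta, actions, noise_std=0.0, seed=0):
    """Simulate rewards of a hidden-state bandit with linear rewards.

    transition: row-stochastic matrix of the hidden Markov chain (assumed).
    theta:      one reward vector per hidden state (assumed model).
    actions:    sequence of action vectors chosen by the decision maker.
    Only the rewards are returned; the hidden state is never revealed.
    """
    rng = random.Random(seed)
    n_states = len(transition)
    state = rng.randrange(n_states)
    rewards = []
    for action in actions:
        # The mean reward is linear in the action, given the hidden state.
        mean = sum(a * w for a, w in zip(action, theta[state]))
        rewards.append(mean + rng.gauss(0.0, noise_std))
        # Restless: the state moves on no matter which action was played.
        state = rng.choices(range(n_states), weights=transition[state])[0]
    return rewards


def best_vertex(vertices, weights):
    """Index of the extreme point that maximizes the linear reward."""
    def value(i):
        return sum(v * w for v, w in zip(vertices[i], weights))
    return max(range(len(vertices)), key=value)


if __name__ == "__main__":
    transition = [[0.9, 0.1], [0.2, 0.8]]
    theta = [[1.0, 0.5], [0.2, 1.5]]  # hypothetical per-state vectors
    vertices = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    print(simulate_rewards(transition, theta, vertices * 3, noise_std=0.1))
    print(best_vertex(vertices, [0.6, 1.0]))
```

In this sketch the unknown quantity is a low-dimensional weight vector rather than one value per extreme point, which gives some intuition for why knowing the linear structure could make the number of extreme points matter less; the paper's actual estimator and regret bound are not reproduced here.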

Original language: English
Title of host publication: 2020 59th IEEE Conference on Decision and Control, CDC 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1183-1189
Number of pages: 7
ISBN (Electronic): 9781728174471
State: Published - 14 Dec 2020
Externally published: Yes
Event: 59th IEEE Conference on Decision and Control, CDC 2020 - Virtual, Jeju Island, Korea, Republic of
Duration: 14 Dec 2020 to 18 Dec 2020

Publication series

Name: Proceedings of the IEEE Conference on Decision and Control
Volume: 2020-December
ISSN (Print): 0743-1546
ISSN (Electronic): 2576-2370

Conference

Conference: 59th IEEE Conference on Decision and Control, CDC 2020
Country/Territory: Korea, Republic of
City: Virtual, Jeju Island
Period: 14/12/20 to 18/12/20

Bibliographical note

Publisher Copyright:
© 2020 IEEE.
