This paper presents an algorithm and regret analysis for the restless hidden Markov bandit problem with linear rewards. In this problem, the reward received by the decision maker is a random linear function that depends on the selected arm and a hidden state. In contrast to previous works on Markovian bandits, we do not assume that the decision maker receives information regarding the state of the system; instead, the decision maker can only estimate the state from its actions and the received rewards. Additionally, it is assumed that the decision maker knows in advance that the reward is a random linear function which depends on the selected arm, the action, and the hidden states. However, the decision maker does not know in advance the probability distributions of these hidden states; we therefore call this knowledge structural side information. Surprisingly, logarithmic regret can still be maintained when the action set is polyhedral. Furthermore, we show that the structural side information leads to an expected regret that does not depend on the number of extreme points of the action space.
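To make the problem setting concrete, the following is a minimal sketch of an environment of this kind. It is not the paper's algorithm or its exact model: the class name `HiddenMarkovLinearBandit`, the additive Gaussian noise, the per-state parameter vectors `theta`, and the fixed initial state are all illustrative assumptions. The sketch shows only the two features named in the abstract: the hidden state evolves regardless of the chosen action (restlessness), and the learner observes only the reward, never the state.

```python
import random


class HiddenMarkovLinearBandit:
    """Toy restless hidden Markov bandit with linear rewards (illustrative only)."""

    def __init__(self, transition, theta, noise_std=0.1, seed=0):
        # transition[s][s2]: probability of moving from hidden state s to s2 (assumed known here
        # only to the simulator, not to the learner).
        self.transition = transition
        # theta[s]: hypothetical reward parameter vector associated with hidden state s.
        self.theta = theta
        self.noise_std = noise_std
        self.rng = random.Random(seed)
        self.state = 0  # hidden from the learner; fixed start is an assumption

    def pull(self, action):
        """Return the reward for `action`; the hidden state is never revealed."""
        mean = sum(a * t for a, t in zip(action, self.theta[self.state]))
        reward = mean + self.rng.gauss(0.0, self.noise_std)
        # Restless dynamics: the state moves on no matter which action was played.
        n_states = len(self.transition)
        self.state = self.rng.choices(
            range(n_states), weights=self.transition[self.state]
        )[0]
        return reward


if __name__ == "__main__":
    # Extreme points of a simple polyhedral action set (the 2-D simplex).
    extreme_points = [(1.0, 0.0), (0.0, 1.0)]
    env = HiddenMarkovLinearBandit(
        transition=[[0.9, 0.1], [0.2, 0.8]],
        theta=[[1.0, 0.0], [0.0, 1.0]],
    )
    for t in range(5):
        action = extreme_points[t % 2]
        print(t, action, round(env.pull(action), 3))
```

A learner interacting with this sketch would call `pull` repeatedly and would have to estimate the hidden state, and the unknown state distribution, from the reward sequence alone.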
|Original language||American English|
|Title of host publication||2020 59th IEEE Conference on Decision and Control, CDC 2020|
|Publisher||Institute of Electrical and Electronics Engineers Inc.|
|Number of pages||7|
|State||Published - 14 Dec 2020|
|Event||59th IEEE Conference on Decision and Control, CDC 2020 - Virtual, Jeju Island, Korea, Republic of (Duration: 14 Dec 2020 → 18 Dec 2020)|
|Name||Proceedings of the IEEE Conference on Decision and Control|
|Conference||59th IEEE Conference on Decision and Control, CDC 2020|
|Country/Territory||Korea, Republic of|
|City||Virtual, Jeju Island|
|Period||14/12/20 → 18/12/20|
Bibliographical note
Publisher Copyright: © 2020 IEEE.