Two-Stage Resource Allocation in Reconfigurable Intelligent Surface Assisted Hybrid Networks via Multi-player Bandits

Jingwen Tong, Hongliang Zhang, Liqun Fu*, Amir Leshem, Zhu Han

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

2 Scopus citations


This paper considers a resource allocation problem in which several Internet-of-Things (IoT) devices send data to a base station (BS) with or without the help of a reconfigurable intelligent surface (RIS) in an RIS-assisted cellular network. The objective is to maximize the sum rate of all IoT devices by finding the optimal RIS and spreading factor (SF) for each device. Because these IoT devices have no prior information about the RISs or the channel state information (CSI), a distributed, low-complexity resource allocation framework with learning capability is required. We therefore model the problem as a two-stage multi-player multi-armed bandit (MPMAB) framework that learns the optimal RIS and SF sequentially. We then put forth an exploration and exploitation boosting (E2Boost) algorithm that solves this two-stage MPMAB problem by combining the ϵ-greedy algorithm, the Thompson sampling (TS) algorithm, and a non-cooperative game method. We derive a regret upper bound for the proposed algorithm, O(log_2^{1+δ} T), increasing logarithmically with the time horizon T. Numerical results show that the E2Boost algorithm outperforms existing methods and converges quickly. More importantly, thanks to the two-stage allocation mechanism, the proposed algorithm is not sensitive to the number of RIS and SF combinations, which benefits high-density networks.
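As a rough illustration of the two-stage idea described in the abstract, the following Python sketch lets a single device first pick an RIS with ϵ-greedy and then pick an SF with Thompson sampling. It is not the E2Boost algorithm: the class `TwoStageBandit`, the function `simulate`, the Bernoulli (success/failure) reward model, and the `success_prob` table are all assumptions made for this example, and the multi-player collision handling via the non-cooperative game is omitted.

```python
import random


class TwoStageBandit:
    """Toy single-device learner (hypothetical, not the paper's E2Boost)."""

    def __init__(self, n_ris, n_sf, epsilon=0.1, seed=0):
        self.n_ris = n_ris
        self.n_sf = n_sf
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        # Stage 1 statistics: pull count and empirical mean reward per RIS.
        self.ris_counts = [0] * n_ris
        self.ris_means = [0.0] * n_ris
        # Stage 2 statistics: Beta(alpha, beta) posterior per (RIS, SF) pair.
        self.alpha = [[1.0] * n_sf for _ in range(n_ris)]
        self.beta = [[1.0] * n_sf for _ in range(n_ris)]

    def select(self):
        # Stage 1: epsilon-greedy choice of the RIS.
        if self.rng.random() < self.epsilon:
            ris = self.rng.randrange(self.n_ris)
        else:
            ris = max(range(self.n_ris), key=lambda i: self.ris_means[i])
        # Stage 2: Thompson sampling over SFs, conditioned on the chosen RIS.
        samples = [
            self.rng.betavariate(self.alpha[ris][j], self.beta[ris][j])
            for j in range(self.n_sf)
        ]
        sf = max(range(self.n_sf), key=lambda j: samples[j])
        return ris, sf

    def update(self, ris, sf, reward):
        # Assumed reward model: 1 for a successful transmission, 0 otherwise.
        self.ris_counts[ris] += 1
        self.ris_means[ris] += (reward - self.ris_means[ris]) / self.ris_counts[ris]
        self.alpha[ris][sf] += reward
        self.beta[ris][sf] += 1 - reward


def simulate(success_prob, horizon=5000, epsilon=0.1, seed=0):
    """Run the learner against a made-up table of success probabilities."""
    learner = TwoStageBandit(len(success_prob), len(success_prob[0]), epsilon, seed)
    env_rng = random.Random(seed + 1)
    total_reward = 0
    for _ in range(horizon):
        ris, sf = learner.select()
        reward = 1 if env_rng.random() < success_prob[ris][sf] else 0
        learner.update(ris, sf, reward)
        total_reward += reward
    return learner, total_reward


if __name__ == "__main__":
    # Rows are RISs, columns are SFs; the values are illustrative only.
    table = [[0.2, 0.3], [0.8, 0.9], [0.1, 0.4]]
    learner, total = simulate(table)
    print("pulls per RIS:", learner.ris_counts, "total reward:", total)
```

In this sketch the device keeps statistics per RIS and per (RIS, SF) pair instead of treating every combination as one flat arm to be chosen in a single step, which loosely mirrors the sequential RIS-then-SF learning that the abstract credits for the algorithm's insensitivity to the number of combinations.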

Original language: American English
Pages (from-to): 3526-3541
Number of pages: 16
Journal: IEEE Transactions on Communications
Issue number: 5
State: Published - 1 May 2022
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 1972-2012 IEEE.


  • Internet of Things (IoT)
  • Reconfigurable intelligent surface (RIS)
  • Thompson sampling (TS)
  • exploration and exploitation boosting (E2Boost) algorithm
  • multi-player multi-armed bandit (MPMAB)

