TY - JOUR
T1 - Decentralized Learning for Channel Allocation in IoT Networks Over Unlicensed Bandwidth as a Contextual Multi-Player Multi-Armed Bandit Game
AU - Wang, Wenbo
AU - Leshem, Amir
AU - Niyato, Dusit
AU - Han, Zhu
N1 - Publisher Copyright:
© 2002-2012 IEEE.
PY - 2022/5/1
Y1 - 2022/5/1
N2 - We study a decentralized channel allocation problem in an ad-hoc Internet of Things (IoT) network underlaying the spectrum licensed to a primary cellular network. In the considered network, the limited channel sensing/probing capability and computational resources of the IoT devices make it difficult for them to acquire detailed Channel State Information (CSI) for the multiple shared channels. In practice, the unknown patterns of the primary users' transmission activities and the time-varying CSI (e.g., due to small-scale fading or device mobility) also cause stochastic changes in channel quality. Decentralized IoT links are thus expected to learn the channel conditions online from partial observations, while acquiring no information about the channels that they are not operating on. They also have to reach an efficient, collision-free channel allocation with limited coordination. We map this problem to a contextual multi-player multi-armed bandit game and propose a purely decentralized, three-stage policy learning algorithm through trial-and-error. Theoretical analysis shows that the proposed scheme guarantees that the IoT links jointly converge to the socially optimal channel allocation with sub-linear (i.e., polylogarithmic) regret with respect to the operational time. Simulations demonstrate that it strikes a good balance between efficiency and network scalability compared with other state-of-the-art decentralized bandit algorithms.
AB - We study a decentralized channel allocation problem in an ad-hoc Internet of Things (IoT) network underlaying the spectrum licensed to a primary cellular network. In the considered network, the limited channel sensing/probing capability and computational resources of the IoT devices make it difficult for them to acquire detailed Channel State Information (CSI) for the multiple shared channels. In practice, the unknown patterns of the primary users' transmission activities and the time-varying CSI (e.g., due to small-scale fading or device mobility) also cause stochastic changes in channel quality. Decentralized IoT links are thus expected to learn the channel conditions online from partial observations, while acquiring no information about the channels that they are not operating on. They also have to reach an efficient, collision-free channel allocation with limited coordination. We map this problem to a contextual multi-player multi-armed bandit game and propose a purely decentralized, three-stage policy learning algorithm through trial-and-error. Theoretical analysis shows that the proposed scheme guarantees that the IoT links jointly converge to the socially optimal channel allocation with sub-linear (i.e., polylogarithmic) regret with respect to the operational time. Simulations demonstrate that it strikes a good balance between efficiency and network scalability compared with other state-of-the-art decentralized bandit algorithms.
KW - Contextual multi-player multi-armed bandits
KW - ad-hoc IoTs
KW - decentralized learning
KW - sub-linear regret
UR - http://www.scopus.com/inward/record.url?scp=85118280210&partnerID=8YFLogxK
U2 - 10.1109/TWC.2021.3119204
DO - 10.1109/TWC.2021.3119204
M3 - Article
AN - SCOPUS:85118280210
SN - 1536-1276
VL - 21
SP - 3162
EP - 3178
JO - IEEE Transactions on Wireless Communications
JF - IEEE Transactions on Wireless Communications
IS - 5
ER -