Thesis Details
Decentralized multi-user multi-armed bandits with user dependent reward distributions
Magesh, Akshayaa ; Veeravalli, Venugopal V.
Keywords: multi-armed bandits, multi-player, spectrum access, decentralized
Others: https://www.ideals.illinois.edu/bitstream/handle/2142/107917/MAGESH-THESIS-2020.pdf?sequence=1&isAllowed=y
United States | English
Source: The Illinois Digital Environment for Access to Learning and Scholarship
【 Abstract 】

The uncoordinated spectrum access problem is studied using a multi-player multi-armed bandit framework. We consider a decentralized multi-player stochastic multi-armed bandit model in which the players cannot communicate with each other and observe only their own actions and rewards. Furthermore, the environment may appear differently to different players, i.e., the reward distribution of a given arm may vary across players. Knowledge of the time horizon T is not assumed. Under these conditions, we consider two settings: zero and non-zero reward on collision (when more than one player plays the same arm). Under the zero-reward-on-collision setting, we present a policy that achieves expected regret of O(log T) over a time horizon of duration T. While settings with non-zero rewards on collisions and with reward distributions that vary across players have been considered separately in prior work, to the best of our knowledge a model allowing for both has not been studied previously. For this setup, we present a policy that achieves expected regret of order O(log^{2 + \delta} T) for some 0 < \delta < 1 over a time horizon of duration T.
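To make the model concrete, the Python sketch below simulates the environment described in the abstract: each player sees its own per-arm mean reward, players act simultaneously without communicating, and players that collide on an arm receive a collision reward (zero in the zero-reward-on-collision setting). The class name, the choice of Bernoulli rewards, and the use of a single fixed collision reward are illustrative assumptions for this sketch; the thesis's policies that achieve the stated regret bounds are not shown.

import numpy as np

class DecentralizedMABEnv:
    """Illustrative multi-player bandit environment with user-dependent
    reward distributions (names and parameters are hypothetical)."""

    def __init__(self, means, collision_reward=0.0, rng=None):
        # means[m][k]: mean reward of arm k as seen by player m
        self.means = np.asarray(means, dtype=float)
        self.num_players, self.num_arms = self.means.shape
        # collision_reward = 0 corresponds to the zero-reward-on-collision setting
        self.collision_reward = collision_reward
        self.rng = rng or np.random.default_rng()

    def step(self, actions):
        """actions[m] is the arm chosen by player m; each player observes
        only its own reward (no communication between players)."""
        actions = np.asarray(actions)
        rewards = np.empty(self.num_players)
        for m, k in enumerate(actions):
            if np.sum(actions == k) > 1:
                # Collision: more than one player chose the same arm.
                rewards[m] = self.collision_reward
            else:
                # Bernoulli reward with a player-dependent mean (illustrative choice).
                rewards[m] = self.rng.binomial(1, self.means[m, k])
        return rewards

# Example: 2 players, 3 arms, with means that differ across players.
env = DecentralizedMABEnv(means=[[0.9, 0.5, 0.2],
                                 [0.3, 0.8, 0.6]],
                          collision_reward=0.0)
print(env.step([0, 1]))  # no collision: each player gets its own Bernoulli draw
print(env.step([2, 2]))  # collision on arm 2: both players get the collision reward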

【 Preview 】
Attachments
File: Decentralized multi-user multi-armed bandits with user dependent reward distributions (342 KB, PDF)
Document metrics
Downloads: 13   Views: 31