The uncoordinated spectrum access problem is studied using a multi-player multi-armed bandit framework. We consider a decentralized multi-player stochastic multi-armed bandit model in which the players cannot communicate with each other and can observe only their own actions and rewards. Furthermore, the environment may appear differently to different players, i.e., the reward distributions of a given arm may vary across players. Knowledge of the time horizon T is not assumed. Under these conditions, we consider two settings: zero and non-zero reward on collision (i.e., when more than one player plays the same arm). In the zero-reward-on-collision setting, we present a policy that achieves expected regret of O(log T) over a time horizon of duration T. While non-zero rewards on collisions and player-dependent reward distributions of arms have each been considered separately in prior work, to the best of our knowledge a model allowing for both has not been studied previously. For this setting, we present a policy that achieves expected regret of order O(log^{2 + \delta} T) for some 0 < \delta < 1 over a time horizon of duration T.
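To make the model concrete, the following is a minimal sketch (not taken from the paper) of the decentralized multi-player bandit environment described above, in the zero-reward-on-collision setting. It assumes Bernoulli arm rewards for illustration; the class name MultiPlayerBanditEnv and its interface are hypothetical. Each player observes only its own action and reward, and the mean of each arm may differ across players.

```python
import numpy as np


class MultiPlayerBanditEnv:
    """Illustrative simulator of the multi-player bandit model (an assumption, not the paper's code)."""

    def __init__(self, means, rng=None):
        # means[j][k] = mean reward of arm k for player j
        # (player-dependent reward distributions; Bernoulli arms assumed here)
        self.means = np.asarray(means, dtype=float)
        self.n_players, self.n_arms = self.means.shape
        self.rng = rng or np.random.default_rng()

    def step(self, arms):
        # arms[j] = arm chosen by player j in this round
        arms = np.asarray(arms)
        counts = np.bincount(arms, minlength=self.n_arms)
        rewards = np.empty(self.n_players)
        for j, k in enumerate(arms):
            if counts[k] > 1:
                rewards[j] = 0.0  # zero reward on collision
            else:
                rewards[j] = self.rng.binomial(1, self.means[j, k])
        # Each player would observe only rewards[j]; no communication between players.
        return rewards


# Example: 3 players, 5 arms, player-dependent means; players 0 and 1 collide on arm 0.
rng = np.random.default_rng(0)
env = MultiPlayerBanditEnv(rng.uniform(0.1, 0.9, size=(3, 5)), rng=rng)
print(env.step([0, 0, 2]))
```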