Conference paper details
5th Asian Conference on Machine Learning
Exploration vs Exploitation vs Safety: Risk-Aware Multi-Armed Bandits
Mathematical sciences; Computer science
Nicolas Galichet Nicolas.Galichet@lri.fr ; Olivier Teytaud Olivier.Teytaud@lri.fr
PID: 123075
Source: CEUR
【 Abstract 】

Motivated by applications in energy management, this paper presents the Multi-Armed Risk-Aware Bandit (MaRaB) algorithm. With the goal of limiting the exploration of risky arms, MaRaB takes as arm quality its conditional value at risk. When the user-supplied risk level goes to 0, the arm quality tends toward the essential infimum of the arm distribution density, and MaRaB tends toward the MIN multi-armed bandit algorithm, aimed at the arm with maximal minimal value. As a first contribution, this paper presents a theoretical analysis of the MIN algorithm under mild assumptions, establishing its robustness compared to UCB. The analysis is supported by extensive experimental validation of MIN and MaRaB compared to UCB and state-of-the-art risk-aware MAB algorithms on artificial and real-world problems.
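
The arm-quality notion described in the abstract can be illustrated with a minimal sketch. It is not the MaRaB algorithm itself: it only shows an empirical conditional value at risk, taken here (as an assumption) to be the mean of the worst fraction `alpha` of an arm's observed rewards, and how this score reduces to the observed minimum as `alpha` goes to 0. The names `empirical_cvar` and `safest_arm` are hypothetical, and the sketch includes no exploration term.

```python
import math


def empirical_cvar(rewards, alpha):
    """Mean of the worst ceil(alpha * n) observed rewards (lower tail).

    Assumption: this lower-tail average stands in for the conditional
    value at risk at level alpha; the paper's exact estimator may differ.
    """
    if not rewards:
        raise ValueError("need at least one observed reward")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    # Always keep at least one sample, so a tiny alpha yields the minimum.
    k = max(1, math.ceil(alpha * len(rewards)))
    worst = sorted(rewards)[:k]
    return sum(worst) / k


def safest_arm(samples_per_arm, alpha):
    """Index of the arm with the highest empirical CVaR (no exploration)."""
    scores = [empirical_cvar(samples, alpha) for samples in samples_per_arm]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical data: arm 0 has the higher mean but an occasional zero reward.
arms = [[0.0, 1.0, 1.0, 1.0], [0.4, 0.5, 0.5, 0.6]]
risk_neutral_choice = safest_arm(arms, alpha=1.0)   # compares plain means
risk_averse_choice = safest_arm(arms, alpha=0.01)   # compares observed minima
```

With `alpha=1.0` the score is the ordinary mean, so the higher-mean arm wins; with a very small `alpha` the score is the observed minimum, which mirrors the MIN criterion of preferring the arm with the maximal minimal value.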
