Conference paper

【Abstract】

Motivated by applications in energy management, this paper presents the Multi-Armed Risk-Aware Bandit (MaRaB) algorithm. To limit the exploration of risky arms, MaRaB takes the conditional value at risk of each arm as its quality. When the user-supplied risk level goes to 0, the arm quality tends toward the essential infimum of the arm distribution density, and MaRaB tends toward the MIN multi-armed bandit algorithm, which aims at the arm with maximal minimal value. As a first contribution, this paper presents a theoretical analysis of the MIN algorithm under mild assumptions, establishing its robustness compared to UCB. The analysis is supported by extensive experimental validation of MIN and MaRaB compared to UCB and state-of-the-art risk-aware MAB algorithms on artificial and real-world problems.
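
To make the link between conditional value at risk and the MIN criterion concrete, here is a minimal Python sketch. It is not the paper's algorithm: the abstract gives neither the exact estimator nor the selection rule, so the names `empirical_cvar` and `pick_arm`, the lower-tail averaging estimator, the absence of any exploration term, and the sample rewards are all assumptions made for illustration.

```python
import math


def empirical_cvar(rewards, alpha):
    """Average of the worst ceil(alpha * n) observed rewards (lower tail).

    Assumed estimator, for illustration only. At least one sample is
    always used, so for very small alpha this returns the minimum reward.
    """
    if not rewards:
        raise ValueError("rewards must be non-empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    ordered = sorted(rewards)
    k = max(1, math.ceil(alpha * len(ordered)))
    return sum(ordered[:k]) / k


def pick_arm(history, alpha):
    """Return the index of the arm with the highest empirical CVaR.

    Purely greedy: no exploration bonus is included (hypothetical rule).
    """
    return max(range(len(history)),
               key=lambda i: empirical_cvar(history[i], alpha))


# Hypothetical reward histories, one list per arm.
history = [
    [0.9, 1.0, 0.1, 1.1, 0.95],  # arm 0: higher mean, occasional bad reward
    [0.6, 0.7, 0.65, 0.6, 0.7],  # arm 1: lower mean, stable rewards
]

mean_choice = pick_arm(history, alpha=1.0)   # alpha = 1 ranks arms by mean
risk_choice = pick_arm(history, alpha=0.2)   # small alpha ranks by worst rewards
```

With `alpha=1.0` the estimator reduces to the empirical mean. As `alpha` shrinks it collapses to the smallest observed reward, the empirical counterpart of the limit described in the abstract, where arm quality tends toward the essential infimum and the selection behaves like MIN.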

Attachments
Files	Size	Format	View
Exploration vs Exploitation vs Safety: Risk-Aware Multi-Armed Bandits	939KB	PDF	download

5th Asian Conference on Machine Learning
Exploration vs Exploitation vs Safety: Risk-Aware Multi-Armed Bandits
Mathematical sciences; Computer science
Nicolas Galichet Nicolas.Galichet@lri.fr ; Olivier Teytaud Olivier.Teytaud@lri.fr
PID : 123075

Source: CEUR