Motivated by applications in energy management, this paper presents the Multi-Armed Risk-Aware Bandit (MaRaB) algorithm. With the goal of limiting the exploration of risky arms, MaRaB takes as arm quality its conditional value at risk. When the user-supplied risk level goes to 0, the arm quality tends toward the essential infimum of the arm distribution density, and MaRaB tends toward the MIN multi-armed bandit algorithm, aimed at the arm with maximal minimal value. As a first contribution, this paper presents a theoretical analysis of the MIN algorithm under mild assumptions, establishing its robustness compared with UCB. The analysis is supported by extensive experimental validation of MIN and MaRaB compared to UCB and state-of-the-art risk-aware MAB algorithms on artificial and real-world problems.
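To make the arm-quality criterion concrete, the sketch below computes an empirical conditional value at risk on a list of observed rewards, taken here as the mean of the worst fraction `alpha` of the samples. This is only an illustration of the criterion, not the MaRaB algorithm itself (it contains no exploration term), and the function name `empirical_cvar`, the sample lists `safe` and `risky`, and the rounding rule for the tail size are all assumptions made for the example.

```python
import math


def empirical_cvar(rewards, alpha):
    """Mean of the worst ceil(alpha * n) rewards, for 0 < alpha <= 1.

    Hypothetical helper: an empirical lower-tail CVaR estimate, assuming
    higher rewards are better and alpha is the user-supplied risk level.
    """
    if not rewards:
        raise ValueError("rewards must be non-empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    # Size of the lower tail; at least one sample is always kept.
    k = max(1, math.ceil(alpha * len(rewards)))
    worst = sorted(rewards)[:k]
    return sum(worst) / k


# Made-up reward samples for two arms (illustrative values only).
safe = [0.4, 0.5, 0.45, 0.5, 0.55]   # low variance, moderate mean
risky = [0.0, 1.0, 1.0, 0.9, 1.0]    # higher mean, occasional very bad reward

# alpha = 1 recovers the empirical mean, which favours the risky arm.
mean_choice = max(("safe", safe), ("risky", risky),
                  key=lambda arm: empirical_cvar(arm[1], 1.0))[0]

# A very small alpha reduces to the empirical minimum (the MIN criterion),
# which favours the safe arm.
min_choice = max(("safe", safe), ("risky", risky),
                 key=lambda arm: empirical_cvar(arm[1], 0.01))[0]
```

With these samples, the mean-based choice picks the risky arm while the small-`alpha` choice picks the safe one, which mirrors the limit described above: as the risk level goes to 0, the criterion ranks arms by their worst observed value.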
Exploration vs Exploitation vs Safety: Risk-Aware Multi-Armed Bandits