Thesis details
Safe reinforcement learning: An overview, a hybrid systems perspective, and a case study
Reinforcement Learning
Potok, Matthew; Mitra, Sayan
Keywords: Reinforcement Learning
Others  :  https://www.ideals.illinois.edu/bitstream/handle/2142/102518/POTOK-THESIS-2018.pdf?sequence=1&isAllowed=y
United States | English
Source: The Illinois Digital Environment for Access to Learning and Scholarship
【 Abstract 】

Reinforcement learning (RL) is a general method for agents to learn optimal control policies through exploration and experience. Due to its generality, RL can generate novel policies that may not be easily expressed with rules-based strategies or traditional control techniques. Since its inception, RL has solved increasingly challenging control problems, from GridWorld to Go. Despite these impressive results, the successes of RL have been predominantly limited to systems with discrete environments and agents, particularly video and board games.

A key barrier to using RL in safety-critical cyber-physical system applications is not only transferring these results to continuous domains but also ensuring that a notion of "safety" is upheld during the learning process. This thesis highlights some of the recent contributions in safe learning and presents a framework, FoRShield, for learning safe policies of a control system with nonlinear dynamics. The framework develops a generic hybrid systems model for online RL. The model is used to formalize a shield that can filter unsafe action choices and provide feedback to the underlying RL system.

The thesis presents a concrete approach for computing the shield utilizing existing reachability analysis tools. The feasibility of this approach is illustrated with a case study in which a quadcopter uses RL to discover a safe and optimal plan for a dynamic fire-fighting task. The approach is realized as an open-source framework, FoRShield. The framework is implemented in Python in a modular fashion to allow for testing of a variety of algorithms. Our particular implementation utilizes the Actor-Critic algorithm to learn policies. The experiments show that interesting fire-fighting strategies can be safely learned for a discrete environment with 2^32 states and a 9-dimensional plant model using a standard laptop computer.
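To make the shield idea in the abstract concrete, the following is a minimal sketch of an action filter that sits between an RL agent and its environment. It is not the FoRShield API: the grid world, the names `ACTIONS`, `OBSTACLES`, `step`, `is_safe`, and `shield`, and the penalty value are all hypothetical, and a one-step look-ahead on a toy grid stands in for the reachability analysis the thesis uses.

```python
from typing import List, Tuple

State = Tuple[int, int]
Action = Tuple[int, int]

# Hypothetical toy world: a 5x5 grid with two unsafe cells.
GRID_SIZE = 5
OBSTACLES = {(2, 2), (3, 1)}
ACTIONS: List[Action] = [(0, 1), (0, -1), (1, 0), (-1, 0), (0, 0)]


def step(state: State, action: Action) -> State:
    """Deterministic toy dynamics: move by the action offset."""
    return (state[0] + action[0], state[1] + action[1])


def is_safe(state: State) -> bool:
    """Safety predicate: stay on the grid and off the obstacle cells."""
    x, y = state
    return 0 <= x < GRID_SIZE and 0 <= y < GRID_SIZE and state not in OBSTACLES


def shield(state: State, proposed: Action,
           penalty: float = -1.0) -> Tuple[Action, float]:
    """Filter the proposed action and return (action_to_execute, feedback).

    One-step look-ahead is an assumed stand-in for reachability analysis.
    A safe proposal passes through with zero feedback; an unsafe one is
    replaced by the first safe action in ACTIONS and the penalty is
    returned so the learner can be discouraged from proposing it again.
    """
    if is_safe(step(state, proposed)):
        return proposed, 0.0
    safe_actions = [a for a in ACTIONS if is_safe(step(state, a))]
    fallback = safe_actions[0] if safe_actions else (0, 0)
    return fallback, penalty
```

In a training loop under these assumptions, the agent would propose an action, the shield would return the action actually executed, and the feedback value would be added to the environment reward before the policy update.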

【 Preview 】
Attachments
Files Size Format View
Safe reinforcement learning: An overview, a hybrid systems perspective, and a case study 620KB PDF download