Volume: 322
Certified reinforcement learning with logic guidance
Article
Keywords: MARKOV DECISION-PROCESSES; DISCRETE-TIME; STATE; SAFETY; APPROXIMATION; VERIFICATION; REACHABILITY; CHECKING; NETWORKS
DOI: 10.1016/j.artint.2023.103949
Source: SCIE
【Abstract】
Reinforcement Learning (RL) is a widely employed machine learning architecture that has been applied to a variety of control problems. However, applications in safety-critical domains require a systematic and formal approach to specifying requirements as tasks or goals. We propose a model-free RL algorithm that enables the use of Linear Temporal Logic (LTL) to formulate a goal for unknown continuous-state/action Markov Decision Processes (MDPs). The given LTL property is translated into a Limit-Deterministic Generalised Büchi Automaton (LDGBA), which is then used to shape a synchronous reward function on-the-fly. Under certain assumptions, the algorithm is guaranteed to synthesise a control policy whose traces satisfy the LTL specification with maximal probability. © 2023 Elsevier B.V. All rights reserved.
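As a rough illustration of what shaping a reward "on-the-fly" from an automaton can look like, the following minimal sketch runs a toy automaton in lockstep with a trace of state labels and pays a reward whenever a not-yet-visited accepting set is reached. Everything here is an assumption made for illustration: the names `ToyAutomaton`, `RewardShaper`, and `shaped_rewards` are hypothetical, the automaton is a hand-written deterministic one for "visit a infinitely often and visit b infinitely often" rather than the output of an LTL-to-LDGBA translation, the labels are discrete, and the bookkeeping of outstanding accepting sets is one plausible scheme that the abstract itself does not spell out. The nondeterministic jumps of a real LDGBA and the learning algorithm are omitted.

```python
from dataclasses import dataclass


@dataclass
class ToyAutomaton:
    """Hypothetical deterministic automaton with generalised Buchi acceptance."""
    initial: int
    transitions: dict      # (automaton state, label) -> next automaton state
    accepting_sets: list   # list of frozensets of automaton states

    def step(self, q, label):
        return self.transitions[(q, label)]


class RewardShaper:
    """Tracks the automaton state alongside the environment and emits rewards."""

    def __init__(self, automaton, reward=1.0):
        self.automaton = automaton
        self.reward = reward
        self.q = automaton.initial
        # Accepting sets that still have to be visited in the current round.
        self.outstanding = list(automaton.accepting_sets)

    def observe(self, label):
        """Advance the automaton on one label and return the shaped reward."""
        self.q = self.automaton.step(self.q, label)
        if not any(self.q in f for f in self.outstanding):
            return 0.0
        # Drop every outstanding set that has just been visited.
        self.outstanding = [f for f in self.outstanding if self.q not in f]
        if not self.outstanding:
            # All sets visited once: start a new round.
            self.outstanding = list(self.automaton.accepting_sets)
        return self.reward


# Toy automaton: state 1 means "just saw a", state 2 means "just saw b".
_target = {"a": 1, "b": 2, "none": 0}
TOY = ToyAutomaton(
    initial=0,
    transitions={(q, l): t for q in (0, 1, 2) for l, t in _target.items()},
    accepting_sets=[frozenset({1}), frozenset({2})],
)


def shaped_rewards(trace):
    """Return the reward sequence produced for a sequence of labels."""
    shaper = RewardShaper(TOY)
    return [shaper.observe(label) for label in trace]


if __name__ == "__main__":
    print(shaped_rewards(["a", "a", "none", "b", "a"]))
```

In this sketch, repeating `a` earns nothing until `b` has also been seen, so only behaviour that keeps returning to every accepting set keeps collecting reward. In a model-free setting, a learner would receive these rewards alongside the pair of environment state and automaton state.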
【License】
Free