One of the fundamental problems of artificial intelligence is learning how to behave optimally. With applications ranging from self-driving cars to medical devices, this task is vital to modern society. Two complementary problems arise in this area: reinforcement learning and inverse reinforcement learning. While reinforcement learning seeks an optimal strategy in a given environment with known rewards for each action, inverse reinforcement learning, also called inverse optimal control, seeks to recover the rewards associated with actions given the environment and an optimal policy. Apprenticeship learning is typically approached as a combination of these two techniques in an iterative process: at each step, inverse reinforcement learning is applied first to estimate the rewards, followed by reinforcement learning to produce a guess for an optimal policy. Each guess is used in subsequent iterations to refine the estimate of the reward function. While this works for problems with a small number of discrete states, the approach scales poorly. To mitigate these limitations, this research proposes a robust approach based on recent advances in deep learning. Using the matrix formulation of inverse reinforcement learning, a reward function and an optimal policy can be recovered without having to iteratively optimize both. The approach scales well to problems with very large and continuous state spaces, such as autonomous vehicle navigation. An evaluation performed using OpenAI RLLab suggests that this method is robust and ready to be adopted for solving problems in both research and industry.
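For concreteness, the sketch below illustrates one classical matrix-based formulation of inverse reinforcement learning for a small finite MDP, the linear-programming approach of Ng and Russell (2000). It is an illustrative assumption, not the thesis implementation: the function name lp_irl, its parameters, and the shape conventions are hypothetical choices made for this example.

```python
# Illustrative sketch (not the thesis implementation): linear-programming
# inverse reinforcement learning for a finite MDP, expressed with matrices.
import numpy as np
from scipy.optimize import linprog


def lp_irl(transitions, policy, gamma=0.9, l1=1.5, r_max=1.0):
    """Recover a reward vector consistent with an observed optimal policy.

    transitions: array of shape (n_actions, n_states, n_states),
                 transitions[a][s][s'] = P(s' | s, a).
    policy:      integer array of shape (n_states,), the observed optimal
                 action in each state.
    Returns a reward estimate of shape (n_states,).
    """
    n_actions, n_states, _ = transitions.shape
    # Transition matrix induced by the observed policy.
    P_pi = np.array([transitions[policy[s], s] for s in range(n_states)])
    inv = np.linalg.inv(np.eye(n_states) - gamma * P_pi)

    # Decision variables: [R (n), t (n), u (n)]; maximize sum(t) - l1 * sum(u),
    # written as a minimization for scipy's linprog.
    n_vars = 3 * n_states
    c = np.zeros(n_vars)
    c[n_states:2 * n_states] = -1.0   # -sum(t)
    c[2 * n_states:] = l1             # + l1 * sum(u)

    A_ub, b_ub = [], []
    for a in range(n_actions):
        # D_a R >= 0 enforces optimality of the observed policy;
        # t <= D_a R makes t the per-state margin over the next-best action.
        D_a = (P_pi - transitions[a]) @ inv
        for s in range(n_states):
            if a == policy[s]:
                continue
            row = np.zeros(n_vars)
            row[:n_states] = -D_a[s]                 # -(D_a R)_s <= 0
            A_ub.append(row.copy()); b_ub.append(0.0)
            row[n_states + s] = 1.0                  # t_s - (D_a R)_s <= 0
            A_ub.append(row); b_ub.append(0.0)
    for s in range(n_states):                        # |R_s| <= u_s
        row = np.zeros(n_vars); row[s] = 1.0; row[2 * n_states + s] = -1.0
        A_ub.append(row); b_ub.append(0.0)
        row = np.zeros(n_vars); row[s] = -1.0; row[2 * n_states + s] = -1.0
        A_ub.append(row); b_ub.append(0.0)

    bounds = [(-r_max, r_max)] * n_states + [(None, None)] * (2 * n_states)
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub), bounds=bounds,
                  method="highs")
    return res.x[:n_states]
```

In the iterative scheme described above, a program of this kind would be re-solved between successive reinforcement-learning policy updates; the matrix formulation referenced in the abstract is aimed at avoiding that alternation so the method can extend to very large and continuous state spaces.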