Dissertation Details
A Theory of Model Selection in Reinforcement Learning
Jiang, Nan; Tewari, Ambuj
University of Michigan
Keywords: reinforcement learning; Computer Science; Engineering; Computer Science & Engineering
Others: https://deepblue.lib.umich.edu/bitstream/handle/2027.42/138518/nanjiang_1.pdf?sequence=1&isAllowed=y
United States | English
Source: The Illinois Digital Environment for Access to Learning and Scholarship
【 Abstract 】

Reinforcement Learning (RL) is a machine learning paradigm in which an agent learns to accomplish sequential decision-making tasks from experience. Applications of RL are found in robotics and control, dialog systems, medical treatment, and beyond. Despite the generality of the framework, most empirical successes of RL to date are restricted to simulated environments, where hyperparameters are tuned by trial and error using large amounts of data. In contrast, collecting data through active intervention in the real world can be costly, time-consuming, and sometimes unsafe. Choosing hyperparameters and understanding their effects in the face of these data limitations, i.e., model selection, is an important yet open direction that must be studied to enable such applications of RL, and it is the main theme of this thesis.

More concretely, this thesis presents theoretical results that improve our understanding of three hyperparameters in RL: planning horizon, state representation (abstraction), and reward function. The first part of the thesis focuses on the interplay between the planning horizon and a limited amount of data, and establishes a formal explanation for how a long planning horizon can cause overfitting. The second part considers the problem of choosing the right state abstraction using limited batch data; I show that cross-validation-type methods require importance sampling and suffer from exponentially large variance, whereas a novel regularization-based algorithm enjoys an oracle-like property. The third part investigates reward misspecification and aims to resolve it by leveraging expert demonstrations; this part is inspired by AI safety concerns and bears close connections to inverse reinforcement learning.

A recurring theme of the thesis is the deployment of formulations and techniques from other areas of machine learning theory (mostly statistical learning theory): the planning-horizon work explains the overfitting phenomenon by making a formal analogy to empirical risk minimization and by proving planning loss bounds that parallel generalization error bounds; the main result in the abstraction selection work takes the form of an oracle inequality, a concept from structural risk minimization for model selection in supervised learning; and the inverse RL work provides a mistake-bound-type analysis under arbitrarily chosen environments, which can be viewed as a form of no-regret learning. Overall, by borrowing ideas from mature theories of machine learning, we can develop analogies for RL that allow us to better understand the impact of hyperparameters, and design algorithms that set them automatically and effectively.
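To make the exponential-variance claim concrete, the following is a minimal illustrative sketch (the notation is assumed here, not taken from the thesis): in off-policy evaluation, the standard per-trajectory importance sampling estimate of an evaluation policy $\pi_e$ from $n$ trajectories collected under a behavior policy $\pi_b$ over horizon $H$ is

\[
\hat{v}(\pi_e) \;=\; \frac{1}{n}\sum_{i=1}^{n}\left(\prod_{t=1}^{H}\frac{\pi_e\big(a_t^{(i)} \mid s_t^{(i)}\big)}{\pi_b\big(a_t^{(i)} \mid s_t^{(i)}\big)}\right)\sum_{t=1}^{H}\gamma^{\,t-1}\, r_t^{(i)} .
\]

Because each trajectory is reweighted by a product of $H$ likelihood ratios, the weights, and hence the variance of the estimator, can grow exponentially with $H$; this is the kind of difficulty the abstract attributes to cross-validation-style abstraction selection from batch data.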

【 Preview 】
Attachments
Files Size Format View
A Theory of Model Selection in Reinforcement Learning 1329KB PDF download
Document Metrics
Downloads: 61    Views: 34