Journal Article Details
NEUROCOMPUTING, Vol. 72
Gaussian process dynamic programming
Article; Proceedings Paper
Deisenroth, Marc Peter [1,2]; Rasmussen, Carl Edward [1,3]; Peters, Jan [4]
[1] Univ Cambridge, Dept Engn, Cambridge CB2 1PZ, England
[2] Univ Karlsruhe TH, Fac Informat, Karlsruhe, Germany
[3] Max Planck Inst Biol Cybernet, Tübingen, Germany
[4] Univ So Calif, Los Angeles, CA USA
Keywords: Reinforcement learning; Optimal control; Dynamic programming; Gaussian processes; Bayesian active learning; Policy learning
DOI: 10.1016/j.neucom.2008.12.019
Source: Elsevier
【 Abstract 】

Reinforcement learning (RL) and optimal control of systems with continuous states and actions require approximation techniques in most interesting cases. In this article, we introduce Gaussian process dynamic programming (GPDP), an approximate value function-based RL algorithm. We consider both a classic optimal control problem, where problem-specific prior knowledge is available, and a classic RL problem, where only very general priors can be used. For the classic optimal control problem, GPDP models the unknown value functions with Gaussian processes and generalizes dynamic programming to continuous-valued states and actions. For the RL problem, GPDP starts from a given initial state and explores the state space using Bayesian active learning. To design a fast learner, available data have to be used efficiently. Hence, we propose to learn probabilistic models of the a priori unknown transition dynamics and the value functions on the fly. In both cases, we successfully apply the resulting continuous-valued controllers to the under-actuated pendulum swing-up and analyze the performance of the suggested algorithms. It turns out that GPDP uses data very efficiently and can be applied to problems where classic dynamic programming would be cumbersome. (C) 2009 Elsevier B.V. All rights reserved.
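The abstract describes value-function-based dynamic programming in which the value function is modelled by Gaussian process regression over a finite set of support states. The sketch below is an illustration of that idea only, not the authors' implementation: the one-dimensional dynamics, reward, discount factor, support states, and candidate actions are all hypothetical, and scikit-learn's GaussianProcessRegressor stands in for the GP machinery used in the paper.

```python
# Minimal sketch of a GP-regression value-iteration backup (illustrative only).
# At each sweep, the current value estimates at the support states are fitted
# with a GP, which is then used to evaluate one-step DP backups at successor states.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Hypothetical 1-D toy problem: drive the state towards the origin.
def dynamics(x, u):
    return 0.9 * x + 0.1 * u            # assumed deterministic transition

def reward(x, u):
    return -(x ** 2) - 0.1 * u ** 2     # assumed quadratic cost (negated)

gamma = 0.95                            # discount factor (assumption)
states = np.linspace(-3.0, 3.0, 25)     # support states for the GP value model
actions = np.linspace(-2.0, 2.0, 11)    # candidate actions for the maximization

V = np.zeros_like(states)               # terminal value function

for _ in range(30):                     # value-iteration sweeps
    # Fit a GP to the current value estimates at the support states.
    gp = GaussianProcessRegressor(RBF(1.0) + WhiteKernel(1e-4), normalize_y=True)
    gp.fit(states.reshape(-1, 1), V)

    # Backup: V(x) <- max_u [ r(x, u) + gamma * V_gp(f(x, u)) ].
    V_new = np.empty_like(V)
    for i, x in enumerate(states):
        q = [reward(x, u) + gamma * gp.predict([[dynamics(x, u)]])[0] for u in actions]
        V_new[i] = max(q)
    V = V_new

print(V.round(2))
```

In the paper, the transition dynamics are themselves learned with GP models and exploration is guided by Bayesian active learning; here both are replaced by a known toy model for brevity.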

【 License 】

Free   

【 Preview 】
Attachment list
File                               Size     Format
10_1016_j_neucom_2008_12_019.pdf   1166 KB  PDF
Metrics
Downloads: 3   Views: 0