NEUROCOMPUTING | Volume: 450 |
Delay-aware model-based reinforcement learning for continuous control | |
Article | |
Chen, Baiming1  Xu, Mengdi2  Li, Liang1  Zhao, Ding2  | |
[1] Tsinghua Univ, Beijing 100084, Peoples R China | |
[2] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA | |
Keywords: Model-based reinforcement learning; Markov decision process; Continuous control; Delayed system; |
DOI : 10.1016/j.neucom.2021.04.015 | |
Source: Elsevier | |
【 Abstract 】
Action delays degrade the performance of reinforcement learning in many real-world systems. This paper proposes a formal definition of the delay-aware Markov Decision Process and proves that it can be transformed into a standard MDP with augmented states using the Markov reward process. We develop a delay-aware model-based reinforcement learning framework that can incorporate the multi-step delay into the learned system models without learning effort. Experiments on the Gym and MuJoCo platforms show that, compared with state-of-the-art model-free reinforcement learning methods, the proposed delay-aware model-based algorithm is more efficient in training and transferable between systems with various durations of delay. (c) 2021 Elsevier B.V. All rights reserved.
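The abstract's core idea, turning a delayed MDP into a standard MDP by augmenting the state with the buffer of pending actions, can be sketched as follows. This is a minimal illustration of the augmentation, not the paper's implementation; the `DelayedEnv` class and its `step_fn`, `delay`, and `default_action` parameters are hypothetical names chosen for this sketch.

```python
from collections import deque

class DelayedEnv:
    """Sketch: an n-step action delay is absorbed into the state.

    The augmented state is (current observation, queue of pending actions),
    which restores the Markov property: the next augmented state depends
    only on the current augmented state and the newly chosen action.
    """

    def __init__(self, step_fn, init_obs, delay, default_action):
        self.step_fn = step_fn  # underlying transition: (obs, action) -> next obs
        self.obs = init_obs
        # Queue of actions chosen but not yet applied, oldest first.
        self.buffer = deque([default_action] * delay, maxlen=delay)

    def augmented_state(self):
        return (self.obs, tuple(self.buffer))

    def step(self, action):
        # The action chosen `delay` steps ago takes effect now.
        applied = self.buffer.popleft()
        self.buffer.append(action)
        self.obs = self.step_fn(self.obs, applied)
        return self.augmented_state()
```

For example, with a toy additive dynamics `step_fn = lambda o, a: o + a` and a two-step delay, an action chosen at time t only changes the observation at time t + 2, while the action queue inside the augmented state records it immediately.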
【 License 】
Free
【 Preview 】
Files | Size | Format | View
---|---|---|---
10_1016_j_neucom_2021_04_015.pdf | 1452KB | PDF | download