Journal Article Details
Computer Science and Information Systems
A kernel based true online Sarsa(λ) for continuous space control problems
Haijun Zhu1  Yuchen Fu2  Fei Zhu3  Xiaoke Zhou4 
[1] Provincial Key Laboratory for Computer Information Processing Technology, Soochow University;School of Computer Science and Engineering, Changshu Institute of Technology;School of Computer Science and Technology, Soochow University;University of Basque Country
Keywords: reinforcement learning; kernel method; true online; policy gradient; Sarsa(λ)
DOI: 10.2298/CSIS170107029Z
Subject classification: Social Sciences, Humanities and Arts (General)
Source: Computer Science and Information Systems
【 Abstract 】

Reinforcement learning is an efficient learning method for control problems: an agent obtains an optimal policy by interacting with the environment. However, it suffers from slow convergence and low convergence accuracy, and conventional reinforcement learning algorithms can hardly solve continuous control problems. Kernel-based methods can accelerate convergence and improve convergence accuracy, and the policy gradient method is a good way to deal with continuous space problems. We propose a Sarsa(λ) version of the true online temporal difference algorithm, named True Online Sarsa(λ) (TOSarsa(λ)), on the basis of a clustering-based sample specification method and a selective kernel-based value function. The TOSarsa(λ) algorithm gives consistent results under both the forward view and the backward view, which ensures that an optimal policy is obtained in less time. We also combine TOSarsa(λ) with heuristic dynamic programming. Experiments show that the proposed algorithm works well on continuous control problems.
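The abstract does not spell out the update rule or the kernel construction, so the following is only a minimal sketch of the standard true online Sarsa(λ) update with linear function approximation, not the authors' TOSarsa(λ). The Gaussian kernel features, the fixed `centers`, and the function names `gaussian_features` and `true_online_sarsa_update` are assumptions made for illustration; the paper's clustering-based sample selection and selective kernel are not reproduced here.

```python
import math


def gaussian_features(state, centers, width):
    """Gaussian kernel features of a scalar state (assumed feature map, not the paper's)."""
    return [math.exp(-((state - c) ** 2) / (2.0 * width ** 2)) for c in centers]


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def true_online_sarsa_update(w, z, q_old, x, x_next, reward, alpha, gamma, lam):
    """One true online Sarsa(lambda) step for a linear value function q = w . x.

    w       weight vector
    z       dutch eligibility trace
    q_old   value estimate carried over from the previous step
    x       features of the current state-action pair
    x_next  features of the next state-action pair (all zeros if terminal)
    Returns the updated (w, z, q_old).
    """
    q = dot(w, x)
    q_next = dot(w, x_next)
    delta = reward + gamma * q_next - q  # TD error

    # Dutch trace: decay the old trace, then add a corrected copy of x.
    zx = dot(z, x)
    z = [gamma * lam * zi + (1.0 - alpha * gamma * lam * zx) * xi
         for zi, xi in zip(z, x)]

    # Weight update with the true online correction term (q - q_old).
    w = [wi + alpha * (delta + q - q_old) * zi - alpha * (q - q_old) * xi
         for wi, zi, xi in zip(w, z, x)]

    return w, z, q_next


# Example: a single update starting from zero weights and a zero trace.
w, z, q_old = true_online_sarsa_update(
    w=[0.0, 0.0], z=[0.0, 0.0], q_old=0.0,
    x=[1.0, 0.0], x_next=[0.0, 1.0],
    reward=1.0, alpha=0.5, gamma=0.9, lam=0.8,
)
```

At the start of each episode the trace `z` and `q_old` would be reset to zero, and `x` would be built from the chosen state-action pair, for example by placing `gaussian_features(state, centers, width)` in the block of the feature vector that belongs to the selected action.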

【 License 】

CC BY-NC-ND   
