IEEE Access
A Study of First-Passage Time Minimization via Q-Learning in Heated Gridworlds
Grigory Yaremenko [1], Pavel Osinenko [1], Vladimir V. Palyulin [1], Maria A. Larchenko [1]
[1] Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Moscow, Russia
Keywords: First-passage times; path planning; reinforcement learning; stochastic systems
DOI: 10.1109/ACCESS.2021.3129709
Source: DOAJ
Abstract

Optimization of first-passage times is required in applications ranging from nanobot navigation to market trading. In such settings, noise levels are often distributed unevenly across the environment. We extensively study how a learning agent fares in one- and two-dimensional heated gridworlds with an uneven temperature distribution. The results reveal bias effects in agents trained via simple tabular Q-learning, SARSA, Expected SARSA, and Double Q-learning. Namely, state-dependent noise triggers convergence to suboptimal solutions, and the resulting policies keep following them over practically long learning times. A high learning rate prevents exploration of higher-temperature regions, while a sufficiently low learning rate increases the agents' presence in such regions. These biases of temporal-difference-based reinforcement learning methods may have implications for their application in real-world physical scenarios and for agent design.
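The following minimal Python sketch makes the setting concrete. It shows tabular Q-learning for first-passage time minimization in a 1-D heated gridworld. It is not the authors' environment or code, and several parts are illustrative assumptions:

- the noise model, where temperature `temps[s]` is taken as the probability that the chosen move is replaced by a uniformly random one;
- the reward of -1 per step;
- starting at the left edge with the goal at the right edge;
- all parameter values (`alpha`, `eps`, `episodes`, and the hot band in the example).

Under this assumed undiscounted reward, maximizing the return is the same as minimizing the expected number of steps to the goal, i.e. the first-passage time.

```python
import random

# Hypothetical toy model for illustration; not the paper's environment.
ACTIONS = (-1, 1)  # move left, move right


def step(pos, a, temps, rng):
    """One move in an assumed 1-D heated gridworld.

    With probability temps[pos] the chosen move is replaced by a uniformly
    random one ("thermal" noise); the new position is clipped to the grid.
    """
    move = ACTIONS[a] if rng.random() >= temps[pos] else rng.choice(ACTIONS)
    return min(max(pos + move, 0), len(temps) - 1)


def greedy(q_row):
    """Index of the highest-valued action (ties go to the lower index)."""
    return max(range(len(q_row)), key=lambda i: q_row[i])


def q_learning(temps, episodes=2000, alpha=0.1, gamma=1.0, eps=0.1,
               max_steps=500, seed=0):
    """Tabular epsilon-greedy Q-learning with reward -1 per step."""
    rng = random.Random(seed)
    goal = len(temps) - 1
    q = [[0.0, 0.0] for _ in temps]  # zero init is optimistic since rewards < 0
    for _ in range(episodes):
        s = 0  # every episode starts at the left edge (assumption)
        for _ in range(max_steps):
            a = rng.randrange(2) if rng.random() < eps else greedy(q[s])
            s2 = step(s, a, temps, rng)
            # Transitions into the goal are terminal and do not bootstrap.
            target = -1.0 if s2 == goal else -1.0 + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
            if s == goal:
                break
    return q


def mean_fpt(q, temps, trials=1000, max_steps=500, seed=1):
    """Average number of steps to reach the goal under the greedy policy."""
    rng = random.Random(seed)
    goal, total = len(temps) - 1, 0
    for _ in range(trials):
        s, t = 0, 0
        while s != goal and t < max_steps:
            s = step(s, greedy(q[s]), temps, rng)
            t += 1
        total += t
    return total / trials


if __name__ == "__main__":
    cold = [0.0] * 10
    hot = [0.0] * 3 + [0.5] * 4 + [0.0] * 3  # hypothetical hot band in the middle
    for name, temps in (("cold", cold), ("hot", hot)):
        q = q_learning(temps)
        print(name, mean_fpt(q, temps))
```

Only plain Q-learning is shown. The other methods named in the abstract differ mainly in the bootstrap term that replaces `max(q[s2])`:

- SARSA uses the value of the next action actually taken.
- Expected SARSA uses the expectation of `q[s2]` under the epsilon-greedy policy.
- Double Q-learning keeps two tables and evaluates one table's greedy action with the other.

Varying `alpha` in this toy setup is one way to experiment with the learning-rate dependence described above. The sketch makes no claim to reproduce the paper's results.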

License

Unknown
