Journal article

[Abstract]

Optimization of first-passage times is required in applications ranging from nanobot navigation to market trading. In such settings, noise levels are often unevenly distributed across the environment. We extensively study how a learning agent fares in one- and two-dimensional heated gridworlds with an uneven temperature distribution. The results reveal bias effects in agents trained via simple tabular Q-learning, SARSA, Expected SARSA and Double Q-learning. Namely, the state dependency of the noise drives convergence to suboptimal solutions, and the corresponding policies keep following them for practically long learning times. A high learning rate prevents exploration of regions with higher temperature, whereas a sufficiently low learning rate increases the presence of agents in such regions. These biases of temporal-difference-based reinforcement learning methods may have implications for their application in real-world physical scenarios and for agent design.
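To make the setting concrete, here is a minimal Python sketch of tabular Q-learning for first-passage-time minimization in a hypothetical 1D heated gridworld. Everything in it is an illustrative assumption, not the authors' environment or code: the noise model (with probability `temp[s]` the chosen move is replaced by a uniformly random one), the reward of -1 per step (so the undiscounted return equals minus the first-passage time), the temperature profile, and the helper names `step`, `greedy`, `q_learning` and `mean_fpt`.

```python
import random

# Hypothetical 1D "heated" gridworld (an illustrative assumption, not the
# paper's environment): cells 0..n-1, absorbing goal cells, reward -1 per step.
# Assumed noise model: in cell s the agent's move is replaced by a uniformly
# random move with probability temp[s] (the local "temperature").
ACTIONS = (-1, +1)  # index 0 = move left, index 1 = move right


def step(state, action, temp, rng):
    """Apply one possibly noisy move; positions are clipped to [0, n-1]."""
    if rng.random() < temp[state]:
        move = rng.choice(ACTIONS)  # thermal kick overrides the chosen action
    else:
        move = ACTIONS[action]
    return min(max(state + move, 0), len(temp) - 1)


def greedy(qs):
    """Greedy action for one state; ties go to action 0 (left)."""
    return 0 if qs[0] >= qs[1] else 1


def q_learning(temp, start, goals, episodes=2000, alpha=0.1, gamma=1.0,
               epsilon=0.1, max_steps=500, seed=0):
    """Tabular Q-learning. With reward -1 per step and gamma = 1, the return
    of an episode is minus its first-passage time to the goal set."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in temp]  # zero init is optimistic for -1 rewards
    for _ in range(episodes):
        s = start
        for _ in range(max_steps):
            # epsilon-greedy behaviour policy
            a = rng.randrange(2) if rng.random() < epsilon else greedy(q[s])
            s_next = step(s, a, temp, rng)
            done = s_next in goals
            # Off-policy TD target: bootstrap from the best next action.
            target = -1.0 + (0.0 if done else gamma * max(q[s_next]))
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
            if done:
                break
    return q


def mean_fpt(policy, temp, start, goals, trials=500, max_steps=2000, seed=1):
    """Monte Carlo estimate of the mean first-passage time of a fixed policy
    (runs are truncated at max_steps)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        s, t = start, 0
        while s not in goals and t < max_steps:
            s = step(s, policy[s], temp, rng)
            t += 1
        total += t
    return total / trials


if __name__ == "__main__":
    # Two routes from cell 4: a short one through hot cells 1-3 to goal 0,
    # and a longer cold one to goal 10. These temperatures are made up.
    temp = [0.0, 0.5, 0.5, 0.5] + [0.0] * 7
    start, goals = 4, (0, 10)
    for alpha in (0.5, 0.05):
        q = q_learning(temp, start, goals, alpha=alpha)
        policy = [greedy(qs) for qs in q]
        print(f"alpha={alpha}: action at start = {'LR'[policy[start]]}, "
              f"mean FPT = {mean_fpt(policy, temp, start, goals):.2f}")
```

The demo trains the same agent with a high and a low learning rate on a made-up track where the shorter route crosses hot cells; whether a bias of the kind described in the abstract appears depends on these assumed parameters, so the printed numbers are not a reproduction of the paper's results. Replacing `max(q[s_next])` in the target with the value of the next action actually taken gives SARSA, and replacing it with the ε-greedy expectation over next actions gives Expected SARSA.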

[License]

Unknown

IEEE Access
A Study of First-Passage Time Minimization via Q-Learning in Heated Gridworlds

Grigory Yaremenko¹ Pavel Osinenko¹ Vladimir V. Palyulin¹ Maria A. Larchenko¹
¹ Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Moscow, Russia
Keywords: First-passage times; path planning; reinforcement learning; stochastic systems
DOI: 10.1109/ACCESS.2021.3129709
Source: DOAJ
