Thesis details
Learning To Use Memory.
Gorski, Nicholas A.; Polk, Thad A.
University of Michigan
Keywords: Reinforcement Learning; Memory; Artificial Intelligence; Sequential Decision Making; Computer Science; Engineering; Computer Science & Engineering
Others  :  https://deepblue.lib.umich.edu/bitstream/handle/2027.42/91491/ngorski_1.pdf?sequence=1&isAllowed=y
Switzerland | English
Source: The Illinois Digital Environment for Access to Learning and Scholarship
【 Abstract 】

This thesis is a comprehensive empirical exploration of using reinforcement learning to learn to use simple forms of working memory. Learning to use memory involves three things at once: learning how to behave in the environment, learning when to select internal actions that control how knowledge persists in memory, and learning how to use the information stored in memory to make decisions. We focus on two models of memory: bit memory and gated memory. Bit memory is inspired by prior reinforcement learning literature and stores abstract values, which an agent can learn to associate with task history. Gated memory is inspired by human working memory and stores perceptually grounded symbols. Our goal is to determine computational bounds on the tractability of learning to use these memories. To do so, we modify a simple partially observable task, TMaze, along specific dimensions: length of temporal delay, number of dependent decisions, number of distinct symbols, quantity of concurrent knowledge, and availability of second-order knowledge. We find that learning to use gated memory is significantly more tractable than learning to use bit memory, because gated memory stores perceptually grounded symbols. We further find that, when learning to use gated memory, performance scales more favorably along temporal delay, distinct symbols, and concurrent knowledge than along the other dimensions. We also identify situations in which agents fail to learn to use gated memory optimally: these involve repeated identical observations, which leave no unambiguous trajectories through the underlying task and memory state space.
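
The distinction between the two memory models can be made concrete with a minimal Python sketch. It is an illustration only, not the thesis's implementation: the corridor layout, the action names (`forward`, `left`, `right`, `gate`, `set`, `clear`), the reward values, and every class and function name are assumptions made for the example. The policies are hand-written to show how the interface works, whereas in the thesis the agent must learn such behavior through reinforcement learning.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class BitMemory:
    """One abstract bit. Its meaning is not given; the agent must assign one."""
    value: int = 0

    def apply(self, action: str, observation: str) -> None:
        if action == "set":
            self.value = 1
        elif action == "clear":
            self.value = 0

    def read(self) -> int:
        return self.value


@dataclass
class GatedMemory:
    """One slot that copies the current observation when the gate is opened."""
    slot: Optional[str] = None

    def apply(self, action: str, observation: str) -> None:
        if action == "gate":
            self.slot = observation  # stores a perceptually grounded symbol

    def read(self) -> Optional[str]:
        return self.slot


class TMaze:
    """Hypothetical T-maze: signal at the start, then a corridor, then a junction."""

    def __init__(self, signal: str, delay: int = 2) -> None:
        self.signal = signal      # "L" or "R", visible only at position 0
        self.delay = delay        # number of corridor cells before the junction
        self.position = 0

    def observe(self) -> str:
        if self.position == 0:
            return self.signal
        if self.position <= self.delay:
            return "corridor"
        return "junction"

    def step(self, action: str) -> Tuple[float, bool]:
        """Apply an external action; return (reward, done)."""
        if self.observe() == "junction":
            if action in ("left", "right"):
                correct = "left" if self.signal == "L" else "right"
                return (1.0 if action == correct else -1.0), True
            return 0.0, False
        if action == "forward":
            self.position += 1
        return 0.0, False


Policy = Callable[[Tuple[str, object]], Tuple[str, str]]


def run_episode(env: TMaze, memory, policy: Policy, max_steps: int = 50) -> float:
    """The agent's state is the pair (observation, memory contents)."""
    for _ in range(max_steps):
        obs = env.observe()
        kind, action = policy((obs, memory.read()))
        if kind == "internal":
            memory.apply(action, obs)   # internal actions change memory only
        else:
            reward, done = env.step(action)
            if done:
                return reward
    return 0.0


def gated_policy(state):
    obs, slot = state
    if obs in ("L", "R") and slot is None:
        return ("internal", "gate")     # store the signal itself
    if obs == "junction":
        return ("external", "left" if slot == "L" else "right")
    return ("external", "forward")


def bit_policy(state):
    obs, bit = state
    if obs == "R" and bit == 0:
        return ("internal", "set")      # arbitrary convention: 1 means "R"
    if obs == "junction":
        return ("external", "right" if bit == 1 else "left")
    return ("external", "forward")
```

With these definitions, `run_episode(TMaze("R", delay=3), GatedMemory(), gated_policy)` ends with the correct turn. The two policies differ in one respect that mirrors the abstract: the gated policy reads a stored symbol that already matches what was observed, while the bit policy depends on an arbitrary convention (here, 1 means "R") that a learning agent would have to discover.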

【 Preview 】
Attachments
Files | Size | Format
Learning To Use Memory. | 8539KB | PDF
Document metrics
Downloads: 17; Views: 19