Journal Article Details
A distributional code for value in dopamine-based reinforcement learning
Article
Keywords: REWARD; GRADIENTS; CIRCUITRY; RESPONSES; NEURONS; SITES; D-1
DOI: 10.1038/s41586-019-1924-6
Source: SCIE
【 Abstract 】

Since its introduction, the reward prediction error theory of dopamine has explained a wealth of empirical phenomena, providing a unifying framework for understanding the representation of reward and value in the brain (1-3). According to the now canonical theory, reward predictions are represented as a single scalar quantity, which supports learning about the expectation, or mean, of stochastic outcomes. Here we propose an account of dopamine-based reinforcement learning inspired by recent artificial intelligence research on distributional reinforcement learning (4-6). We hypothesized that the brain represents possible future rewards not as a single mean, but instead as a probability distribution, effectively representing multiple future outcomes simultaneously and in parallel. This idea implies a set of empirical predictions, which we tested using single-unit recordings from the mouse ventral tegmental area. Our findings provide strong evidence for a neural realization of distributional reinforcement learning.

【 License 】

Free   
