Journal article details
BMC Neuroscience
‘Proactive’ use of cue-context congruence for building reinforcement learning’s reward function
Research Article
Judit Zsuga [1], Klara Biro [1], Csaba Papp [1], Gabor Tajti [1], Bela Juhasz [2], Rudolf Gesztelyi [3], Magdolna Emma Szilasi [3]
[1] Department of Health Systems Management and Quality Management for Health Care, Faculty of Public Health, University of Debrecen, Nagyerdei krt. 98, 4032, Debrecen, Hungary
[2] Department of Pharmacology and Pharmacotherapy, Faculty of Medicine, University of Debrecen, Nagyerdei krt. 98, 4032, Debrecen, Hungary
[3] Department of Pharmacology, Faculty of Pharmacy, University of Debrecen, Nagyerdei krt. 98, 4032, Debrecen, Hungary
Keywords: Model-based reinforcement learning; Proactive brain; Bellman equation; Reward function; Policy function; Cue-context congruence
DOI: 10.1186/s12868-016-0302-7
Received 2015-11-12; accepted 2016-10-14; published 2016
Source: Springer
【 Abstract 】

Background: Reinforcement learning is a fundamental form of learning that may be formalized using the Bellman equation. Accordingly, an agent determines the value of a state as the sum of the immediate reward and the discounted value of future states. The value of a state is thus determined by agent-related attributes (action set, policy, discount factor) and by the agent’s knowledge of the environment, embodied by the reward function, together with hidden environmental factors given by the transition probability. The central objective of reinforcement learning is to solve these two functions, which lie outside the agent’s control, either with or without a model.
Results: In the present paper, using the proactive model of reinforcement learning, we offer insight into how the brain creates simplified representations of the environment and how these representations are organized to support the identification of relevant stimuli and action. Furthermore, we identify neurobiological correlates of our model by suggesting that the reward and policy functions, attributes of the Bellman equation, are built by the orbitofrontal cortex (OFC) and the anterior cingulate cortex (ACC), respectively.
Conclusions: Based on this, we propose that the OFC assesses cue-context congruence to activate the most relevant context frame. Furthermore, given the bidirectional neuroanatomical link between the OFC and model-free structures, we suggest that model-based input is incorporated into the reward prediction error (RPE) signal and, conversely, that the RPE signal may be used to update the reward-related information of context frames and the policy underlying action selection in the OFC and ACC, respectively. Clinical implications for cognitive behavioral interventions are also discussed.
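
The Bellman relation summarized in the Background (state value as immediate reward plus the discounted value of future states) can be illustrated with a small numerical sketch. Everything concrete below is an assumption made for illustration and is not taken from the paper: the two states `s0` and `s1`, the actions `stay` and `go`, the reward and transition tables, and the discount factor of 0.9. The code separates the agent-related attributes (policy, discount factor) from the environment-side reward function and transition probabilities, then evaluates the fixed policy by repeated Bellman backups.

```python
# Illustrative sketch only: a toy two-state environment with made-up numbers.

GAMMA = 0.9  # discount factor (agent-related attribute, assumed value)

# POLICY[s][a]: probability that the agent chooses action a in state s.
POLICY = {
    "s0": {"stay": 0.5, "go": 0.5},
    "s1": {"stay": 1.0, "go": 0.0},
}

# TRANSITIONS[(s, a)][s2]: probability of reaching s2 (environment side).
TRANSITIONS = {
    ("s0", "stay"): {"s0": 1.0},
    ("s0", "go"): {"s1": 1.0},
    ("s1", "stay"): {"s1": 1.0},
    ("s1", "go"): {"s0": 1.0},
}

# REWARDS[(s, a, s2)]: immediate reward for that transition (environment side).
REWARDS = {
    ("s0", "stay", "s0"): 0.0,
    ("s0", "go", "s1"): 1.0,
    ("s1", "stay", "s1"): 2.0,
    ("s1", "go", "s0"): 0.0,
}


def bellman_backup(state, values):
    """Expected immediate reward plus discounted value of the next state."""
    total = 0.0
    for action, p_action in POLICY[state].items():
        for next_state, p_next in TRANSITIONS[(state, action)].items():
            reward = REWARDS[(state, action, next_state)]
            total += p_action * p_next * (reward + GAMMA * values[next_state])
    return total


def evaluate_policy(tolerance=1e-10, max_sweeps=10_000):
    """Iterate the Bellman backup until the state values stop changing."""
    values = {state: 0.0 for state in POLICY}
    for _ in range(max_sweeps):
        new_values = {s: bellman_backup(s, values) for s in POLICY}
        delta = max(abs(new_values[s] - values[s]) for s in POLICY)
        values = new_values
        if delta < tolerance:
            break
    return values


if __name__ == "__main__":
    for state, value in evaluate_policy().items():
        print(f"V({state}) = {value:.4f}")
```

With these assumed numbers the values converge to roughly 17.27 for `s0` and 20 for `s1`. Editing the `REWARDS` table alone changes the resulting values while the policy and discount factor stay fixed, which mirrors the abstract's distinction between agent attributes and the reward function.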

【 License 】

CC BY   
© The Author(s) 2016
