
Anticipatory reward signals in ventral striatal neurons of behaving rats.

Abstract

It has been proposed that the striatum plays a crucial role in learning to select appropriate actions, optimizing rewards according to the principles of 'Actor-Critic' models of trial-and-error learning. The ventral striatum (VS), as Critic, would employ a temporal difference (TD) learning algorithm to predict rewards and drive dopaminergic neurons. This study examined this model's adequacy for VS responses to multiple rewards in rats. The respective arms of a plus-maze provided rewards of varying magnitudes; multiple rewards were provided at 1-s intervals while the rat stood still. Neurons discharged phasically prior to each reward, during both initial approach and immobile waiting, demonstrating that this signal is predictive and not simply motor-related. In different neurons, responses could be greater for early, middle or late droplets in the sequence. Strikingly, this activity often reappeared after the final reward, as if in anticipation of yet another. In contrast, previous TD learning models show decremental reward-prediction profiles during reward consumption due to a temporal-order signal introduced to reproduce accurate timing in dopaminergic reward-prediction error signals. To resolve this inconsistency in a biologically plausible manner, we adapted the TD learning model such that input information is nonhomogeneously distributed among different neurons. By suppressing reward temporal-order signals and varying richness of spatial and visual input information, the model reproduced the experimental data. This validates the feasibility of a TD-learning architecture where different groups of neurons participate in solving the task based on varied input information.
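
The abstract refers to the temporal difference (TD) learning rule that the Critic is assumed to use for reward prediction. Purely as an illustrative aid (this is not the authors' model code), the following is a minimal sketch of a tabular TD(0) value update driving a reward-prediction error signal; the state labels, reward schedule, learning rate, and discount factor are hypothetical placeholders chosen to mimic a sequence of reward droplets delivered at fixed intervals.

    # Minimal TD(0) Critic sketch (illustrative only; not the published model).
    # All parameters and states below are assumed for demonstration.
    alpha = 0.1   # learning rate (assumed)
    gamma = 0.9   # temporal discount factor (assumed)

    # Hypothetical state sequence: approach steps followed by reward-delivery steps.
    states  = ["approach_1", "approach_2", "wait_1", "wait_2", "wait_3", "end"]
    rewards = {"wait_1": 1.0, "wait_2": 1.0, "wait_3": 1.0}  # droplets at 1-s steps

    V = {s: 0.0 for s in states}  # Critic's value estimate for each state

    for episode in range(200):
        for t in range(len(states) - 1):
            s, s_next = states[t], states[t + 1]
            r = rewards.get(s_next, 0.0)          # reward obtained on this transition
            delta = r + gamma * V[s_next] - V[s]  # reward-prediction error (dopamine-like)
            V[s] += alpha * delta                 # TD(0) value update

    print({s: round(v, 2) for s, v in V.items()})

In this toy version every state shares the same input representation; the adaptation described in the abstract would instead distribute spatial and visual input information nonhomogeneously across different groups of model neurons and suppress the reward temporal-order signal.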
