
Decomposing the effects of context valence and feedback information on speed and accuracy during reinforcement learning: a meta-analytical approach using diffusion decision modeling



Abstract

Reinforcement learning (RL) models describe how humans and animals learn by trial-and-error to select actions that maximize rewards and minimize punishments. Traditional RL models focus exclusively on choices, thereby ignoring the interactions between choice preference and response time (RT), or how these interactions are influenced by contextual factors. However, in the field of perceptual decision-making, such interactions have proven to be important to dissociate between different underlying cognitive processes. Here, we investigated such interactions to shed new light on overlooked differences between learning to seek rewards and learning to avoid losses. We leveraged behavioral data from four RL experiments, which feature manipulations of two factors: outcome valence (gains vs. losses) and feedback information (partial vs. complete feedback). A Bayesian meta-analysis revealed that these contextual factors differently affect RTs and accuracy: While valence only affects RTs, feedback information affects both RTs and accuracy. To dissociate between the latent cognitive processes, we jointly fitted choices and RTs across all experiments with a Bayesian, hierarchical diffusion decision model (DDM). We found that the feedback manipulation affected drift rate, threshold, and non-decision time, suggesting that it was not a mere difficulty effect. Moreover, valence affected non-decision time and threshold, suggesting a motor inhibition in punishing contexts. To better understand the learning dynamics, we finally fitted a combination of RL and DDM (RLDDM). We found that while the threshold was modulated by trial-specific decision conflict, the non-decision time was modulated by the learned context valence. Overall, our results illustrate the benefits of jointly modeling RTs and choice data during RL, to reveal subtle mechanistic differences underlying decisions in different learning contexts.

Electronic supplementary material: The online version of this article (10.3758/s13415-019-00723-1) contains supplementary material, which is available to authorized users.
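The abstract describes combining a trial-and-error value update with a diffusion decision model (RLDDM), so that learned option values drive both choices and response times. The sketch below is a minimal illustration of that general idea in Python; the delta-rule update, the drift-scaling rule, and all parameter values and reward probabilities are illustrative assumptions, not the authors' hierarchical Bayesian implementation or their estimated parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameter values (assumptions, not estimates from the paper)
alpha = 0.3        # learning rate of the delta-rule update
scaling = 2.0      # maps the Q-value difference onto the drift rate
threshold = 1.0    # boundary separation (a)
ndt = 0.3          # non-decision time in seconds
dt = 0.001         # simulation time step in seconds
noise = 1.0        # diffusion coefficient

p_reward = [0.75, 0.25]   # hypothetical reward probabilities of the two options
Q = np.zeros(2)           # learned option values


def simulate_ddm_trial(drift, a, t0, dt=dt, sigma=noise, rng=rng):
    """Simulate one diffusion trial; return (choice, response time)."""
    x, t = 0.0, 0.0
    while abs(x) < a / 2:                      # accumulate evidence until a boundary is hit
        x += drift * dt + sigma * np.sqrt(dt) * rng.standard_normal()
        t += dt
    return (0 if x > 0 else 1), t + t0         # upper boundary -> option 0


for trial in range(100):
    drift = scaling * (Q[0] - Q[1])            # drift rate from the current value difference
    choice, rt = simulate_ddm_trial(drift, threshold, ndt)
    reward = float(rng.random() < p_reward[choice])
    Q[choice] += alpha * (reward - Q[choice])  # delta-rule update of the chosen option
```

In this toy version the drift rate tracks the learned value difference, so choices become faster and more accurate as learning progresses; the paper additionally lets the threshold vary with trial-specific decision conflict and the non-decision time with the learned context valence.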


