Psychonomic Bulletin & Review

A reinforcement learning diffusion decision model for value-based decisions



Abstract

Psychological models of value-based decision-making describe how subjective values are formed and mapped to single choices. Recently, additional efforts have been made to describe the temporal dynamics of these processes by adopting sequential sampling models from the perceptual decision-making tradition, such as the diffusion decision model (DDM). These models, when applied to value-based decision-making, allow mapping of subjective values not only to choices but also to response times. However, very few attempts have been made to adapt these models to situations in which decisions are followed by rewards, thereby producing learning effects. In this study, we propose a new combined reinforcement learning diffusion decision model (RLDDM) and test it on a learning task in which pairs of options differ with respect to both value difference and overall value. We found that participants became more accurate and faster with learning, responded faster and more accurately when options had more dissimilar values, and decided faster when confronted with more attractive (i.e., overall more valuable) pairs of options. We demonstrate that the suggested RLDDM can accommodate these effects and does so better than previously proposed models. To gain a better understanding of the model dynamics, we also compare it to standard DDMs and reinforcement learning models. Our work is a step forward towards bridging the gap between two traditions of decision-making research.
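The abstract does not give the model equations, so the following is a minimal sketch of the general idea, assuming the common formulation in which option values are learned with a delta rule and the trial-by-trial drift rate of the DDM is a linear function of the current value difference. The function name and parameters here (simulate_rlddm, alpha, scale, bound, ndt) are illustrative assumptions, not the paper's specification, and the reward array stands in for a hypothetical two-option learning task.

```python
import numpy as np

def simulate_rlddm(rewards, alpha=0.1, scale=2.0, bound=1.0,
                   ndt=0.3, dt=1e-3, noise=1.0, seed=0):
    """Simulate one agent on a two-option learning task.

    rewards: (n_trials, 2) array holding the reward each option would
    pay on each trial (a stand-in for the task's reward schedule).
    """
    rng = np.random.default_rng(seed)
    q = np.zeros(2)                  # learned option values Q[0], Q[1]
    choices, rts = [], []
    for r in rewards:
        v = scale * (q[1] - q[0])    # drift rate from current value difference
        x, t = 0.0, 0.0
        while abs(x) < bound:        # Euler-Maruyama walk to either bound
            x += v * dt + noise * np.sqrt(dt) * rng.standard_normal()
            t += dt
        choice = int(x > 0)          # upper bound = option 1, lower = option 0
        q[choice] += alpha * (r[choice] - q[choice])  # delta-rule value update
        choices.append(choice)
        rts.append(t + ndt)          # decision time plus non-decision time
    return np.array(choices), np.array(rts)

# Illustrative run: option 0 pays off with probability 0.7, option 1 with 0.3.
rng = np.random.default_rng(1)
rewards = rng.binomial(1, [0.7, 0.3], size=(100, 2))
choices, rts = simulate_rlddm(rewards)
```

In this sketch, as the learned value difference grows over trials the drift rate moves away from zero, so simulated choices become both faster and more accurate, qualitatively matching the first two effects reported above. The third effect, faster responses for overall more valuable pairs, is not produced by a purely linear value-difference link and would require an additional mechanism that this minimal sketch deliberately omits.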