Reinforcement Learning and Attractor Neural Network Models of Associative Learning

Abstract

Despite indisputable advances in reinforcement learning (RL) research, some cognitive and architectural challenges remain. The primary challenge in the current conception of RL stems from the way the theory defines states. Whereas states under laboratory conditions are tractable (owing to the Markov property), states in real-world RL are high-dimensional, continuous and partially observable. Hence, effective learning and generalization can be guaranteed only if the subset of reward-relevant dimensions is correctly identified for each state. Moreover, the computational discrepancy between model-free and model-based RL methods creates a stability-plasticity dilemma for guiding optimal decision-making control when multiple interacting and competing systems, each implementing a different type of RL method, are involved. By presenting behavioral results showing how human subjects flexibly define states in a reversal learning paradigm, contrary to a simple RL model, we argue that these challenges can be met by infusing the RL framework, as an algorithmic theory of human behavior, with the strengths of the attractor framework at the level of neural implementation. Our position is supported by the hypothesis that 'attractor states', which are stable patterns of self-sustained and reverberating brain activity, are a manifestation of the collective dynamics of neuronal populations in the brain. With its capacity for pattern completion and its ability to link events in temporal order, an attractor network becomes relatively insensitive to noise, allowing it to account for the sparse data characteristic of high-dimensional, continuous real-world RL.
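For readers unfamiliar with the kind of "simple RL model" the abstract contrasts with human behavior, the following is a minimal sketch, not the authors' actual model: a delta-rule (Q-learning) agent on a two-armed reversal learning task. All parameter values (learning rate, softmax temperature, reversal point, reward probabilities) are illustrative assumptions. The point of the sketch is that the agent's only "state" is its pair of action values, so it must slowly relearn after the reversal rather than flexibly redefining the task state.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-armed bandit with a mid-session reversal: the better arm switches.
n_trials = 200
reversal_at = 100
p_reward = np.array([0.8, 0.2])   # arm 0 is initially the better option

alpha, beta = 0.3, 3.0            # learning rate, softmax inverse temperature
Q = np.zeros(2)                   # the model's only "state": two action values
choices = []

for t in range(n_trials):
    if t == reversal_at:
        p_reward = p_reward[::-1]            # contingency reversal
    p_choice = np.exp(beta * Q) / np.exp(beta * Q).sum()
    a = rng.choice(2, p=p_choice)            # softmax action selection
    r = float(rng.random() < p_reward[a])    # stochastic reward
    Q[a] += alpha * (r - Q[a])               # delta-rule update, no latent task state
    choices.append(a)

choices = np.array(choices)
print("accuracy before reversal:", np.mean(choices[:reversal_at] == 0))
print("accuracy after reversal: ", np.mean(choices[reversal_at:] == 1))
```

Similarly, a hedged sketch of the pattern completion the abstract attributes to attractor networks: a classic Hopfield network with a Hebbian outer-product learning rule. The network size, number of stored patterns and noise level are arbitrary assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Store binary (+1/-1) patterns with the Hebbian outer-product rule.
n_units, n_patterns = 100, 3
patterns = rng.choice([-1, 1], size=(n_patterns, n_units))
W = (patterns.T @ patterns) / n_units
np.fill_diagonal(W, 0)

# Corrupt one stored pattern and let the network relax toward the attractor.
state = patterns[0].copy()
flip = rng.choice(n_units, size=25, replace=False)
state[flip] *= -1                            # 25% of units flipped (noisy cue)

for _ in range(10):                          # synchronous updates, kept simple
    new_state = np.sign(W @ state)
    new_state[new_state == 0] = 1
    if np.array_equal(new_state, state):
        break
    state = new_state

print("overlap with stored pattern:", np.mean(state == patterns[0]))
```

In this toy setting the corrupted cue typically relaxes back onto the stored pattern, which is the noise insensitivity and pattern completion the abstract invokes as the attractor framework's contribution at the neural-implementation level.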