Annual International Conference of the IEEE Engineering in Medicine and Biology Society

Reinforcement Learning based Decoding Using Internal Reward for Time Delayed Task in Brain Machine Interfaces



Abstract

Reinforcement learning (RL) algorithms interpret neural signals into movement intentions under the guidance of reward in brain-machine interfaces (BMIs). Current RL algorithms generally work for tasks with immediate reward delivery and are inefficient in delayed-reward tasks. The prefrontal cortex, including the medial prefrontal cortex (mPFC), has been demonstrated to assign credit to intermediate steps, which reinforces preceding actions more efficiently. In this paper, we propose to model the functionality of mPFC activity as an intermediate reward to train an RL-based decoder in a two-step movement task. A support vector machine (SVM) is adopted to detect from mPFC activity whether the subject expects a reward upon accomplishing a subtask. This discrimination result is then used to guide the training of the RL decoder at each step. Here, we apply Sarsa-style attention-gated reinforcement learning (SAGREL) as the decoder to interpret primary motor cortex (M1) activity into action states. We test on in vivo M1 and mPFC data collected from rats, which needed to first trigger the start of a trial and then press a lever for reward using M1 signals. SAGREL using intermediate rewards derived from mPFC activity achieves a prediction accuracy of 66.8% ± 2.0% (mean ± std), significantly better than the decoder trained only with the reward delivered at the end of the trial (45.9% ± 1.2%). This reveals the potential of modeling mPFC activity as an intermediate reward for delayed-reward tasks.
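To make the training scheme described in the abstract concrete, below is a minimal, self-contained sketch (not the authors' implementation or data): an SVM is fit on mPFC features to detect reward expectation after the first subtask, and its output serves as an intermediate reward for a Sarsa-style linear decoder reading M1 features in a two-step trial. All features, labels, dimensions, and learning rates here are synthetic and purely illustrative assumptions.

```python
# Schematic sketch of intermediate-reward-guided RL decoding (synthetic data).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_units, n_actions = 16, 2          # M1 feature size; actions: 0 = trigger start, 1 = press lever
alpha, epsilon = 0.02, 0.1          # learning rate and exploration rate (illustrative)

# --- Train an SVM to detect "reward expectation" from synthetic mPFC features.
mpfc_dim = 8
X_mpfc = rng.normal(size=(200, mpfc_dim))
y_expect = (X_mpfc[:, 0] > 0).astype(int)    # stand-in label: expects reward after a subtask
X_mpfc[y_expect == 1, 0] += 1.0              # make the two classes separable
reward_clf = SVC(kernel="linear").fit(X_mpfc, y_expect)

# --- Sarsa-style linear decoder Q(x, a) = w[a] . x over M1 features.
w = np.zeros((n_actions, n_units))

def choose_action(x):
    """Epsilon-greedy action selection over the linear Q-values."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(w @ x))

def m1_features(intended_action):
    """Synthetic M1 activity whose mean depends on the intended action."""
    x = rng.normal(size=n_units)
    x[intended_action] += 2.0
    return x

for trial in range(500):
    # Two-step trial: first trigger the start (action 0), then press the lever (action 1).
    correct = True
    for step, intent in enumerate([0, 1]):
        x = m1_features(intent)
        a = choose_action(x)
        correct = correct and (a == intent)
        if step == 0:
            # Intermediate reward: SVM decision on synthetic mPFC activity, standing in
            # for detected reward expectation after the first subtask.
            mpfc = rng.normal(size=mpfc_dim)
            mpfc[0] += 1.5 if correct else -1.5
            r = float(reward_clf.predict(mpfc.reshape(1, -1))[0])
        else:
            # Final reward only if the whole trial was decoded correctly.
            r = 1.0 if correct else 0.0
        # One-step, Sarsa-like update toward the immediate reward
        # (no bootstrapping across steps in this simplified sketch).
        td_error = r - w[a] @ x
        w[a] += alpha * td_error * x

print("learned weight norms per action:", np.linalg.norm(w, axis=1))
```

In this sketch the intermediate reward lets the decoder receive feedback on the first step immediately, rather than waiting for the end-of-trial reward, which is the mechanism the paper attributes to mPFC credit assignment.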


