IEEE Transactions on Neural Networks and Learning Systems

Goal Representation Heuristic Dynamic Programming on Maze Navigation



Abstract

Goal representation heuristic dynamic programming (GrHDP) is proposed in this paper to demonstrate online learning in the Markov decision process. In addition to the (external) reinforcement signal in the literature, we develop an adaptive internal goal/reward representation for the agent with the proposed goal network. Specifically, we keep the actor-critic design of heuristic dynamic programming (HDP) and include a goal network to represent the internal goal signal, further assisting the value function approximation. We evaluate the proposed GrHDP algorithm on two 2-D maze navigation problems and then on one 3-D maze navigation problem. Compared with the traditional HDP approach, the learning performance of the agent is improved under the proposed GrHDP approach. In addition, we report the learning performance of two other reinforcement learning algorithms, namely $\mathrm{Sarsa}(\lambda)$ and Q-learning, on the same benchmarks for comparison. Furthermore, to demonstrate the theoretical guarantee of the proposed method, we provide an analysis of the convergence characteristics of the neural network weights in our GrHDP approach.
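To make the architecture described in the abstract concrete, below is a minimal Python sketch of the three-network GrHDP structure: an actor mapping the state to an action, a goal network producing the adaptive internal reward s(t), and a critic estimating the value J(t) with s(t) as an extra input. The one-hidden-layer tanh networks, layer sizes, learning rate, discount factor, and the HDP-style temporal-difference errors are illustrative assumptions, not the paper's exact design, and the actor update backpropagates only through the critic's direct input path for brevity.

import numpy as np

rng = np.random.default_rng(0)
GAMMA = 0.95   # discount factor (assumed value)
LR = 0.05      # learning rate shared by all three networks (assumed)
N_STATE, N_ACTION = 2, 1   # e.g., a 2-D maze position and a 1-D action code

def init_net(n_in, n_hid, n_out):
    # One-hidden-layer tanh network, a common choice in HDP designs.
    return [rng.normal(0.0, 0.3, (n_hid, n_in)),
            rng.normal(0.0, 0.3, (n_out, n_hid))]

def forward(net, x):
    W1, W2 = net
    h = np.tanh(W1 @ x)
    return np.tanh(W2 @ h), h

def sgd_step(net, x, h, y, dE_dy, lr=LR):
    # One gradient step on the weights, given dE/dy at the network output.
    W1, W2 = net
    dz2 = dE_dy * (1.0 - y ** 2)          # back through the output tanh
    dz1 = (W2.T @ dz2) * (1.0 - h ** 2)   # back through the hidden tanh
    W2 -= lr * np.outer(dz2, h)
    W1 -= lr * np.outer(dz1, x)

goal_net   = init_net(N_STATE + N_ACTION, 6, 1)      # (x, u)    -> s
critic_net = init_net(N_STATE + N_ACTION + 1, 6, 1)  # (x, u, s) -> J
actor_net  = init_net(N_STATE, 6, N_ACTION)          #  x        -> u

def grhdp_step(x, r, s_prev, J_prev):
    # One online update; r is the external reinforcement signal, while
    # s_prev and J_prev are the previous internal reward and value estimate.
    u, h_a = forward(actor_net, x)
    xu = np.concatenate([x, u])
    s, h_g = forward(goal_net, xu)        # adaptive internal goal/reward
    xus = np.concatenate([xu, s])
    J, h_c = forward(critic_net, xus)     # value estimate built on top of s

    # The goal network learns against the external reward r, and the critic
    # learns against the internal reward s (assumed HDP-style TD errors):
    #   e_g = gamma*s(t) - [s(t-1) - r(t)],  e_c = gamma*J(t) - [J(t-1) - s(t)]
    e_g = GAMMA * s - (s_prev - r)
    e_c = GAMMA * J - (J_prev - s)
    sgd_step(goal_net, xu, h_g, s, GAMMA * e_g)
    sgd_step(critic_net, xus, h_c, J, GAMMA * e_c)

    # Actor: drive J toward a desired ultimate objective (0 here) by
    # backpropagating dJ/du through the critic's direct input path.
    # (The indirect path through the goal network is omitted for brevity.)
    W1c, W2c = critic_net
    dJ_dh = (1.0 - J ** 2) * W2c[0]
    dJ_dxus = W1c.T @ (dJ_dh * (1.0 - h_c ** 2))
    dJ_du = dJ_dxus[N_STATE:N_STATE + N_ACTION]
    sgd_step(actor_net, x, h_a, u, J * dJ_du)   # dE/du for E = 0.5*J^2
    return u, s, J

# Hypothetical glue code inside one maze episode:
#   s, J = np.zeros(1), np.zeros(1)
#   at each step, observe state x and external reward r, then
#       u, s, J = grhdp_step(x, r, s, J)

The design point this sketch highlights is the cascade: in plain HDP the critic is trained directly against the external reward r(t), whereas here the goal network adapts s(t) online against r(t) and the critic learns against s(t), which is the internal representation the abstract credits with improved value function approximation.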


