IEEE Transactions on Neural Networks and Learning Systems

Goal Representation Heuristic Dynamic Programming on Maze Navigation



Abstract

Goal representation heuristic dynamic programming (GrHDP) is proposed in this paper to demonstrate online learning in the Markov decision process. In addition to the (external) reinforcement signal in the literature, we develop an adaptive internal goal/reward representation for the agent with the proposed goal network. Specifically, we keep the actor-critic design of heuristic dynamic programming (HDP) and include a goal network to represent the internal goal signal, further assisting the value function approximation. We evaluate the proposed GrHDP algorithm on two 2-D maze navigation problems and then on one 3-D maze navigation problem. Compared with the traditional HDP approach, the learning performance of the agent is improved under the proposed GrHDP approach. In addition, we report the learning performance of two other reinforcement learning algorithms, namely $\mathrm{Sarsa}(\lambda)$ and Q-learning, on the same benchmarks for comparison. Furthermore, to demonstrate the theoretical guarantee of the proposed method, we provide an analysis of the convergence characteristics of the neural network weights in our GrHDP approach.
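To make the architecture described in the abstract concrete, below is a minimal Python sketch of the three-network GrHDP structure: an actor mapping the state to an action, a goal network producing the adaptive internal reward s(t), and a critic estimating the value J(t) with s(t) as an extra input. The one-hidden-layer tanh networks, layer sizes, learning rate, discount factor, and the HDP-style temporal-difference errors are illustrative assumptions, not the paper's exact design, and the actor update backpropagates only through the critic's direct input path for brevity.

import numpy as np

rng = np.random.default_rng(0)
GAMMA = 0.95   # discount factor (assumed value)
LR = 0.05      # learning rate shared by all three networks (assumed)
N_STATE, N_ACTION = 2, 1   # e.g., a 2-D maze position and a 1-D action code

def init_net(n_in, n_hid, n_out):
    # One-hidden-layer tanh network, a common choice in HDP designs.
    return [rng.normal(0.0, 0.3, (n_hid, n_in)),
            rng.normal(0.0, 0.3, (n_out, n_hid))]

def forward(net, x):
    W1, W2 = net
    h = np.tanh(W1 @ x)
    return np.tanh(W2 @ h), h

def sgd_step(net, x, h, y, dE_dy, lr=LR):
    # One gradient step on the weights, given dE/dy at the network output.
    W1, W2 = net
    dz2 = dE_dy * (1.0 - y ** 2)          # back through the output tanh
    dz1 = (W2.T @ dz2) * (1.0 - h ** 2)   # back through the hidden tanh
    W2 -= lr * np.outer(dz2, h)
    W1 -= lr * np.outer(dz1, x)

goal_net   = init_net(N_STATE + N_ACTION, 6, 1)      # (x, u)    -> s
critic_net = init_net(N_STATE + N_ACTION + 1, 6, 1)  # (x, u, s) -> J
actor_net  = init_net(N_STATE, 6, N_ACTION)          #  x        -> u

def grhdp_step(x, r, s_prev, J_prev):
    # One online update; r is the external reinforcement signal, while
    # s_prev and J_prev are the previous internal reward and value estimate.
    u, h_a = forward(actor_net, x)
    xu = np.concatenate([x, u])
    s, h_g = forward(goal_net, xu)        # adaptive internal goal/reward
    xus = np.concatenate([xu, s])
    J, h_c = forward(critic_net, xus)     # value estimate built on top of s

    # The goal network learns against the external reward r, and the critic
    # learns against the internal reward s (assumed HDP-style TD errors):
    #   e_g = gamma*s(t) - [s(t-1) - r(t)],  e_c = gamma*J(t) - [J(t-1) - s(t)]
    e_g = GAMMA * s - (s_prev - r)
    e_c = GAMMA * J - (J_prev - s)
    sgd_step(goal_net, xu, h_g, s, GAMMA * e_g)
    sgd_step(critic_net, xus, h_c, J, GAMMA * e_c)

    # Actor: drive J toward a desired ultimate objective (0 here) by
    # backpropagating dJ/du through the critic's direct input path.
    # (The indirect path through the goal network is omitted for brevity.)
    W1c, W2c = critic_net
    dJ_dh = (1.0 - J ** 2) * W2c[0]
    dJ_dxus = W1c.T @ (dJ_dh * (1.0 - h_c ** 2))
    dJ_du = dJ_dxus[N_STATE:N_STATE + N_ACTION]
    sgd_step(actor_net, x, h_a, u, J * dJ_du)   # dE/du for E = 0.5*J^2
    return u, s, J

# Hypothetical glue code inside one maze episode:
#   s, J = np.zeros(1), np.zeros(1)
#   at each step, observe state x and external reward r, then
#       u, s, J = grhdp_step(x, r, s, J)

The design point this sketch highlights is the cascade: in plain HDP the critic is trained directly against the external reward r(t), whereas here the goal network adapts s(t) online against r(t) and the critic learns against s(t), which is the internal representation the abstract credits with improved value function approximation.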


