IEEE Transactions on Neural Networks and Learning Systems

An Equivalence Between Adaptive Dynamic Programming With a Critic and Backpropagation Through Time



Abstract

We consider the adaptive dynamic programming technique called Dual Heuristic Programming (DHP), which is designed to learn a critic function, when using learned model functions of the environment. DHP is designed for optimizing control problems in large and continuous state spaces. We extend DHP into a new algorithm that we call Value-Gradient Learning, ${\rm VGL}(\lambda)$, and prove equivalence of an instance of the new algorithm to Backpropagation Through Time for Control with a greedy policy. Not only does this equivalence provide a link between these two different approaches, but it also enables our variant of DHP to have guaranteed convergence, under certain smoothness conditions and a greedy policy, when using a general smooth nonlinear function approximator for the critic. We consider several experimental scenarios including some that prove divergence of DHP under a greedy policy, which contrasts against our proven-convergent algorithm.
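To make the idea concrete, here is a minimal, hypothetical sketch of a value-gradient critic update on a 1-D linear-quadratic toy problem. Everything here (the dynamics `f(x,u) = x + u`, the cost `r = -(x^2 + u^2)`, the linear critic `G(x) = w*x`, and all parameter values) is an illustrative assumption, not the paper's actual algorithm, notation, or experiments; only the λ-blending of the bootstrapped critic gradient (as in DHP) with the backpropagated gradient target (as in BPTT) loosely mirrors the VGL(λ) idea described in the abstract.

```python
gamma, lam, alpha = 0.9, 0.5, 0.1   # discount, blend factor, learning rate

def reward_grad(x):
    # dr/dx for the toy cost-style reward r(x, u) = -(x^2 + u^2)
    return -2.0 * x

def step(x, u):
    # Known (learned, in DHP's setting) model: f(x, u) = x + u
    return x + u

def greedy_action(x, w):
    # First-order greedy condition with critic G(x) = w*x:
    # dr/du + gamma * G(f(x,u)) * df/du = -2u + gamma*w*(x + u) = 0
    return gamma * w * x / (2.0 - gamma * w)

def vgl_lambda_sweep(x0, w, T=20):
    # Forward rollout under the greedy policy.
    xs, us = [x0], []
    for _ in range(T):
        u = greedy_action(xs[-1], w)
        us.append(u)
        xs.append(step(xs[-1], u))
    # Backward pass: lambda blends the backpropagated gradient target
    # (BPTT-like) with the critic's own bootstrap estimate (DHP-like).
    Gp = w * xs[-1]  # terminal target taken from the critic itself
    for t in reversed(range(T)):
        blended = lam * Gp + (1.0 - lam) * (w * xs[t + 1])
        Gp = reward_grad(xs[t]) + gamma * 1.0 * blended  # df/dx = 1 here
        # Move the critic's gradient estimate G(x;w) = w*x toward the target.
        w += alpha * (Gp - w * xs[t]) * xs[t]
    return w

w = 0.0
for _ in range(50):
    w = vgl_lambda_sweep(1.0, w)
```

With λ = 1 the backward pass reduces to a pure backpropagated (BPTT-style) target, while λ = 0 bootstraps entirely from the critic, which is the DHP-style limit; the weight `w` here settles to a negative slope, as expected for a cost-to-go gradient on this toy problem.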
