Published in: IEEE Transactions on Neural Networks and Learning Systems

A Parallel Framework of Adaptive Dynamic Programming Algorithm With Off-Policy Learning


Abstract

In this article, a model-free online adaptive dynamic programming (ADP) approach is developed for solving the optimal control problem of nonaffine nonlinear systems. By combining the off-policy learning mechanism with a parallel paradigm, multithread agents are employed to collect transitions by interacting with the environment, which significantly increases the amount of sampled data. In addition, each thread agent explores the environment from a different initial state under its own behavior policy, which enhances the exploration capability and alleviates the correlation between the sampled data. After the policy evaluation process, only a one-step update based on the policy gradient method is required for policy improvement. The stability of the system under the iterative control laws is guaranteed. Moreover, a convergence analysis is given to prove that the iterative Q-function is monotonically nonincreasing and finally converges to the solution of the Hamilton-Jacobi-Bellman (HJB) equation. To implement the algorithm, an actor-critic (AC) structure with two neural networks (NNs) is utilized to approximate the Q-function and the control policy. Finally, the effectiveness of the proposed algorithm is verified by two numerical examples.
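For reference, the Q-function relations the abstract refers to can be written as follows, assuming the common discrete-time setting x_{k+1} = F(x_k, u_k) with a nonnegative utility U(x_k, u_k). The symbols F, U, μ, θ and α are notation assumed here for illustration; the article's exact formulation (for example, any discount factor) may differ.

```latex
% Policy evaluation: Bellman equation for the Q-function of a target policy \mu
Q^{\mu}(x_k, u_k) = U(x_k, u_k) + Q^{\mu}\big(x_{k+1}, \mu(x_{k+1})\big), \qquad x_{k+1} = F(x_k, u_k)

% Discrete-time HJB equation in Q-function form, and the resulting optimal policy
Q^{*}(x_k, u_k) = U(x_k, u_k) + \min_{u} Q^{*}(x_{k+1}, u), \qquad \mu^{*}(x) = \arg\min_{u} Q^{*}(x, u)

% One-step policy-gradient improvement of an actor \mu(x; \theta) with step size \alpha
\theta \leftarrow \theta - \alpha \left( \frac{\partial \mu(x; \theta)}{\partial \theta} \right)^{\top} \frac{\partial Q^{\mu}(x, u)}{\partial u} \bigg|_{u = \mu(x; \theta)}
```

Because x_{k+1} in the first relation depends on the action u_k that was actually executed, and μ appears only in the next-step action, transitions generated by any behavior policy can be used to evaluate the target policy μ. This is what lets the parallel agents explore with their own behavior policies.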
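The Python sketch below illustrates the data flow described in the abstract under heavy simplifications. Several "thread agents" collect transitions in parallel, each from its own initial state under its own behavior policy. The pooled data is then reused off-policy to evaluate the Q-function of the current target policy, and each evaluation is followed by a single policy-gradient step. Everything concrete here is a hypothetical choice, not the authors' method: the scalar plant, the cost weights, the behavior gains, the learning rate, and the quadratic critic and linear actor, which stand in for the two NNs of the AC structure. The plant is also linear, hence affine, unlike the nonaffine systems the article treats. It was picked so that the quadratic critic is exact and the result can be checked against the Riccati solution.

```python
"""Toy sketch: parallel off-policy Q-learning with one-step policy-gradient improvement."""
from concurrent.futures import ThreadPoolExecutor

import numpy as np

# Hypothetical toy plant (assumption): scalar linear system x' = A*x + B*u with
# stage cost U(x, u) = Q_X*x^2 + R_U*u^2. It is affine in u, unlike the article's
# nonaffine systems; it is chosen only so that a quadratic critic is exact.
A, B, Q_X, R_U = 0.9, 0.5, 1.0, 1.0


def step(x, u):
    return A * x + B * u


def collect(agent_id, n_steps=50):
    """One hypothetical 'thread agent' with its own initial state and behavior policy."""
    rng = np.random.default_rng(agent_id)
    x = rng.uniform(-2.0, 2.0)            # agent-specific initial state
    k_behavior = 0.1 * agent_id           # agent-specific behavior gain (assumption)
    transitions = []
    for _ in range(n_steps):
        u = -k_behavior * x + rng.normal()  # behavior policy: feedback + exploration noise
        x_next = step(x, u)
        transitions.append((x, u, x_next))
        x = x_next
    return transitions


def features(x, u):
    # Quadratic critic: Q(x, u) = h11*x^2 + 2*h12*x*u + h22*u^2 = theta . features(x, u)
    return np.array([x * x, 2.0 * x * u, u * u])


def evaluate_policy(data, k):
    """Off-policy evaluation of Q for u = -k*x via least squares on
    Q(x, u) - Q(x', -k*x') = U(x, u), using transitions from any behavior policy."""
    rows = [features(x, u) - features(xn, -k * xn) for x, u, xn in data]
    costs = [Q_X * x * x + R_U * u * u for x, u, _ in data]
    theta, *_ = np.linalg.lstsq(np.array(rows), np.array(costs), rcond=None)
    return theta  # (h11, h12, h22)


def improve_policy(theta, k, lr=0.2):
    """One gradient step on Q(x, -k*x) with respect to k. In this scalar case the
    gradient only scales with x^2, so it is evaluated at x = 1."""
    _, h12, h22 = theta
    u = -k
    dq_du = 2.0 * h12 + 2.0 * h22 * u     # dQ/du at x = 1, u = -k
    grad = dq_du * (-1.0)                 # chain rule: du/dk = -x = -1
    return k - lr * grad


def train(n_agents=4, n_iters=40):
    # Parallel data collection: each agent runs in its own worker thread.
    with ThreadPoolExecutor(max_workers=n_agents) as pool:
        batches = list(pool.map(collect, range(n_agents)))
    data = [t for batch in batches for t in batch]
    k = 0.0  # initial admissible policy u = -k*x (the open-loop plant is stable)
    for _ in range(n_iters):
        theta = evaluate_policy(data, k)  # policy evaluation, reusing the same data
        k = improve_policy(theta, k)      # single policy-gradient step
    return k, len(data)


if __name__ == "__main__":
    k, n = train()
    print(f"{n} transitions, learned gain k = {k:.4f}")
```

The same pooled dataset is reused at every iteration. That reuse is the practical payoff of off-policy evaluation: no new rollouts are needed when the target policy changes. With this step size each update moves k part of the way toward the greedy gain h12/h22, so in this toy case the learned gain should approach the optimal LQR gain.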
