Journal: IEEE Transactions on Neural Networks and Learning Systems

Adaptive Learning in Tracking Control Based on the Dual Critic Network Design

Abstract

We present a new adaptive dynamic programming approach that integrates a reference network, which provides an internal goal representation to aid the system's learning and optimization. Specifically, we build the reference network on top of the critic network to form a dual critic network design that contains a detailed internal goal representation to help approximate the value function. This internal goal signal, which serves as the reinforcement signal for the critic network in our design, is generated adaptively by the reference network and can also be adjusted automatically. This offers an alternative to crafting the reinforcement signal manually from prior knowledge. We adopt the online action-dependent heuristic dynamic programming (ADHDP) design and describe the dual critic network structure in detail. A detailed Lyapunov stability analysis supports the proposed structure from a theoretical point of view. We also develop a virtual reality platform to demonstrate real-time simulation of our approach under different disturbance conditions. The overall adaptive learning performance is tested on two tracking control benchmarks with a tracking filter. For comparison, we also report the tracking performance of the typical ADHDP; the simulation results show improved performance with our approach.
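
The signal flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class `TinyNet`, the function `dual_critic_step`, the network sizes, the discount factor `alpha`, and the two temporal-difference error forms are all assumptions chosen for illustration, since the abstract does not specify them. The sketch only shows the wiring: the action network produces a control, the reference network turns state and control into an internal goal signal `s`, and the critic uses `s` in place of a hand-crafted reinforcement signal.

```python
import math
import random


class TinyNet:
    """Hypothetical one-hidden-layer tanh network with a scalar output."""

    def __init__(self, n_in, n_hidden, rng):
        self.w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in)]
                         for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-1, 1) for _ in range(n_hidden)]

    def __call__(self, inputs):
        hidden = [math.tanh(sum(w * x for w, x in zip(row, inputs)))
                  for row in self.w_hidden]
        return math.tanh(sum(w * h for w, h in zip(self.w_out, hidden)))


def dual_critic_step(action_net, reference_net, critic_net,
                     state, prev_J, r, alpha=0.95):
    """One forward pass through the three networks (no weight updates).

    Assumed error forms (not given in the abstract):
      reference network: e_ref    = alpha * J - (prev_J - r)
      critic network:    e_critic = alpha * J - (prev_J - s)
    where r is the external reinforcement and s the internal goal signal.
    """
    u = action_net(state)              # control action
    s = reference_net(state + [u])     # internal goal signal
    J = critic_net(state + [u, s])     # value-function estimate
    e_ref = alpha * J - (prev_J - r)
    e_critic = alpha * J - (prev_J - s)
    return u, s, J, e_ref, e_critic


if __name__ == "__main__":
    rng = random.Random(0)
    action_net = TinyNet(2, 4, rng)      # inputs: state (2 values)
    reference_net = TinyNet(3, 4, rng)   # inputs: state + action
    critic_net = TinyNet(4, 4, rng)      # inputs: state + action + goal signal
    print(dual_critic_step(action_net, reference_net, critic_net,
                           [0.1, -0.2], prev_J=0.0, r=-1.0))
```

In a full design, each error would drive a gradient update of the corresponding network's weights; the sketch stops at computing the errors so that the role of the internal goal signal stays visible.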
