Journal: IEEE Transactions on Neural Networks and Learning Systems

Near-Optimal Controller for Nonlinear Continuous-Time Systems With Unknown Dynamics Using Policy Iteration

Abstract

This paper presents a single-network adaptive critic-based controller for continuous-time systems with unknown dynamics in a policy iteration (PI) framework. It is assumed that the unknown dynamics can be estimated with arbitrary precision using the Takagi-Sugeno-Kang fuzzy model. The successful implementation of a PI scheme depends on effective learning of the critic network parameters: in addition to approximating the critic and the cost, the network parameters must stabilize the system in each iteration. It is found that critic updates based on the Hamilton-Jacobi-Bellman formulation sometimes lead to instability of the closed-loop system. The proposed work adopts a novel critic network parameter update scheme that combines a Lyapunov-based linear matrix inequalities approach with PI; it not only approximates the critic at the current iteration but also provides feasible solutions that keep the policy stable in the next step of training. The critic modeling technique presented here is the first of its kind to address this issue. Although several works discuss the convergence of PI, to the best of our knowledge none focuses on the effect of critic network parameters on the convergence. The computational complexity of the proposed algorithm is reduced to the order of (Fz)^(n-1), where n is the fuzzy state dimensionality and Fz is the number of fuzzy zones in the state space. A genetic algorithm toolbox of MATLAB is used to search for stable parameters while minimizing the training error. The proposed algorithm also provides a way to solve for the initial stable control policy in the PI scheme. The algorithm is validated through a real-time experiment on a commercial robotic manipulator. Results show that the algorithm successfully finds stable critic network parameters in real time for a highly nonlinear system.
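
The abstract stresses that PI needs an initial stabilizing policy and that every intermediate policy must keep the closed loop stable. The sketch below illustrates that requirement on a much simpler problem than the paper's: classical policy iteration (Kleinman's algorithm) for a linear system with known dynamics. It is not the paper's method, which targets unknown nonlinear dynamics with a Takagi-Sugeno-Kang fuzzy model, an LMI-based stability condition, and a genetic-algorithm search. The double-integrator matrices, the cost weights, the initial gain `K0`, and all function names are assumptions chosen for illustration.

```python
import numpy as np


def solve_lyapunov(a_cl, m):
    """Solve a_cl.T @ P + P @ a_cl = -m for P using Kronecker products."""
    n = a_cl.shape[0]
    eye = np.eye(n)
    # Row-major vectorization: vec(X P Y) = kron(X, Y.T) @ vec(P)
    lhs = np.kron(a_cl.T, eye) + np.kron(eye, a_cl.T)
    p = np.linalg.solve(lhs, -m.reshape(-1)).reshape(n, n)
    return 0.5 * (p + p.T)  # symmetrize against round-off


def is_stabilizing(a, b, k):
    """True if all eigenvalues of the closed loop a - b @ k lie in the open left half-plane."""
    return bool(np.all(np.linalg.eigvals(a - b @ k).real < 0))


def policy_iteration(a, b, q, r, k0, iters=20):
    """Kleinman-style PI for dx/dt = a x + b u with cost integral of x'q x + u'r u."""
    k = k0
    for _ in range(iters):
        # PI is only well defined while the current policy is stabilizing.
        if not is_stabilizing(a, b, k):
            raise ValueError("policy is not stabilizing")
        a_cl = a - b @ k
        p = solve_lyapunov(a_cl, q + k.T @ r @ k)  # policy evaluation
        k = np.linalg.solve(r, b.T @ p)            # policy improvement
    return p, k


# Hypothetical example: double integrator with identity cost weights.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
K0 = np.array([[1.0, 1.0]])  # assumed initial stabilizing gain

P, K = policy_iteration(A, B, Q, R, K0)
```

For this example the gain should approach K = [1, sqrt(3)], the solution of the corresponding algebraic Riccati equation. Starting from a non-stabilizing gain such as `np.zeros((1, 2))` raises the `ValueError`, which is the failure mode the abstract's stability-preserving critic update is meant to avoid in the nonlinear setting.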