Asian Control Conference

Optimal tracking control for discrete-time systems by model-free off-policy Q-learning approach



Abstract

In this paper, a novel off-policy Q-learning algorithm is developed for solving the linear quadratic tracking (LQT) problem of discrete-time (DT) systems, using only data measured along the system trajectories. Two challenging issues are the focus of this paper: how to learn the optimal tracking control policy by an off-policy approach, and how to prove that the optimal solution is not biased by the probing noise added to guarantee persistent excitation. To this end, a behavior policy is introduced, and a novel off-policy Q-function-based iterative Bellman equation is derived from the relationship between the Q-function and the value function. An off-policy Q-learning algorithm is then developed, and both its convergence and the unbiasedness of its solution are proved. Simulation results verify the effectiveness of the proposed method.
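The abstract describes the scheme only at a high level. As a rough illustration of this kind of method, the Python sketch below runs off-policy Q-function policy iteration for a discounted LQT problem on an augmented state z = [x; r]: one batch of data is collected with a noisy behavior policy, and each iteration evaluates the target policy's Q-kernel by least squares on the Bellman equation, then improves the policy. All matrices (A, B, C, F), weights, the discount factor, and the noise level are hypothetical placeholders chosen for this sketch, not taken from the paper.

import numpy as np

# Hypothetical plant and reference generator (assumptions, not from the paper).
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])          # plant: x_{k+1} = A x_k + B u_k
B = np.array([[0.0],
              [1.0]])
C = np.array([[1.0, 0.0]])          # tracked output y_k = C x_k
F = np.array([[1.0]])               # reference generator: r_{k+1} = F r_k
gamma = 0.9                         # discount factor of the LQT cost

# Augmented state z = [x; r] turns the tracking problem into regulation.
T = np.block([[A, np.zeros((2, 1))],
              [np.zeros((1, 2)), F]])
B1 = np.vstack([B, np.zeros((1, 1))])
Qe, R = np.eye(1), np.eye(1)        # tracking-error and input weights
C1 = np.hstack([C, -np.eye(1)])     # tracking error e_k = C x_k - r_k = C1 z_k
Qz = C1.T @ Qe @ C1                 # stage cost: z'Qz z + u'R u
n, m = 3, 1
p = n + m

def quad_basis(z, u):
    # Basis so that Q(z, u) = v' H v = w . phi(z, u) for a symmetric kernel H.
    v = np.concatenate([z, u])
    idx = np.triu_indices(p)
    scale = np.where(idx[0] == idx[1], 1.0, 2.0)  # off-diagonal terms appear twice
    return scale * np.outer(v, v)[idx]

# One batch of data from a behavior policy: probing noise gives persistent
# excitation, and the same batch is reused by every iteration below,
# which is what makes the scheme off-policy.
rng = np.random.default_rng(0)
z = np.array([1.0, 0.0, 1.0])
data = []
for k in range(400):
    u = 0.5 * rng.standard_normal(m)              # behavior action (pure probing here)
    z_next = T @ z + B1 @ u
    data.append((z, u, float(z @ Qz @ z + u @ R @ u), z_next))
    z = z_next

# Off-policy policy iteration on the Q-function kernel H.
K = np.zeros((m, n))                              # initial admissible target policy
for it in range(20):
    Phi, y = [], []
    for zk, uk, ck, zk1 in data:
        uk1 = -K @ zk1                            # target-policy action at z_{k+1};
                                                  # the probing noise never enters this
                                                  # term, so it does not bias the solution
        Phi.append(quad_basis(zk, uk) - gamma * quad_basis(zk1, uk1))
        y.append(ck)
    w = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)[0]
    H = np.zeros((p, p))
    H[np.triu_indices(p)] = w
    H = H + H.T - np.diag(np.diag(H))             # recover the symmetric kernel
    K_new = np.linalg.solve(H[n:, n:], H[n:, :n]) # improvement: u = -H_uu^{-1} H_uz z
    if np.linalg.norm(K_new - K) < 1e-8:
        break
    K = K_new

print("learned tracking gain K =", K)

For richer reference dynamics or higher-dimensional plants, only the matrices and the dimensions n, m change; the data collection, least-squares evaluation, and improvement steps are identical.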
