Asian Control Conference

Optimal tracking control for discrete-time systems by model-free off-policy Q-learning approach



Abstract

In this paper, a novel off-policy Q-learning algorithm is developed for solving the linear quadratic tracking (LQT) problem of discrete-time (DT) systems, using only data measured along the system trajectories. Two challenging issues are the focus of this paper: how to learn the optimal tracking control policy by an off-policy approach, and how to prove that the optimal solution is not biased by the probing noise added to guarantee persistent excitation. To this end, a behavior policy is introduced, and a novel off-policy Q-function-based iterative Bellman equation is derived from the relationship between the Q-function and the value function. An off-policy Q-learning algorithm is then developed, and both its convergence and the unbiasedness of its solution are proved. Simulation results verify the effectiveness of the proposed method.
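The abstract describes the scheme only at a high level. As a rough illustration of this kind of method, the Python sketch below runs off-policy Q-function policy iteration for a discounted LQT problem on an augmented state z = [x; r]: one batch of data is collected with a noisy behavior policy, and each iteration evaluates the target policy's Q-kernel by least squares on the Bellman equation, then improves the policy. All matrices (A, B, C, F), weights, the discount factor, and the noise level are hypothetical placeholders chosen for this sketch, not taken from the paper.

import numpy as np

# Hypothetical plant and reference generator (assumptions, not from the paper).
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])          # plant: x_{k+1} = A x_k + B u_k
B = np.array([[0.0],
              [1.0]])
C = np.array([[1.0, 0.0]])          # tracked output y_k = C x_k
F = np.array([[1.0]])               # reference generator: r_{k+1} = F r_k
gamma = 0.9                         # discount factor of the LQT cost

# Augmented state z = [x; r] turns the tracking problem into regulation.
T = np.block([[A, np.zeros((2, 1))],
              [np.zeros((1, 2)), F]])
B1 = np.vstack([B, np.zeros((1, 1))])
Qe, R = np.eye(1), np.eye(1)        # tracking-error and input weights
C1 = np.hstack([C, -np.eye(1)])     # tracking error e_k = C x_k - r_k = C1 z_k
Qz = C1.T @ Qe @ C1                 # stage cost: z'Qz z + u'R u
n, m = 3, 1
p = n + m

def quad_basis(z, u):
    # Basis so that Q(z, u) = v' H v = w . phi(z, u) for a symmetric kernel H.
    v = np.concatenate([z, u])
    idx = np.triu_indices(p)
    scale = np.where(idx[0] == idx[1], 1.0, 2.0)  # off-diagonal terms appear twice
    return scale * np.outer(v, v)[idx]

# One batch of data from a behavior policy: probing noise gives persistent
# excitation, and the same batch is reused by every iteration below,
# which is what makes the scheme off-policy.
rng = np.random.default_rng(0)
z = np.array([1.0, 0.0, 1.0])
data = []
for k in range(400):
    u = 0.5 * rng.standard_normal(m)              # behavior action (pure probing here)
    z_next = T @ z + B1 @ u
    data.append((z, u, float(z @ Qz @ z + u @ R @ u), z_next))
    z = z_next

# Off-policy policy iteration on the Q-function kernel H.
K = np.zeros((m, n))                              # initial admissible target policy
for it in range(20):
    Phi, y = [], []
    for zk, uk, ck, zk1 in data:
        uk1 = -K @ zk1                            # target-policy action at z_{k+1};
                                                  # the probing noise never enters this
                                                  # term, so it does not bias the solution
        Phi.append(quad_basis(zk, uk) - gamma * quad_basis(zk1, uk1))
        y.append(ck)
    w = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)[0]
    H = np.zeros((p, p))
    H[np.triu_indices(p)] = w
    H = H + H.T - np.diag(np.diag(H))             # recover the symmetric kernel
    K_new = np.linalg.solve(H[n:, n:], H[n:, :n]) # improvement: u = -H_uu^{-1} H_uz z
    if np.linalg.norm(K_new - K) < 1e-8:
        break
    K = K_new

print("learned tracking gain K =", K)

For richer reference dynamics or higher-dimensional plants, only the matrices and the dimensions n, m change; the data collection, least-squares evaluation, and improvement steps are identical.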
