Near Optimal Output Feedback Control of Nonlinear Discrete-time Systems Based on Reinforcement Neural Network Learning

Qiming Zhao; Hao Xu; Sarangapani Jagannathan

Abstract

This paper considers output-feedback-based finite-horizon near-optimal regulation of nonlinear affine discrete-time systems with unknown system dynamics, using neural networks (NNs) to approximate the solution of the Hamilton-Jacobi-Bellman (HJB) equation. First, an NN-based Luenberger observer is proposed to reconstruct both the system states and the control coefficient matrix. Next, a reinforcement learning methodology with an actor-critic structure is used to approximate the time-varying solution of the HJB equation, referred to as the value function, with an NN. To properly satisfy the terminal constraint, a new error term is defined and incorporated into the NN update law so that the terminal constraint error is also minimized over time. An NN with constant weights and time-dependent activation functions is employed to approximate the time-varying value function, which is subsequently used to generate the finite-horizon control policy; the policy is near optimal rather than optimal because of NN reconstruction errors. The proposed scheme operates forward in time without an offline training phase. Lyapunov analysis is used to investigate the stability of the overall closed-loop system. Simulation results are given to show the effectiveness and feasibility of the proposed method.
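The idea of a critic with constant weights, time-dependent activations, and an extra terminal-constraint error can be illustrated with a small sketch. This is not the update law from the paper, which the abstract does not give. It is a minimal example that assumes a scalar state, a hypothetical quadratic basis `phi` scaled by normalized time-to-go, a hypothetical terminal-state estimate `x_N_est`, and a plain normalized gradient step on the sum of the squared Bellman residual and the squared terminal error.

```python
import numpy as np

N = 10  # finite horizon length (assumed for illustration)

def phi(x, k):
    """Hypothetical time-dependent activation: quadratic in x, polynomial in time-to-go."""
    tau = (N - k) / N  # normalized time-to-go, equals 0 at the terminal step
    return np.array([x**2, x**2 * tau, x**2 * tau**2])

def value(W, x, k):
    """Value estimate V(x, k) = W^T phi(x, k) with constant weights W."""
    return W @ phi(x, k)

def errors(W, x_k, x_next, k, cost, x_N_est, terminal_cost):
    """Bellman residual and terminal-constraint error, both linear in W."""
    d = phi(x_next, k + 1) - phi(x_k, k)
    p = phi(x_N_est, N)
    e_b = cost + W @ d           # one-step Bellman residual
    e_n = terminal_cost - W @ p  # mismatch with the terminal cost
    return d, p, e_b, e_n

def critic_step(W, x_k, x_next, k, cost, x_N_est, terminal_cost):
    """One normalized gradient step on 0.5 * (e_b**2 + e_n**2)."""
    d, p, e_b, e_n = errors(W, x_k, x_next, k, cost, x_N_est, terminal_cost)
    alpha = 1.0 / (1.0 + d @ d + p @ p)  # normalization keeps the step stable
    return W - alpha * (d * e_b - p * e_n)

def total_error(W, *sample):
    _, _, e_b, e_n = errors(W, *sample)
    return 0.5 * (e_b**2 + e_n**2)

# Illustrative sample: state moves from 1.0 to 0.8 at k = 0 with stage cost 1.0,
# and the terminal state is guessed to be 0.1 with terminal cost 0.1**2.
sample = (1.0, 0.8, 0, 1.0, 0.1, 0.01)
W0 = np.zeros(3)
W1 = critic_step(W0, *sample)
```

Because both errors are linear in `W`, the normalized step cannot increase the combined squared error on the sample it was computed from. In an online setting the step would be repeated at every time instant with fresh measurements, which matches the forward-in-time character described in the abstract.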

Bibliographic details

Source
《自动化学报：英文版》 (Acta Automatica Sinica, English edition), 2014, No. 4, pp. 372-384, 13 pages
Authors
Qiming Zhao; Hao Xu; Sarangapani Jagannathan
Author affiliations

DENSO International America, Inc.;

College of Science and Engineering, Texas A&M University;

Department of Electrical & Computer Engineering, Missouri University of Science and Technology

Indexing information
Original format: PDF
Text language: CHI
CLC classification
Keywords

Near Optimal Output Feedback Control of Nonlinear Discrete-time Systems Based on Reinforcement Neural Network Learning
