SYSTEM AND METHOD FOR ROBUST OPTIMIZATION FOR TRAJECTORY-CENTRIC MODEL-BASED REINFORCEMENT LEARNING

Abstract

A controller for optimizing a local control policy of a system for trajectory-centric reinforcement learning is provided. The controller performs the steps of: learning a stochastic predictive model of the system from a set of data collected during trial-and-error experiments performed with an initial random control policy; estimating the mean prediction and its associated uncertainty; determining, using the learned stochastic system model, a local set of deviations of the system from a nominal system state upon use of a control input at the current time-step; determining the system state with the worst-case deviation; determining a gradient of the robustness constraint; formulating and solving a robust policy optimization problem using non-linear programming to obtain the system trajectory and a stabilizing local policy simultaneously; updating the control data according to the solved optimization problem; and outputting the updated control data via an interface.
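The worst-case-deviation and gradient steps named in the abstract can be illustrated with a small numerical sketch. This is not the formulation claimed in the patent. It assumes that the learned model has already been reduced to a local linearization (`A`, `B`) around the nominal trajectory and that the model uncertainty is summarized by an ellipsoid with shape matrix `S`; all matrices, the linear feedback gain `K`, the function names, and the step size are hypothetical values chosen for illustration. Plain gradient descent stands in for the non-linear programming solver mentioned in the abstract.

```python
import numpy as np

def worst_case_deviation(A_cl, S):
    """Maximize ||A_cl @ d||^2 over the ellipsoid d^T S^{-1} d <= 1."""
    L = np.linalg.cholesky(S)              # S = L L^T, so d = L z with ||z|| <= 1
    M = L.T @ A_cl.T @ A_cl @ L            # objective becomes z^T M z
    eigvals, eigvecs = np.linalg.eigh(M)   # eigenvalues in ascending order
    z_star = eigvecs[:, -1]                # unit eigenvector of the largest eigenvalue
    return L @ z_star, eigvals[-1]         # worst-case deviation and its cost

def robustness_and_gradient(A, B, K, S):
    """Worst-case next-step deviation cost and its gradient with respect to K."""
    A_cl = A + B @ K                       # closed-loop deviation dynamics (assumed linear)
    d_star, cost = worst_case_deviation(A_cl, S)
    # Gradient taken with the maximizer d_star held fixed (valid when the
    # largest eigenvalue is simple).
    grad_K = 2.0 * B.T @ A_cl @ np.outer(d_star, d_star)
    return cost, grad_K, d_star

# Hypothetical local model and uncertainty set (illustrative values only).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
S = np.diag([0.01, 0.04])
K0 = np.array([[-1.0, -1.5]])

cost0, grad0, d0 = robustness_and_gradient(A, B, K0, S)

# A few gradient-descent steps on the worst-case cost (stand-in for an NLP solver).
K = K0.copy()
for _ in range(20):
    _, grad, _ = robustness_and_gradient(A, B, K, S)
    K -= 10.0 * grad
cost1, _, _ = robustness_and_gradient(A, B, K, S)

print("worst-case cost before:", cost0)
print("worst-case cost after: ", cost1)
```

Under these assumptions the inner maximization has a closed form (a largest-eigenvalue problem), which is why no iterative search for the worst-case state is needed here. With a non-linear learned model the maximizer would generally have to be found numerically.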

Bibliographic data

Publication number: WO2021117845A1
Patent type:
Publication date: 2021-06-17
Original format: PDF
Applicant/patentee: MITSUBISHI ELECTRIC CORPORATION
Application number: WO2020JP46194
Inventors: JHA DEVESH; KOLARIC PATRIK; RAGHUNATHAN ARVIND; BENOSMAN MOUHACINE; ROMERES DIEGO
Filing date: 2020-12-04
Classification: G06N3; G06N7
Country: JP
Database entry time: 2022-08-24 19:27:00
