Incremental Reinforcement Learning in Continuous Spaces via Policy Relaxation and Importance Weighting

Wang Zhi; Li Han-Xiong; Chen Chunlin


Abstract

This paper presents a systematic incremental learning method for reinforcement learning in continuous spaces where the learning environment is dynamic. The goal is to incrementally adjust the policy learned in the original environment to a new one whenever the environment changes. To improve adaptability to the ever-changing environment, we propose a two-step solution integrated into the incremental learning procedure: policy relaxation and importance weighting. First, the behavior policy is relaxed to a random one in the initial learning episodes to encourage proper exploration in the new environment. This alleviates the conflict between the new information and the existing knowledge, giving better adaptation in the long term. Second, it is observed that episodes receiving higher returns are more in line with the new environment and hence contain more new information. During parameter updating, we assign higher importance weights to the learning episodes that contain more new information, thus encouraging the previous optimal policy to adapt faster to a new one that fits the new environment. Empirical studies on continuous control tasks with varying configurations verify that the proposed method achieves significantly faster adaptation to various dynamic environments than the baselines.
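The two steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the Gaussian policy, the uniform random behavior policy, and the softmax weighting over returns are all assumptions chosen for illustration, since the abstract does not give the exact formulas. The sketch also omits any off-policy correction for the relaxed episodes.

```python
import numpy as np


def relaxed_action(policy_mean, policy_std, episode, relax_episodes, low, high, rng):
    """Policy relaxation (illustrative): during the first `relax_episodes`
    episodes in the new environment, act uniformly at random; afterwards,
    sample from the learned Gaussian policy."""
    if episode < relax_episodes:
        return rng.uniform(low, high, size=np.shape(policy_mean))
    return rng.normal(policy_mean, policy_std)


def episode_weights(returns, temperature=1.0):
    """Importance weighting (illustrative): episodes with higher returns get
    larger weights. A softmax over returns is an assumed choice here, not
    the weighting formula from the paper."""
    r = np.asarray(returns, dtype=float)
    z = (r - r.max()) / temperature  # subtract the max for numerical stability
    w = np.exp(z)
    return w / w.sum()


def weighted_gradient(per_episode_grads, returns, temperature=1.0):
    """Combine per-episode gradient estimates using the episode weights."""
    w = episode_weights(returns, temperature)
    g = np.asarray(per_episode_grads, dtype=float)
    return np.tensordot(w, g, axes=1)  # weighted sum over the episode axis
```

With this weighting, two episodes with equal returns contribute equally to the update, and lowering the hypothetical `temperature` parameter concentrates the update on the highest-return episodes.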


Bibliographic details

Source
IEEE Transactions on Neural Networks and Learning Systems | 2020, Issue 6 | pp. 1870-1883 | 14 pages
Authors
Wang Zhi; Li Han-Xiong; Chen Chunlin
Author affiliations

Nanjing Univ, Dept Control & Syst Engn, Sch Management & Engn, Nanjing 210093, Peoples R China;

City Univ Hong Kong, Dept Syst Engn & Engn Management, Hong Kong, Peoples R China | Cent South Univ, State Key Lab High Performance Complex Mfg, Changsha 410083, Peoples R China;

Nanjing Univ, Dept Control & Syst Engn, Sch Management & Engn, Nanjing 210093, Peoples R China
Original format: PDF
Language of text: English
Keywords
Task analysis; Learning systems; Heuristic algorithms; Function approximation; Robots; Navigation; Neural networks; Continuous spaces; Dynamic environments; Importance weighting; Incremental reinforcement learning (RL); Policy relaxation


