Incremental Reinforcement Learning With Prioritized Sweeping for Dynamic Environments

Wang Zhi; Chen Chunlin; Li Han-Xiong; Dong Daoyi; Tarn Tzyh-Jong

Abstract

In this paper, a novel incremental learning algorithm is presented for reinforcement learning (RL) in dynamic environments, where the rewards of state-action pairs may change over time. The proposed incremental RL (IRL) algorithm learns from dynamic environments without making any assumptions or having any prior knowledge about the ever-changing environment. First, IRL generates a detector-agent to detect the changed part of the environment (drift environment) by executing a virtual RL process. Then, the agent gives priority to the drift environment and its neighbor environment, iteratively updating their state-action value functions with the new rewards by dynamic programming. After the prioritized sweeping process, IRL restarts a canonical learning process to obtain a new optimal policy adapted to the new environment. The novelty is that IRL fuses the new information into the existing knowledge system incrementally while weakening the conflict between them. The IRL algorithm is compared to two direct approaches and various state-of-the-art transfer learning methods on classical maze navigation problems and an intelligent warehouse with multiple robots. The experimental results verify that IRL can effectively improve the adaptability and efficiency of RL algorithms in dynamic environments.
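The abstract gives no pseudocode, so the following is only a minimal sketch of the two steps it names (drift detection, then prioritized sweeping), not the authors' implementation. It assumes a small tabular problem with a known deterministic transition table, and it replaces the detector-agent's virtual RL process with a direct comparison of old and new reward tables. All names (`detect_drift`, `prioritized_sweep`, `next_state`, `theta`) are hypothetical.

```python
import heapq
from collections import defaultdict


def detect_drift(old_reward, new_reward, tol=1e-9):
    """Return the state-action pairs whose reward changed (the drift environment).

    Assumption: both reward tables are fully observable dicts keyed by (state, action).
    """
    return [sa for sa in new_reward
            if abs(new_reward[sa] - old_reward.get(sa, 0.0)) > tol]


def prioritized_sweep(Q, next_state, reward, actions, drifted,
                      gamma=0.9, theta=1e-6, max_updates=10_000):
    """Propagate reward changes outward from the drifted pairs via Bellman backups.

    Q is updated in place; the number of backups performed is returned.
    """
    # Predecessor map: which (state, action) pairs lead into each state.
    preds = defaultdict(list)
    for (s, a), s2 in next_state.items():
        preds[s2].append((s, a))

    def backup(s, a):
        s2 = next_state[(s, a)]
        return reward[(s, a)] + gamma * max(Q[(s2, b)] for b in actions)

    def push_if_needed(heap, s, a):
        priority = abs(backup(s, a) - Q[(s, a)])
        if priority > theta:
            # heapq is a min-heap, so negate to pop the largest change first.
            heapq.heappush(heap, (-priority, s, a))

    heap = []
    for s, a in drifted:
        push_if_needed(heap, s, a)

    updates = 0
    while heap and updates < max_updates:
        _, s, a = heapq.heappop(heap)
        Q[(s, a)] = backup(s, a)
        updates += 1
        # The value of state s may have changed, so revisit its neighbors.
        for ps, pa in preds[s]:
            push_if_needed(heap, ps, pa)
    return updates


# Toy chain 0 - 1 - 2 - 3 (state 3 is terminal); action 0 moves left, 1 moves right.
chain_next = {(s, 0): max(s - 1, 0) for s in range(3)}
chain_next.update({(s, 1): s + 1 for s in range(3)})
old_rewards = {sa: 0.0 for sa in chain_next}
new_rewards = dict(old_rewards)
new_rewards[(2, 1)] = 1.0          # the reward for reaching state 3 has drifted

q_table = defaultdict(float)       # stands in for previously learned values
drift_pairs = detect_drift(old_rewards, new_rewards)
n_backups = prioritized_sweep(q_table, chain_next, new_rewards, (0, 1), drift_pairs)
print(drift_pairs, n_backups, dict(q_table))
```

In this sketch only the drifted pair and the pairs that can reach it are revisited, which is the point of giving them priority. A full IRL run would then resume ordinary learning (for example Q-learning) from the updated `q_table`, as the last step of the abstract describes.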


Bibliographic record

Source
Mechatronics, IEEE/ASME Transactions on | 2019, Issue 2 | pp. 621-632 | 12 pages
Authors
Wang Zhi; Chen Chunlin; Li Han-Xiong; Dong Daoyi; Tarn Tzyh-Jong
Author affiliations

Nanjing Univ Dept Control & Syst Engn Nanjing 210093 Jiangsu Peoples R China|City Univ Hong Kong Dept Syst Engn & Engn Management Hong Kong Peoples R China;

Nanjing Univ Dept Control & Syst Engn Nanjing 210093 Jiangsu Peoples R China;

City Univ Hong Kong Dept Syst Engn & Engn Management Hong Kong Peoples R China|Cent S Univ State Key Lab High Performance Complex Mfg Changsha 410083 Hunan Peoples R China;

Univ New South Wales Sch Engn & Informat Technol Canberra ACT 2600 Australia;

Washington Univ Dept Elect & Syst Engn St Louis MO 63130 USA;

Original format: PDF
Language: English
Keywords
Dynamic environments; environment drift; incremental reinforcement learning (IRL); intelligent warehouses; prioritized sweeping
