IFAC PapersOnLine

Demonstration Guided Actor-Critic Deep Reinforcement Learning for Fast Teaching of Robots in Dynamic Environments



Abstract

Using direct reinforcement learning (RL) to accomplish a task can be very inefficient, especially in robotic settings where interactions with the environment are lengthy and costly. Learning from expert demonstration (LfD) is an alternative approach that achieves better performance in an RL setting and greatly improves sample efficiency. We propose a novel demonstration learning framework for actor-critic based algorithms. First, we put forward an environment pre-training paradigm that initializes the model parameters without interacting with the target environment, which effectively avoids the cold-start problem in deep RL scenarios. Second, we design a general-purpose LfD framework for most mainstream actor-critic RL algorithms that include a policy network and a value function, such as PPO, SAC, TRPO, and A3C. Third, we build a dedicated model training platform to perform human-robot interaction and numerical experimentation. We evaluate the method in six MuJoCo simulated locomotion environments and on our robot control simulation platform. Results show that several epochs of pre-training improve the agent's performance in the early stage of training, and the final converged performance of the RL algorithm is further boosted by the external demonstrations. Overall, the proposed method improves sample efficiency by 30%. Our demonstration pipeline makes full use of the exploration property of the RL algorithm and is feasible for fast teaching of robots in dynamic environments.
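The abstract does not detail the pre-training procedure, so the following is only a minimal sketch of one plausible reading: a behavior-cloning warm start of the actor from expert (state, action) pairs before standard actor-critic fine-tuning (e.g. with PPO or SAC). All class names, dimensions, and the synthetic demonstration data below are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (assumed, not the paper's code): warm-start an actor-critic
# policy from demonstrations via behavior cloning, then hand the model to a
# regular RL algorithm for fine-tuning in the target environment.
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        # Policy network (actor) and value function (critic), as in PPO/SAC-style agents.
        self.actor = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, act_dim),
        )
        self.critic = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs):
        return self.actor(obs), self.critic(obs)

def pretrain_from_demonstrations(model, demo_obs, demo_act, epochs=10, lr=1e-3):
    """Behavior-cloning warm start: regress the actor onto demonstrated actions
    so the agent avoids a cold start before touching the target environment."""
    opt = torch.optim.Adam(model.actor.parameters(), lr=lr)
    for _ in range(epochs):
        pred_act = model.actor(demo_obs)
        loss = nn.functional.mse_loss(pred_act, demo_act)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model

# Usage with random placeholder demonstrations (stand-ins for expert data
# collected on a MuJoCo locomotion task):
obs_dim, act_dim, n_demo = 17, 6, 256
model = ActorCritic(obs_dim, act_dim)
demo_obs = torch.randn(n_demo, obs_dim)
demo_act = torch.randn(n_demo, act_dim)
model = pretrain_from_demonstrations(model, demo_obs, demo_act)
# ...then continue with PPO/SAC/TRPO fine-tuning in the target environment.
```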