Advanced Robotics: The International Journal of the Robotics Society of Japan

Integration of imitation learning using GAIL and reinforcement learning using task-achievement rewards via probabilistic graphical model

Abstract

The integration of reinforcement learning (RL) and imitation learning (IL) is an important problem that has long been studied in the field of intelligent robotics. RL optimizes policies to maximize the cumulative reward, whereas IL attempts to extract general knowledge about the trajectories demonstrated by experts, i.e., demonstrators. Because each has its own drawbacks, many methods that combine them and compensate for each set of drawbacks have been explored thus far. However, many of these methods are heuristic and lack a solid theoretical basis. This paper presents a new theory for integrating RL and IL by extending the probabilistic graphical model (PGM) framework for RL, control as inference. We develop a new PGM for RL with multiple types of rewards, called the probabilistic graphical model for Markov decision processes with multiple optimality emissions (pMDP-MO). Furthermore, we demonstrate that the integrated learning method of RL and IL can be formulated as probabilistic inference of policies on the pMDP-MO by considering the discriminator in generative adversarial imitation learning (GAIL) as an additional optimality emission. We adapt GAIL and the task-achievement reward to our proposed framework, achieving significantly better performance than policies trained with baseline methods.
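To make the described idea concrete, the following is a minimal sketch (not from the paper) of how a GAIL discriminator output can be treated as an additional optimality emission alongside a task-achievement reward under control as inference; the callable names task_reward_fn and discriminator and the weighting coefficients are illustrative assumptions.

import numpy as np

def combined_reward(state, action, task_reward_fn, discriminator,
                    w_task=1.0, w_imit=1.0):
    """Hypothetical sketch: combine a task-achievement reward with an
    imitation reward derived from a GAIL discriminator, treating each as a
    separate optimality emission. task_reward_fn and discriminator are
    assumed callables, not part of the paper's published code."""
    # Task-achievement optimality: p(O_task = 1 | s, a) is proportional to
    # exp(r_task(s, a)), so r_task serves as its log-likelihood term.
    r_task = task_reward_fn(state, action)

    # Imitation optimality: the discriminator D(s, a) estimates the
    # probability that (s, a) came from the expert; log D(s, a) acts as
    # the imitation reward, as in GAIL-style adversarial IL.
    d = discriminator(state, action)
    r_imitation = np.log(np.clip(d, 1e-8, 1.0))

    # Under control as inference, independent optimality variables factor,
    # so their log-probabilities (rewards) simply add.
    return w_task * r_task + w_imit * r_imitation

In this reading, policy inference on a model with two optimality emissions reduces to RL on the summed reward, which is one way the abstract's integration of GAIL and task-achievement rewards can be interpreted.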
