...
Journal: IEEE/ASME Transactions on Mechatronics

Prediction of Reward Functions for Deep Reinforcement Learning via Gaussian Process Regression


Abstract

Inverse reinforcement learning (IRL) is a technique for automatic reward acquisition; however, it is difficult to apply to high-dimensional problems with unknown dynamics. This article proposes an efficient way to solve the IRL problem based on sparse Gaussian process (GP) prediction with ℓ1-regularization, using only a highly limited number of expert demonstrations. A GP model is trained to predict a reward function from trajectory-reward pair data generated by deep reinforcement learning under different reward functions. The trained GP successfully predicts the reward functions of human experts from their collected demonstration trajectory datasets. To demonstrate the approach, it is applied to obstacle-avoidance navigation of a mobile robot. The experimental results clearly show that the robot can clone the experts' optimality in obstacle-avoiding navigation trajectories using only a very small number of expert demonstration datasets (e.g., ≤ 6). Therefore, the proposed approach shows great potential for complex real-world applications in an expert-data-efficient manner.
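
To make the idea concrete, the following is a minimal sketch of the regression step described above: a GP trained on trajectory-reward pairs is used to predict reward-function parameters from a few demonstration trajectories. It is written in Python with scikit-learn; the data, dimensions, and variable names are illustrative assumptions rather than the authors' implementation, and a plain GP regressor stands in for the paper's sparse GP with ℓ1-regularization.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
n_train, feat_dim, reward_dim = 200, 16, 4

# Placeholder training set: each row of X_train is a feature vector summarizing
# a trajectory rolled out by deep RL under a known reward function; the matching
# row of W_train holds that reward function's parameters.
X_train = rng.normal(size=(n_train, feat_dim))
W_train = rng.normal(size=(n_train, reward_dim))

# Plain GP regressor as a stand-in for the paper's sparse GP with l1-regularization.
kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=1e-3)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gp.fit(X_train, W_train)

# A handful of expert demonstrations (e.g., <= 6), featurized the same way;
# the GP maps them to the reward parameters that best explain the behavior.
X_expert = rng.normal(size=(6, feat_dim))
W_pred = gp.predict(X_expert)
w_hat = W_pred.mean(axis=0)  # pool the per-demonstration estimates
print("predicted reward parameters:", w_hat)
```

In the paper's setting, the predicted reward would then be handed back to a deep reinforcement learning agent, which learns a policy that reproduces the experts' obstacle-avoiding navigation behavior.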
