...
Journal: IEEE/ASME Transactions on Mechatronics

Prediction of Reward Functions for Deep Reinforcement Learning via Gaussian Process Regression


Abstract

Inverse reinforcement learning (IRL) is a technique for automatic reward acquisition; however, it is difficult to apply to high-dimensional problems with unknown dynamics. This article proposes an efficient way to solve the IRL problem based on sparse Gaussian process (GP) prediction with ℓ1-regularization, using only a highly limited number of expert demonstrations. A GP model is trained to predict a reward function from trajectory-reward pair data generated by deep reinforcement learning under different reward functions. The trained GP successfully predicts the reward functions of human experts from their collected demonstration trajectory datasets. To demonstrate the approach, it is applied to obstacle-avoidance navigation of a mobile robot. The experimental results clearly show that the robot can clone the experts' optimality in obstacle-avoiding navigation trajectories using only a very small number of expert demonstration datasets (e.g., ≤ 6). Therefore, the proposed approach shows great potential for complex real-world applications in an expert-data-efficient manner.
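
To make the idea concrete, the following is a minimal sketch of the regression step described above: a GP trained on trajectory-reward pairs is used to predict reward-function parameters from a few demonstration trajectories. It is written in Python with scikit-learn; the data, dimensions, and variable names are illustrative assumptions rather than the authors' implementation, and a plain GP regressor stands in for the paper's sparse GP with ℓ1-regularization.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
n_train, feat_dim, reward_dim = 200, 16, 4

# Placeholder training set: each row of X_train is a feature vector summarizing
# a trajectory rolled out by deep RL under a known reward function; the matching
# row of W_train holds that reward function's parameters.
X_train = rng.normal(size=(n_train, feat_dim))
W_train = rng.normal(size=(n_train, reward_dim))

# Plain GP regressor as a stand-in for the paper's sparse GP with l1-regularization.
kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=1e-3)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gp.fit(X_train, W_train)

# A handful of expert demonstrations (e.g., <= 6), featurized the same way;
# the GP maps them to the reward parameters that best explain the behavior.
X_expert = rng.normal(size=(6, feat_dim))
W_pred = gp.predict(X_expert)
w_hat = W_pred.mean(axis=0)  # pool the per-demonstration estimates
print("predicted reward parameters:", w_hat)
```

In the paper's setting, the predicted reward would then be handed back to a deep reinforcement learning agent, which learns a policy that reproduces the experts' obstacle-avoiding navigation behavior.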
