
Plan-based reward shaping for reinforcement learning

International IEEE Conference on Intelligent Systems


Abstract

Reinforcement learning, while a highly popular learning technique for agents and multi-agent systems, has so far proved difficult to apply to more complex domains because of scaling problems. This paper focuses on the use of domain knowledge to improve the convergence speed and optimality of various RL techniques. Specifically, we propose using high-level STRIPS operator knowledge in reward shaping to focus the search for the optimal policy. Empirical results show that the plan-based reward shaping approach outperforms other RL techniques, including alternative manual and MDP-based reward shaping, when the latter is used in its basic form. We show that MDP-based reward shaping may fail, and successful experiments with STRIPS-based shaping suggest modifications that can overcome the problems encountered. The proposed STRIPS-based method allows the same domain knowledge to be expressed in a different way, so that the domain expert can choose whether to define an MDP or a STRIPS planning task. We also evaluate the robustness of the proposed STRIPS-based technique to errors in the plan knowledge.
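The mechanism the abstract describes is reward shaping driven by a high-level plan. Below is a minimal Python sketch, assuming the standard potential-based shaping form F(s, s') = γΦ(s') − Φ(s) (which is known to preserve optimal policies), with the potential of a state set in proportion to how far along the plan its abstraction lies. The toy plan, the state abstraction, and the scaling constant are illustrative assumptions, not the paper's exact construction.

```python
GAMMA = 0.99   # discount factor of the underlying MDP
OMEGA = 1.0    # scaling constant for the potential (assumed)

# A high-level plan, e.g. extracted with a STRIPS planner: an ordered
# sequence of abstract states the agent should pass through (hypothetical).
plan = ["at_door", "door_open", "in_room", "at_goal"]

def potential(abstract_state):
    """Potential of a state: its position along the plan.
    States whose abstraction is not on the plan get potential 0."""
    if abstract_state in plan:
        return OMEGA * (plan.index(abstract_state) + 1)
    return 0.0

def shaped_reward(env_reward, abstract_s, abstract_s_next):
    """Potential-based shaping: r + gamma * Phi(s') - Phi(s).
    Transitions that advance along the plan receive a positive bonus."""
    return env_reward + GAMMA * potential(abstract_s_next) - potential(abstract_s)

# Example: progressing from "at_door" to "door_open" yields a shaping bonus.
print(shaped_reward(0.0, "at_door", "door_open"))  # ≈ 0.98
```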

