Proposal and evaluation of the penalty avoiding rational policy making algorithm with penalty level

Abstract

Reinforcement learning (RL) is a form of machine learning that adapts an agent to a given environment by means of rewards and penalties. The Penalty Avoiding Rational Policy Making algorithm (PARP) [5] and Penalty Avoiding Profit Sharing (PAPS) [6] are examples of RL systems that can suppress penalties while learning a rational policy. However, they cannot handle more than one kind of penalty. In this paper, we extend PARP/PAPS to environments that contain several kinds of penalties. We propose the Penalty Avoiding Rational Policy Making algorithm with Penalty Level (PARPL), which can control how penalties are avoided. We show the effectiveness of PARPL through soccer game simulations.
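The abstract does not describe how PARPL works internally, so the following is only a hypothetical illustration, not the authors' algorithm. It assumes PARP-style penalty-rule marking (a rule, i.e. a state-action pair, is treated as a penalty rule if it was penalised directly or if it may lead to a state whose available rules are all penalty rules) and assumes penalty levels are positive integers where a higher number means a more severe penalty. The functions `penalty_rules`, `rule_levels`, and `safest_actions` and the toy environment are invented for this sketch.

```python
from typing import Dict, Hashable, Set, Tuple

State = Hashable
Action = Hashable
Rule = Tuple[State, Action]


def penalty_rules(
    actions: Dict[State, Set[Action]],
    successors: Dict[Rule, Set[State]],
    direct_penalty: Set[Rule],
) -> Set[Rule]:
    """Mark penalty rules to a fixed point (PARP-style, as assumed here).

    A rule is a penalty rule if it was penalised directly, or if it can
    lead to a penalty state, i.e. a state whose available rules are all
    penalty rules.
    """
    marked = set(direct_penalty)
    changed = True
    while changed:
        changed = False
        penalty_states = {
            s for s, acts in actions.items()
            if acts and all((s, a) in marked for a in acts)
        }
        for rule, nexts in successors.items():
            if rule not in marked and nexts & penalty_states:
                marked.add(rule)
                changed = True
    return marked


def rule_levels(
    actions: Dict[State, Set[Action]],
    successors: Dict[Rule, Set[State]],
    penalty_level: Dict[Rule, int],
) -> Dict[Rule, int]:
    """Hypothetical level assignment: a rule's level is the highest k for
    which it is a penalty rule when only direct penalties of level >= k
    count. Level 0 means the rule avoids every penalty."""
    levels = {(s, a): 0 for s, acts in actions.items() for a in acts}
    for k in sorted(set(penalty_level.values())):
        direct = {r for r, lv in penalty_level.items() if lv >= k}
        for r in penalty_rules(actions, successors, direct):
            levels[r] = k  # k increases, so the last write is the maximum
    return levels


def safest_actions(
    state: State, actions: Dict[State, Set[Action]], levels: Dict[Rule, int]
) -> Set[Action]:
    """Keep the actions with the lowest penalty level in `state`; a learner
    such as Profit Sharing would then choose among these."""
    best = min(levels[(state, a)] for a in actions[state])
    return {a for a in actions[state] if levels[(state, a)] == best}


# Toy example (invented): level 2 is more severe than level 1.
ACTIONS = {
    "s0": {"left", "right", "stay"},
    "s1": {"a"},
    "s2": {"b", "c"},
    "s3": {"x", "y"},
}
SUCCESSORS = {
    ("s0", "left"): {"s1"}, ("s0", "right"): {"s2"}, ("s0", "stay"): {"s0"},
    ("s1", "a"): {"s0"}, ("s2", "b"): {"s0"}, ("s2", "c"): {"s0"},
    ("s3", "x"): {"s0"}, ("s3", "y"): {"s1"},
}
PENALTY_LEVEL = {("s1", "a"): 2, ("s2", "b"): 1, ("s3", "x"): 1}

if __name__ == "__main__":
    levels = rule_levels(ACTIONS, SUCCESSORS, PENALTY_LEVEL)
    for s in sorted(ACTIONS):
        print(s, sorted(safest_actions(s, ACTIONS, levels)))
```

In state `s3` both actions can end in a penalty, so the sketch keeps `x` (a level-1 penalty) over `y` (which can lead to a level-2 penalty). Ranking rules by the most severe penalty they can reach is one assumed reading of "controlling how penalties are avoided"; the paper may define levels differently.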