Proposal and evaluation of the penalty avoiding rational policy making algorithm with penalty level

Abstract

Reinforcement learning (RL) is a kind of machine learning that aims to adapt an agent to a given environment by means of rewards and penalties. The Penalty Avoiding Rational Policy Making algorithm (PARP) [5] and Penalty Avoiding Profit Sharing (PAPS) [6] are examples of RL systems that can suppress a penalty while learning a rational policy. However, they cannot handle multiple kinds of penalties. In this paper, we extend PARP/PAPS to environments with several kinds of penalties. We propose the Penalty Avoiding Rational Policy Making Algorithm with Penalty Level (PARPL), which can control how penalties are avoided. We show the effectiveness of PARPL through soccer game simulations.

Bibliographic details

Source: pp. 2766-2773 (8 pages)
Authors: Kazuteru Miyazaki; Tomomizu Kojima; Hiroaki Kobayashi
Original format: PDF
CLC classification: Industrial technology
Keywords: Penalty Avoiding Rational Policy Making algorithm; Profit Sharing; Reinforcement Learning; Reward and Penalty; soccer game

Similar literature

1. Proposal of the Continuous-Valued Penalty Avoiding Rational Policy Making Algorithm [J]. Kazuteru Miyazaki. Journal of Advanced Computational Intelligence and Intelligent Informatics. 2012, No. 2a90
2. A New Improved Penalty Avoiding Rational Policy Making Algorithm for Keepaway with Continuous State Spaces [J]. Takuji Watanabe, Kazuteru Miyazaki, Hiroaki Kobayashi. Journal of Advanced Computational Intelligence and Intelligent Informatics. 2009, No. 6a72
3. Self-Organizing Probability State Variable Parameter Search Algorithms for Systems that Must Avoid High-Penalty Operating Regions [J]. Mucciardi, Anthony N. IEEE Transactions on Systems, Man, and Cybernetics. 1974, No. 4
4. Proposal and Evaluation of the Penalty Avoiding Rational Policy Making Algorithm with Penalty Level [C]. Kazuteru Miyazaki, Tomomizu Kojima, Hiroaki Kobayashi. SICE Annual Conference. 2007
5. Penalty application: A study of penalty proposals and abatements by the Internal Revenue Service [D]. Adams, Brenda Boswell. 2016
6. Glycemic penalty index for adequately assessing and comparing different blood glucose control algorithms [O]. Tom Van Herpe, Jos De Brabanter, Martine Beullens. 2008
7. Performance evaluation of MAP algorithms with different penalties, object geometries and noise levels [O]. Tsai, Yu-Jung; Bousse, Alexandre B; Ehrhardt, Matthias J. 2015
