Theoretical and Empirical Analysis of Reward Shaping in Reinforcement Learning

Abstract

Reinforcement learning suffers from scalability problems due to state space explosion and the temporal credit assignment problem. Knowledge-based approaches have received significant attention in the area. Reward shaping is a particular approach to incorporating domain knowledge into reinforcement learning. The theoretical and empirical analysis in this paper reveals important properties of this principle, especially the influence on performance of the reward type, the MDP discount factor, and the way the potential function is evaluated.
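
For readers unfamiliar with the terms in the abstract, the following is a minimal sketch of potential-based reward shaping in its commonly used form, where a shaping term F(s, s') = gamma * Phi(s') - Phi(s) is added to the environment reward. It is not taken from the paper: the corridor environment, the function names `shaped_reward` and `negative_distance`, and the choice of potential are all hypothetical and serve only to show where the potential function and the discount factor enter.

```python
from typing import Callable, Hashable

State = Hashable


def shaped_reward(
    reward: float,
    state: State,
    next_state: State,
    potential: Callable[[State], float],
    gamma: float,
) -> float:
    """Return r + F(s, s') with F(s, s') = gamma * Phi(s') - Phi(s)."""
    return reward + gamma * potential(next_state) - potential(state)


# Hypothetical 1-D corridor with states 0..4 and the goal at state 4.
GOAL = 4


def negative_distance(state: int) -> float:
    """Illustrative potential: larger (less negative) closer to the goal."""
    return -float(abs(GOAL - state))


# With gamma = 1, a step toward the goal earns a bonus and a step away a penalty.
toward = shaped_reward(0.0, 2, 3, negative_distance, gamma=1.0)
away = shaped_reward(0.0, 2, 1, negative_distance, gamma=1.0)

# With gamma < 1 and a negative potential, even staying in place yields a
# nonzero shaping term, so the discount factor changes the shaped signal.
stay = shaped_reward(0.0, 2, 2, negative_distance, gamma=0.9)

print(toward, away, stay)
```

In this sketch the environment reward is zero everywhere, so the printed values are the shaping terms alone. A different potential or discount factor would give a different signal for the same transitions.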


Bibliographic record

Source
Machine Learning and Applications, 2009. ICMLA '09 | 2009 | pp. 337-344 | 8 pages
Conference location: Miami Beach, FL (US)
Authors
Grzes Marek; Kudenko Daniel;
Original format: PDF
Keywords
heuristics; reinforcement learning; reward shaping;

Similar literature

1. A Reinforcement Learning Algorithm Based on Policy Iteration for Average Reward: Empirical Results with Yield Management and Convergence Analysis [J]. Abhijit Gosavi. Machine Learning. 2004, No. 1
2. Online learning of shaping rewards in reinforcement learning [J]. Grzes M, Kudenko D. Neural Networks: The Official Journal of the International Neural Network Society. 2010, No. 4
3. Principled reward shaping for reinforcement learning via Lyapunov stability theory [J]. Dong Yunlong, Tang Xiuchuan, Yuan Ye. Neurocomputing. 2020, Jun. 14 issue
4. Theoretical and Empirical Analysis of Reward Shaping in Reinforcement Learning [C]. Marek Grzes, Daniel Kudenko. International Conference on Machine Learning and Applications. 2009
5. Reward Prediction Errors Shape Memory during Reinforcement Learning [D]. Rouhani, Nina. 2020
6. Reinforcement Q-Learning Control With Reward Shaping Function for Swing Phase Control in a Semi-active Prosthetic Knee [O]. Yonatan Hutabarat, Kittipong Ekkachai, Mitsuhiro Hayashibe. 2020
7. A Reinforcement Learning Algorithm Based on Policy Iteration for Average Reward: Empirical Results with Yield Management and Convergence Analysis [O]. Abhijit Gosavi. 2004
8. Framing Reinforcement Learning from Human Reward: Reward Positivity, Temporal Discounting, Episodicity, and Performance [R]. Knox, W. B., Stone, P. 2014
