Conference: International Joint Conference on Neural Networks
Exploiting Action-Value Uncertainty to Drive Exploration in Reinforcement Learning

Abstract

Most of the research in Reinforcement Learning (RL) focuses on balancing exploration and exploitation. Indeed, the success or failure of an RL algorithm often depends on how it chooses between executing exploratory actions and exploiting actions that are known to be good. In the context of Multi-Armed Bandits (MABs), many algorithms have addressed this dilemma. In particular, Thompson Sampling (TS) is a solution that, besides having good theoretical properties, usually works very well in practice. Unfortunately, the success of TS in MAB problems has not been replicated in RL, where it has been shown to scale very poorly with respect to the dimensionality of the problem. Nevertheless, applying TS in RL, instead of more myopic strategies such as ε-greedy, remains a promising solution. This paper addresses this issue by proposing several algorithms that make TS feasible in RL and deep RL. We present these algorithms, explaining the intuitions and theoretical considerations behind them and discussing their advantages and drawbacks. Furthermore, we provide an empirical evaluation on a set of RL problems of increasing complexity, showing the benefit of TS with respect to other sampling strategies available in classical and more recent RL literature.
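For readers unfamiliar with Thompson Sampling in the bandit setting the abstract refers to, the following is a minimal sketch of the standard Beta-Bernoulli variant. It is not one of the paper's RL algorithms, which the abstract does not describe. The function name `thompson_sampling`, the uniform Beta(1, 1) prior, and the arm probabilities in the usage lines are assumptions chosen for illustration.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Beta-Bernoulli Thompson Sampling on a multi-armed bandit.

    true_probs: hypothetical success probability of each arm (unknown to the agent).
    Returns the number of pulls per arm and the total reward collected.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms  # observed rewards of 1 per arm
    failures = [0] * n_arms   # observed rewards of 0 per arm
    pulls = [0] * n_arms
    total_reward = 0

    for _ in range(n_rounds):
        # Draw one sample per arm from its posterior Beta(1 + successes, 1 + failures).
        # The Beta(1, 1) prior is an assumption of this sketch.
        samples = [
            rng.betavariate(1 + successes[a], 1 + failures[a])
            for a in range(n_arms)
        ]
        # Act greedily with respect to the sampled values, not the posterior means.
        arm = max(range(n_arms), key=samples.__getitem__)

        # Simulate a Bernoulli reward from the chosen arm.
        reward = 1 if rng.random() < true_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
        total_reward += reward

    return pulls, total_reward


# Usage with three hypothetical arms.
pulls, total = thompson_sampling([0.2, 0.5, 0.8], n_rounds=2000)
print(pulls, total)
```

Exploration here comes from the spread of each posterior: arms with few observations produce widely varying samples and so are still tried occasionally, whereas ε-greedy explores uniformly at random regardless of how uncertain each estimate is.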
