Exploiting Action-Value Uncertainty to Drive Exploration in Reinforcement Learning

Abstract

Most of the research in Reinforcement Learning (RL) focuses on balancing exploration and exploitation. Indeed, the success or failure of an RL algorithm often comes down to the choice between executing exploratory actions and exploiting actions that are known to be good. In the context of Multi-Armed Bandits (MABs), many algorithms have addressed this dilemma. In particular, Thompson Sampling (TS) is a solution that, besides having good theoretical properties, usually works very well in practice. Unfortunately, the success of TS in MAB problems has not been replicated in RL, where it has been shown to scale very poorly with respect to the dimensionality of the problem. Nevertheless, applying TS in RL, instead of more myopic strategies such as ε-greedy, remains a promising solution. This paper addresses this issue by proposing several algorithms that make TS feasible in RL and deep RL. We present these algorithms, explaining the intuitions and theoretical considerations behind them and discussing their advantages and drawbacks. Furthermore, we provide an empirical evaluation on an increasingly complex set of RL problems, showing the benefit of TS with respect to other sampling strategies available in classical and more recent RL literature.
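
As a rough illustration of the two sampling strategies the abstract contrasts, the sketch below runs Thompson Sampling and ε-greedy on a Bernoulli multi-armed bandit. It is not one of the algorithms proposed in the paper: it assumes Bernoulli rewards with a uniform Beta(1, 1) prior per arm, and the function names (thompson_step, epsilon_greedy_step, run_bandit) and the arm means used in the usage note are hypothetical choices made only for this example.

```python
import random


def thompson_step(successes, failures, rng):
    """Sample each arm's Beta posterior and pick the arm with the largest sample."""
    # Assumption: Beta(1, 1) prior, so the posterior is Beta(s + 1, f + 1).
    samples = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)


def epsilon_greedy_step(successes, failures, rng, epsilon=0.1):
    """With probability epsilon pick a random arm, otherwise the best empirical mean."""
    if rng.random() < epsilon:
        return rng.randrange(len(successes))
    # Arms that were never pulled get an empirical mean of 0.0.
    means = [s / (s + f) if s + f > 0 else 0.0 for s, f in zip(successes, failures)]
    return max(range(len(means)), key=means.__getitem__)


def run_bandit(select, true_means, steps=2000, seed=0):
    """Play a Bernoulli bandit with the given selection rule.

    Returns the total reward and the number of pulls per arm.
    """
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k
    failures = [0] * k
    total = 0
    for _ in range(steps):
        arm = select(successes, failures, rng)
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        total += reward
    pulls = [s + f for s, f in zip(successes, failures)]
    return total, pulls
```

Calling run_bandit(thompson_step, [0.2, 0.5, 0.8]) and run_bandit(epsilon_greedy_step, [0.2, 0.5, 0.8]) gives the total reward and per-arm pull counts for each rule. Thompson Sampling explores in proportion to its remaining posterior uncertainty, whereas ε-greedy keeps exploring uniformly at a fixed rate. In RL the quantity being sampled would be an action-value estimate rather than an arm mean, which is where the scaling difficulty mentioned in the abstract arises.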

Bibliographic details

Source
International Joint Conference on Neural Networks | 2019 | pp. 1-8 | 8 pages
Authors
Carlo D’Eramo; Andrea Cini; Marcello Restelli
Original format: PDF
