JMLR: Workshop and Conference Proceedings

Active Exploration in Markov Decision Processes


Abstract

We introduce the active exploration problem in Markov decision processes (MDPs). Each state of the MDP is characterized by a random value, and the learner should gather samples to estimate the mean value of each state as accurately as possible. As in active exploration for multi-armed bandits (MAB), states may have different levels of noise, so the higher the noise, the more samples are needed. Since the noise level is initially unknown, we need to trade off exploring the environment to estimate the noise against exploiting these estimates to compute a policy that maximizes the accuracy of the mean predictions. We introduce a novel learning algorithm for this problem, showing that active exploration in MDPs may be significantly more difficult than in MAB. We also derive a heuristic procedure to mitigate the negative effect of slowly mixing policies. Finally, we validate our findings on simple numerical simulations.
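The abstract does not spell out the paper's algorithm, but the bandit analogue it invokes is easy to demonstrate. Below is a minimal Python sketch (an illustrative assumption, not the authors' MDP method) of variance-adaptive sample allocation in a MAB: after estimating each arm's noise, every new sample goes to the arm whose mean estimate currently has the largest estimated error, so noisier arms end up sampled more often.

```python
import numpy as np

def active_exploration_mab(means, stds, budget, seed=0):
    """Illustrative sketch of the MAB analogue from the abstract: noisier
    arms (states) need more samples. Each pull goes to the arm whose mean
    estimate is currently least accurate (largest estimated sigma^2 / n).
    This is a generic allocation heuristic, not the paper's algorithm."""
    rng = np.random.default_rng(seed)
    n_arms = len(means)
    counts = np.zeros(n_arms, dtype=int)
    sums = np.zeros(n_arms)
    sq_sums = np.zeros(n_arms)

    # Initialization: pull each arm twice so a variance estimate exists.
    for arm in range(n_arms):
        for _ in range(2):
            x = rng.normal(means[arm], stds[arm])
            counts[arm] += 1
            sums[arm] += x
            sq_sums[arm] += x ** 2

    for _ in range(budget - 2 * n_arms):
        mu_hat = sums / counts
        var_hat = np.maximum(sq_sums / counts - mu_hat ** 2, 1e-12)
        # Estimated squared error of each mean estimate is sigma^2 / n;
        # sample the arm whose estimate is currently the worst.
        arm = int(np.argmax(var_hat / counts))
        x = rng.normal(means[arm], stds[arm])
        counts[arm] += 1
        sums[arm] += x
        sq_sums[arm] += x ** 2

    return sums / counts, counts

if __name__ == "__main__":
    mu_hat, counts = active_exploration_mab(
        means=[0.0, 1.0, 2.0], stds=[0.1, 1.0, 3.0], budget=300)
    print("estimated means:", mu_hat)
    print("samples per arm:", counts)  # noisier arms get more samples
```

The arms here play the role of the MDP's states; the extra difficulty in the MDP setting, per the abstract, is that the learner can only reach a state through the chain's transition dynamics, so slowly mixing policies delay the intended allocation.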

