Twentieth International Joint Conference on Artificial Intelligence (IJCAI-07)

Using Linear Programming for Bayesian Exploration in Markov Decision Processes


Abstract

A key problem in reinforcement learning is finding a good balance between the need to explore the environment and the need to gain rewards by exploiting existing knowledge. Much research has been devoted to this topic, and many of the proposed methods are aimed simply at ensuring that enough samples are gathered to estimate the value function well. In contrast, [Bellman and Kalaba, 1959] proposed constructing a representation in which the states of the original system are paired with knowledge about the current model. Hence, knowledge about the possible Markov models of the environment is represented and maintained explicitly. Unfortunately, this approach is intractable except for bandit problems (where it gives rise to Gittins indices, an optimal exploration method). In this paper, we explore ideas for making this method computationally tractable. We maintain a model of the environment as a Markov Decision Process. We sample finite-length trajectories from the infinite tree using ideas based on sparse sampling. Finding the values of the nodes of this sparse subtree can then be expressed as an optimization problem, which we solve using linear programming. We illustrate this approach on a few domains and compare it with other exploration algorithms.
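
For orientation, the textbook linear program for the optimal values of a finite discounted MDP can be written as below. This is a sketch of the standard formulation only; the abstract does not state the exact program used in the paper. The symbols are assumptions introduced here: $V(s)$ is the value of state $s$, $R(s,a)$ the expected reward, $P(s' \mid s,a)$ the transition probability, and $\gamma \in [0,1)$ the discount factor. In the setting described above, the role of $s$ would presumably be played by the nodes of the sampled subtree.

```latex
\begin{aligned}
\min_{V} \quad & \sum_{s} V(s) \\
\text{subject to} \quad & V(s) \;\ge\; R(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, V(s')
\qquad \text{for all } s, a .
\end{aligned}
```

Under these assumptions the unique minimizer is the optimal value function, and a greedy policy with respect to it is optimal.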