International Conference on Automated Planning and Scheduling

Thompson Sampling Based Monte-Carlo Planning in POMDPs


Abstract

Monte-Carlo tree search (MCTS) has drawn great interest in recent years for planning under uncertainty. One of the key challenges is the trade-off between exploration and exploitation. To address this, we introduce a novel online planning algorithm for large POMDPs that uses Thompson sampling based MCTS to balance cumulative and simple regret. The proposed algorithm, Dirichlet-Dirichlet-NormalGamma based Partially Observable Monte-Carlo Planning (D²NG-POMCP), treats the accumulated reward of performing an action from a belief state in the MCTS search tree as a random variable following an unknown distribution with hidden parameters. A Bayesian method is used to model and infer the posterior distribution of these parameters, with the conjugate prior chosen as a combination of two Dirichlet distributions and one NormalGamma distribution. Thompson sampling then guides action selection in the search tree. Experimental results confirm that the algorithm outperforms state-of-the-art approaches on several common benchmark problems.
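The abstract does not give the update equations, so the following is only a minimal sketch of the Thompson sampling idea it describes, not the authors' D²NG-POMCP. It assumes each action's return is Gaussian with unknown mean and precision, keeps a NormalGamma posterior per action using the standard conjugate update, and picks the action whose sampled mean is largest. The two Dirichlet components and the search tree are omitted; the class `NormalGamma`, the helpers `thompson_select` and `run_bandit`, and the prior values are all hypothetical names and choices made for illustration.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class NormalGamma:
    """NormalGamma(mu, lam, alpha, beta) belief over a return's mean and precision.

    The default prior values are illustrative assumptions, not taken from the paper.
    """
    mu: float = 0.0
    lam: float = 0.01
    alpha: float = 1.0
    beta: float = 1.0

    def update(self, x: float) -> None:
        """Standard conjugate update for a single observed return x."""
        new_lam = self.lam + 1.0
        self.beta += self.lam * (x - self.mu) ** 2 / (2.0 * new_lam)
        self.mu = (self.lam * self.mu + x) / new_lam
        self.lam = new_lam
        self.alpha += 0.5

    def sample_mean(self, rng: random.Random) -> float:
        """Draw a precision, then a mean conditioned on that precision."""
        # random.gammavariate takes (shape, scale), so the rate beta is inverted.
        tau = rng.gammavariate(self.alpha, 1.0 / self.beta)
        return rng.gauss(self.mu, math.sqrt(1.0 / (self.lam * tau)))


def thompson_select(posteriors, rng):
    """Sample one mean per action and return the index of the largest sample."""
    samples = [p.sample_mean(rng) for p in posteriors]
    return max(range(len(samples)), key=samples.__getitem__)


def run_bandit(true_means, steps=2000, seed=0):
    """Toy flat bandit standing in for a single node of the search tree."""
    rng = random.Random(seed)
    posteriors = [NormalGamma() for _ in true_means]
    counts = [0] * len(true_means)
    for _ in range(steps):
        a = thompson_select(posteriors, rng)
        reward = rng.gauss(true_means[a], 1.0)  # simulated noisy return
        posteriors[a].update(reward)
        counts[a] += 1
    return posteriors, counts


if __name__ == "__main__":
    posteriors, counts = run_bandit([0.0, 1.0, 3.0])
    print("pull counts per action:", counts)
```

In a tree search, one such posterior would presumably be kept per (belief node, action) pair and updated with the return of each simulation passing through it; that wiring is left out here because the abstract does not specify it.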
