Thompson Sampling Based Monte-Carlo Planning in POMDPs

Abstract

Monte-Carlo tree search (MCTS) has drawn great interest in recent years for planning under uncertainty. One of the key challenges is the trade-off between exploration and exploitation. To address this, we introduce a novel online planning algorithm for large POMDPs using Thompson sampling based MCTS that balances cumulative and simple regret. The proposed algorithm, Dirichlet-Dirichlet-NormalGamma based Partially Observable Monte-Carlo Planning (D²NG-POMCP), treats the accumulated reward of performing an action from a belief state in the MCTS search tree as a random variable following an unknown distribution with hidden parameters. A Bayesian method is used to model and infer the posterior distribution of these parameters by choosing the conjugate prior in the form of a combination of two Dirichlet distributions and one NormalGamma distribution. Thompson sampling is then used to guide action selection in the search tree. Experimental results confirm that our algorithm outperforms state-of-the-art approaches on several common benchmark problems.
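As a rough illustration of the idea in the abstract, the sketch below applies Thompson sampling with a NormalGamma posterior to a flat multi-armed bandit. It is not D²NG-POMCP: it omits the search tree, the belief states, and both Dirichlet components, and it assumes Gaussian returns. All names (`NormalGamma`, `thompson_select`, `run_bandit`) and the prior values are hypothetical choices made for this example, not taken from the paper.

```python
import random
from dataclasses import dataclass


@dataclass
class NormalGamma:
    """Posterior over the (mean, precision) of a Gaussian return.

    Default hyperparameters are arbitrary assumptions for this sketch.
    """
    mu: float = 0.0     # prior mean of the return
    lam: float = 0.01   # pseudo-count backing the prior mean
    alpha: float = 1.0  # Gamma shape for the precision
    beta: float = 1.0   # Gamma rate for the precision

    def update(self, x: float) -> None:
        """Standard conjugate update for a single observed return x."""
        # beta must use the old mu, so it is updated first.
        self.beta += self.lam * (x - self.mu) ** 2 / (2.0 * (self.lam + 1.0))
        self.mu = (self.lam * self.mu + x) / (self.lam + 1.0)
        self.lam += 1.0
        self.alpha += 0.5

    def sample_mean(self, rng: random.Random) -> float:
        """Draw a plausible mean return from the posterior."""
        # gammavariate takes (shape, scale), so the rate is inverted.
        tau = max(rng.gammavariate(self.alpha, 1.0 / self.beta), 1e-12)
        return rng.gauss(self.mu, (self.lam * tau) ** -0.5)


def thompson_select(posteriors, rng):
    """Pick the action whose sampled mean return is largest."""
    samples = [p.sample_mean(rng) for p in posteriors]
    return max(range(len(samples)), key=samples.__getitem__)


def run_bandit(true_means, steps=2000, seed=0):
    """Simulate unit-variance Gaussian arms; return pull counts per arm."""
    rng = random.Random(seed)
    posteriors = [NormalGamma() for _ in true_means]
    counts = [0] * len(true_means)
    for _ in range(steps):
        a = thompson_select(posteriors, rng)
        reward = rng.gauss(true_means[a], 1.0)
        posteriors[a].update(reward)
        counts[a] += 1
    return counts


if __name__ == "__main__":
    print(run_bandit([0.0, 1.0, 3.0]))
```

Because each action is chosen with the probability that it is best under the current posterior, uncertain actions are still tried occasionally while pulls concentrate on the action that looks best, which is the exploration-exploitation balance the abstract refers to.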


Bibliographic details

Source: International Conference on Automated Planning and Scheduling, 2014, 9 pages
Authors: Aijun Bai; Zongzhang Zhang; Feng Wu; Xiaoping Chen
Original format: PDF
Chinese Library Classification: TP2-53

