LRTDP Versus UCT for Online Probabilistic Planning

Abstract

UCT, the premier method for solving games such as Go, is also becoming the dominant algorithm for probabilistic planning. Out of the five solvers at the International Probabilistic Planning Competition (IPPC) 2011, four were based on the UCT algorithm. However, while a UCT-based planner, Prost, won the contest, an LRTDP-based system, Glutton, came in a close second, outperforming other systems derived from UCT. These results raise a question: what are the strengths and weaknesses of LRTDP and UCT in practice? This paper starts answering this question by contrasting the two approaches in the context of finite-horizon MDPs. We demonstrate that in such scenarios, UCT's lack of a sound termination condition is a serious practical disadvantage. In order to handle an MDP with a large finite horizon under a time constraint, UCT forces an expert to guess a non-myopic lookahead value for which it should be able to converge on the encountered states. Mistakes in setting this parameter can greatly hurt UCT's performance. In contrast, LRTDP's convergence criterion allows for an iterative deepening strategy. Using this strategy, LRTDP automatically finds the largest lookahead value feasible under the given time constraint. As a result, LRTDP has better performance and stronger theoretical properties. We present an online version of Glutton, named Gourmand, that illustrates this analysis and outperforms Prost on the set of IPPC-2011 problems.
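The iterative deepening strategy described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not Glutton's or Gourmand's actual implementation: the two-state MDP in `TRANSITIONS`, the function names, and the time-budget handling are all hypothetical, and an exact finite-horizon backup stands in for LRTDP as the solver that "converges" for a given lookahead.

```python
import time

# Hypothetical toy MDP (not from the paper).
# TRANSITIONS[state][action] -> list of (probability, next_state, reward)
TRANSITIONS = {
    0: {"safe": [(1.0, 0, 1.0)], "risky": [(0.5, 1, 0.0), (0.5, 0, 0.0)]},
    1: {"safe": [(1.0, 1, 3.0)], "risky": [(1.0, 0, 0.0)]},
}


class OutOfTime(Exception):
    """Raised when the deadline passes before the current lookahead is solved."""


def solve_to_horizon(state, horizon, deadline):
    """Return the best action at `state` for the given lookahead.

    Stand-in for LRTDP: an exact memoized finite-horizon backup. It either
    finishes (the lookahead is fully solved) or raises OutOfTime.
    """
    cache = {}

    def value(s, h):
        if time.monotonic() > deadline:
            raise OutOfTime
        if h == 0:
            return 0.0
        if (s, h) not in cache:
            cache[(s, h)] = max(q_value(s, a, h) for a in TRANSITIONS[s])
        return cache[(s, h)]

    def q_value(s, a, h):
        return sum(p * (r + value(s2, h - 1)) for p, s2, r in TRANSITIONS[s][a])

    return max(TRANSITIONS[state], key=lambda a: q_value(state, a, horizon))


def choose_action(state, max_horizon, time_budget_s):
    """Iterative deepening: solve lookahead 1, 2, ... until time runs out.

    Returns the action from the largest fully solved lookahead, plus that
    lookahead, so no one has to guess the lookahead value in advance.
    """
    deadline = time.monotonic() + time_budget_s
    best_action = next(iter(TRANSITIONS[state]))  # arbitrary fallback action
    solved = 0
    for horizon in range(1, max_horizon + 1):
        try:
            best_action = solve_to_horizon(state, horizon, deadline)
        except OutOfTime:
            break  # keep the answer from the last completed lookahead
        solved = horizon
    return best_action, solved


if __name__ == "__main__":
    print(choose_action(0, 1, 5.0))  # myopic lookahead
    print(choose_action(0, 6, 5.0))  # deeper lookahead, same time budget
```

In this toy problem a lookahead of 1 prefers the immediate reward of `safe`, while a lookahead of 6 prefers `risky`, which is the kind of sensitivity to the lookahead value that the abstract attributes to a hand-tuned setting. The key design point is that a sound termination condition lets the loop know when a lookahead is actually solved before it moves on to the next one.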
