Covering Number as a Complexity Measure for POMDP Planning and Learning

Abstract

Finding a meaningful way to characterize the difficulty of partially observable Markov decision processes (POMDPs) is a core theoretical problem in POMDP research. State-space size is often used as a proxy for POMDP difficulty, but it is a weak metric at best. Existing theoretical work has shown that the covering number of the reachable belief space, i.e., the set of belief points reachable from the initial belief point, is linked to the complexity of POMDP planning. In this paper, we present empirical evidence that the covering number of the reachable belief space (or just "covering number", for brevity) is a far better complexity measure than state-space size for both planning and learning POMDPs on several small-scale benchmark problems. We connect the covering number to the complexity of learning POMDPs by proposing a learning algorithm for POMDPs without reset that provably converges, given knowledge of the covering number.
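To make the central quantity concrete: the δ-covering number of a set of beliefs is the smallest number of balls of radius δ (under some metric on belief vectors) needed to cover it. The sketch below is an illustration under stated assumptions, not the paper's method. It uses the classic two-state Tiger problem as a hypothetical example model, enumerates the beliefs reachable from the uniform initial belief up to a fixed depth with the standard Bayes belief update, and then greedily selects centers under L1 distance (an assumed choice of metric). Because every point ends up within δ of some chosen center, the greedy count is an upper bound on the δ-covering number of the sampled points only; the true reachable belief space may be infinite. The helper names `belief_update`, `reachable_beliefs`, and `greedy_cover_size` are hypothetical.

```python
import numpy as np

# Hypothetical example model: the classic two-state "Tiger" POMDP.
# States: 0 = tiger-left, 1 = tiger-right.
# Actions: 0 = listen, 1 = open-left, 2 = open-right.
# Observations: 0 = hear-left, 1 = hear-right.
N_STATES, N_ACTIONS, N_OBS = 2, 3, 2

# T[a, s, s'] = P(s' | s, a)
T = np.zeros((N_ACTIONS, N_STATES, N_STATES))
T[0] = np.eye(N_STATES)  # listening leaves the state unchanged
T[1] = T[2] = 0.5        # opening a door resets the tiger uniformly

# O[a, s', o] = P(o | s', a)
O = np.zeros((N_ACTIONS, N_STATES, N_OBS))
O[0] = [[0.85, 0.15], [0.15, 0.85]]  # listening is 85% accurate
O[1] = O[2] = 0.5                    # opening gives an uninformative observation


def belief_update(b, a, o):
    """Bayes filter: b'(s') is proportional to O(o|s',a) * sum_s T(s'|s,a) b(s).
    Returns None if observation o has zero probability under (b, a)."""
    unnormalized = O[a, :, o] * (b @ T[a])
    total = unnormalized.sum()
    if total == 0.0:
        return None
    return unnormalized / total


def reachable_beliefs(b0, depth, decimals=9):
    """Breadth-first enumeration of beliefs reachable from b0 within `depth`
    steps; beliefs are deduplicated after rounding to `decimals` places."""
    seen = {tuple(np.round(b0, decimals))}
    frontier, points = [b0], [b0]
    for _ in range(depth):
        next_frontier = []
        for b in frontier:
            for a in range(N_ACTIONS):
                for o in range(N_OBS):
                    b_next = belief_update(b, a, o)
                    if b_next is None:
                        continue
                    key = tuple(np.round(b_next, decimals))
                    if key not in seen:
                        seen.add(key)
                        next_frontier.append(b_next)
                        points.append(b_next)
        frontier = next_frontier
    return np.array(points)


def greedy_cover_size(points, delta):
    """Greedily choose centers so every point lies within L1 distance delta of
    one. The centers form a delta-cover of `points`, so the returned count
    upper-bounds the delta-covering number of this sampled set."""
    centers = []
    for p in points:
        if not any(np.abs(p - c).sum() <= delta for c in centers):
            centers.append(p)
    return len(centers)


b0 = np.array([0.5, 0.5])
pts = reachable_beliefs(b0, depth=6)
for delta in (0.5, 0.1, 0.01):
    print(f"delta={delta}: {len(pts)} beliefs, greedy cover size {greedy_cover_size(pts, delta)}")
```

In this toy model the number of reachable beliefs grows with depth, yet the cover size grows far more slowly as δ shrinks, because the beliefs cluster near the corners of the simplex. This is the intuition behind using the covering number, rather than the (here fixed, two-state) state-space size, as a difficulty measure.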