Planning in Factored Action Spaces with Symbolic Dynamic Programming

Abstract

We consider symbolic dynamic programming (SDP) for solving Markov Decision Processes (MDPs) with factored state and action spaces, where both states and actions are described by sets of discrete variables. Prior work on SDP has considered only the case of factored states and ignored structure in the action space, causing those algorithms to scale poorly in the number of action variables. Our main contribution is to present the first SDP-based planning algorithm that leverages both state and action space structure to compute compactly represented value functions and policies. Since our new algorithm can potentially require more space than when action structure is ignored, our second contribution is an approach for smoothly trading off space versus time via recursive conditioning. Finally, our third contribution is a novel SDP approximation that exploits action structure in weakly coupled MDPs, often reducing planning time significantly with little loss in quality. We present empirical results in three domains with factored action spaces showing that our algorithms scale much better with the number of action variables than state-of-the-art SDP algorithms.

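For context on the computation involved: SDP performs the Bellman backup of value iteration symbolically, operating on compact representations of the reward, transition, and value functions rather than enumerating states and joint actions. A minimal sketch of that backup in standard factored-MDP notation (the notation below is the generic formulation, not taken verbatim from the paper):

\[
V^{t+1}(x) \;=\; \max_{a_1,\dots,a_m} \Big[\, R(x,a) \;+\; \gamma \sum_{x'} \prod_{i=1}^{n} P\big(x_i' \mid \mathrm{pa}(x_i'),\, a\big)\, V^{t}(x') \,\Big],
\qquad x = (x_1,\dots,x_n),\quad a = (a_1,\dots,a_m).
\]

With m binary action variables, the outer max ranges over up to 2^m joint actions, which is why methods that treat the action space as flat scale poorly in the number of action variables; the abstract's first contribution is to carry out this maximization symbolically as well, exploiting structure among the action variables.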