
Symbolic Dynamic Programming for Continuous State and Action MDPs



Abstract

Many real-world decision-theoretic planning problems are naturally modeled using both continuous state and action (CSA) spaces, yet little work has provided exact solutions for the case of continuous actions. In this work, we propose a symbolic dynamic programming (SDP) solution to obtain the optimal closed-form value function and policy for CSA-MDPs with multivariate continuous state and actions, discrete noise, piecewise linear dynamics, and piecewise linear (or restricted piecewise quadratic) reward. Our key contribution over previous SDP work is to show how the continuous action maximization step in the dynamic programming backup can be evaluated optimally and symbolically, a task that amounts to symbolic constrained optimization subject to unknown state parameters. We further integrate this technique with an efficient and compact data structure for SDP, the extended algebraic decision diagram (XADD). We demonstrate empirical results on a didactic nonlinear planning example and two domains from operations research to show the first automated exact solution to these problems.
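To make the idea of "symbolic constrained optimization subject to unknown state parameters" concrete, the following is a minimal sketch of a single-piece special case, not the paper's algorithm and not an XADD implementation. It assumes one state variable x, one action a, an objective that is linear in a, and action bounds that are linear in x. The names `Lin` and `max_over_action` and the inventory-style numbers are hypothetical and chosen only for illustration.

```python
from fractions import Fraction
from typing import NamedTuple, Tuple


class Lin(NamedTuple):
    """Linear expression k*x + b in a single state variable x (illustrative type)."""
    k: Fraction
    b: Fraction

    def at(self, x):
        """Evaluate the expression at a concrete state x."""
        return self.k * x + self.b


def add(p: Lin, q: Lin) -> Lin:
    return Lin(p.k + q.k, p.b + q.b)


def scale(c, p: Lin) -> Lin:
    return Lin(c * p.k, c * p.b)


def max_over_action(coef_a, rest: Lin, lb: Lin, ub: Lin) -> Tuple[Lin, Lin]:
    """Maximize coef_a * a + rest(x) over a in [lb(x), ub(x)].

    Assumes lb(x) <= ub(x) on the state region of interest.
    A linear objective attains its maximum at a bound, so the result is
    returned symbolically as (a*(x), max value(x)), both linear in x.
    """
    best = ub if coef_a > 0 else lb
    return best, add(scale(coef_a, best), rest)


# Hypothetical one-step example: stock x, order quantity a in [0, 10 - x],
# reward 3*(x + a) - 1*a = 2*a + 3*x.
policy, value = max_over_action(2, Lin(3, 0), Lin(0, 0), Lin(-1, 10))
print(policy)       # a*(x) = -x + 10
print(value)        # V(x)  =  x + 20
print(value.at(4))  # value at the concrete state x = 4
```

The point of the sketch is that the maximizing action and the resulting value are returned as functions of the state rather than as numbers. In the setting the abstract describes, the objective and constraints are piecewise, so the result would itself be a piecewise function with state-dependent case conditions.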
