Journal: IEEE Transactions on Automatic Control

Controlled Markov Processes With Safety State Constraints


Abstract

This paper considers a Markov decision process (MDP) model with safety state constraints, which specify polytopic invariance constraints on the state probability distribution (pd) at all time epochs. In the MDP framework, safety is typically addressed indirectly by penalizing failure states through the reward function. Such an approach does not allow hard constraints to be imposed on the state pd, which can be a problem in practical applications where the chance of failure must be kept within prescribed bounds. In this paper, we explicitly separate state constraints from the reward function. We provide analysis and synthesis methods to impose generalized safety constraints at all time epochs, unlike current constrained MDP approaches, where such constraints can only be imposed on the stationary distributions. We show that, in contrast to unconstrained MDP policies, optimal safe MDP policies depend on the initial state pd. We present novel algorithms for both finite- and infinite-horizon MDPs that synthesize feasible decision-making policies which satisfy the safety constraints at all time epochs and ensure that the performance is above a computable lower bound. Linear programming implementations of the proposed algorithms are developed, formulated using the duality theory of convex optimization. A swarm control simulation example is also provided to demonstrate the use of the proposed algorithms.
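To make the notion of a safety constraint on the state pd concrete, the following is a minimal sketch, not the paper's algorithm. It assumes a finite MDP, a stationary randomized policy, and a polytopic constraint of the form L x_t <= q on the state pd x_t. The names `propagate`, `is_safe`, `P`, `policy`, `L`, and `q`, and all numbers, are hypothetical and chosen only for illustration. The sketch only checks a given policy over a finite horizon; it does not synthesize one.

```python
import numpy as np

def propagate(x0, P, policy, horizon):
    """Return the state pds x_0, ..., x_T under a stationary randomized policy.

    P[a] is the row-stochastic transition matrix for action a (assumed layout),
    policy[s, a] is the probability of choosing action a in state s.
    """
    # Closed-loop transition matrix: M[s, t] = sum_a policy[s, a] * P[a][s, t]
    M = np.einsum("sa,ast->st", policy, P)
    xs = [np.asarray(x0, dtype=float)]
    for _ in range(horizon):
        xs.append(xs[-1] @ M)  # x_{t+1} = x_t M (row-vector convention)
    return np.array(xs)

def is_safe(xs, L, q, tol=1e-9):
    """Check the polytopic constraint L x_t <= q at every epoch in xs."""
    return bool(np.all(xs @ L.T <= q + tol))

# Hypothetical 2-state, 2-action MDP; state 1 plays the role of a failure state.
P = np.array([
    [[0.9, 0.1], [0.5, 0.5]],  # action 0
    [[0.5, 0.5], [0.1, 0.9]],  # action 1
])
policy = np.array([[1.0, 0.0], [1.0, 0.0]])  # always choose action 0

# Assumed safety constraint: probability of state 1 must stay at or below 0.3.
L = np.array([[0.0, 1.0]])
q = np.array([0.3])

safe_from_state0 = is_safe(propagate([1.0, 0.0], P, policy, 20), L, q)
safe_from_state1 = is_safe(propagate([0.0, 1.0], P, policy, 20), L, q)
print(safe_from_state0, safe_from_state1)
```

In this toy instance the same policy passes the check from one initial pd and fails from the other, because the constraint is evaluated at every epoch, including the first, rather than only on the stationary distribution.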
