Published in: IEEE Transactions on Automatic Control

Entropy Maximization for Markov Decision Processes Under Temporal Logic Constraints


Abstract

We study the problem of synthesizing a policy that maximizes the entropy of a Markov decision process (MDP) subject to a temporal logic constraint. Such a policy minimizes the predictability of the paths it generates, or dually, maximizes the exploration of different paths in an MDP while ensuring the satisfaction of a temporal logic specification. We first show that the maximum entropy of an MDP can be finite, infinite, or unbounded. We provide necessary and sufficient conditions under which the maximum entropy of an MDP is finite, infinite, or unbounded. We then present an algorithm which is based on a convex optimization problem to synthesize a policy that maximizes the entropy of an MDP. We also show that maximizing the entropy of an MDP is equivalent to maximizing the entropy of the paths that reach a certain set of states in the MDP. Finally, we extend the algorithm to an MDP subject to a temporal logic specification. In numerical examples, we demonstrate the proposed method on different motion planning scenarios and illustrate the relation between the restrictions imposed on the paths by a specification, the maximum entropy, and the predictability of paths.
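To make the notion of "entropy of the paths" concrete, the sketch below evaluates it for a Markov chain that a fixed policy would induce on an MDP. It is not the convex optimization algorithm from the paper. It only illustrates the standard chain-rule identity that, for an absorbing chain, the path entropy equals the sum over transient states of the expected number of visits times the local entropy of the outgoing transition distribution. The function name `path_entropy`, the dictionary encoding of the chain, and the fixed-point iteration are assumptions made for illustration, and the sketch presumes the finite-entropy case (every transient state is eventually left with probability one).

```python
import math


def path_entropy(P, init, transient, tol=1e-12, max_iter=100_000):
    """Entropy, in bits, of the paths of an absorbing Markov chain.

    P         -- dict: state -> dict of successor -> probability (hypothetical encoding)
    init      -- dict: state -> initial probability
    transient -- list of non-absorbing states
    """
    # Local entropy of the outgoing transition distribution of each transient state.
    local = {
        s: -sum(p * math.log2(p) for p in P[s].values() if p > 0)
        for s in transient
    }

    # Expected number of visits xi solves xi = init + xi * P_T, where P_T is the
    # transition matrix restricted to transient states. Iterate to a fixed point.
    xi = {s: init.get(s, 0.0) for s in transient}
    for _ in range(max_iter):
        new = {s: init.get(s, 0.0) for s in transient}
        for s in transient:
            for t, p in P[s].items():
                if t in new:  # only flow into transient states is accumulated
                    new[t] += xi[s] * p
        delta = max(abs(new[s] - xi[s]) for s in transient)
        xi = new
        if delta < tol:
            break

    # Chain rule: path entropy = sum of (expected visits) * (local entropy).
    return sum(xi[s] * local[s] for s in transient)


# A state that loops on itself with probability 0.5 and otherwise reaches "goal".
looping = {"a": {"a": 0.5, "goal": 0.5}}
# A state that moves to "goal" deterministically: its single path is fully predictable.
direct = {"a": {"goal": 1.0}}

h_looping = path_entropy(looping, {"a": 1.0}, ["a"])
h_direct = path_entropy(direct, {"a": 1.0}, ["a"])
```

In the looping chain the path length is geometrically distributed, so the entropy is 2 bits, while the deterministic chain has zero entropy. This is the sense in which a higher-entropy policy produces less predictable paths.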
