Entropy Maximization for Markov Decision Processes Under Temporal Logic Constraints

Savas Yagiz; Ornik Melkior; Cubuktepe Murat; Karabag Mustafa O.; Topcu Ufuk

Abstract

We study the problem of synthesizing a policy that maximizes the entropy of a Markov decision process (MDP) subject to a temporal logic constraint. Such a policy minimizes the predictability of the paths it generates, or dually, maximizes the exploration of different paths in an MDP while ensuring the satisfaction of a temporal logic specification. We first show that the maximum entropy of an MDP can be finite, infinite, or unbounded. We provide necessary and sufficient conditions under which the maximum entropy of an MDP is finite, infinite, or unbounded. We then present an algorithm which is based on a convex optimization problem to synthesize a policy that maximizes the entropy of an MDP. We also show that maximizing the entropy of an MDP is equivalent to maximizing the entropy of the paths that reach a certain set of states in the MDP. Finally, we extend the algorithm to an MDP subject to a temporal logic specification. In numerical examples, we demonstrate the proposed method on different motion planning scenarios and illustrate the relation between the restrictions imposed on the paths by a specification, the maximum entropy, and the predictability of paths.
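
The link between randomization and path predictability described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it only evaluates the entropy of the paths of a Markov chain that a fixed policy would induce, using the standard identity for absorbing chains (path entropy equals the sum over transient states of expected visits times the entropy of that state's next-state distribution). The state names `s0` and `goal`, the two example chains, and all function names are hypothetical.

```python
import math

def local_entropy(row):
    """Entropy in bits of one state's next-state distribution."""
    return -sum(p * math.log2(p) for p in row.values() if p > 0)

def expected_visits(chain, start, absorbing, sweeps=1000):
    """Expected visits to each transient state, found by iterating
    x = e_start + P^T x (assumes absorption occurs with probability 1)."""
    transient = [s for s in chain if s not in absorbing]
    visits = {s: 0.0 for s in transient}
    for _ in range(sweeps):
        new = {s: (1.0 if s == start else 0.0) for s in transient}
        for s in transient:
            for t, p in chain[s].items():
                if t in new:
                    new[t] += p * visits[s]
        visits = new
    return visits

def path_entropy(chain, start, absorbing):
    """Entropy in bits of the distribution over paths from start to absorption."""
    visits = expected_visits(chain, start, absorbing)
    return sum(visits[s] * local_entropy(chain[s]) for s in visits)

# Hypothetical chains induced by two policies on the same two-state model.
randomized = {"s0": {"s0": 0.5, "goal": 0.5}, "goal": {"goal": 1.0}}
deterministic = {"s0": {"goal": 1.0}, "goal": {"goal": 1.0}}

print(path_entropy(randomized, "s0", {"goal"}))
print(path_entropy(deterministic, "s0", {"goal"}))
```

Under these assumptions the randomized chain has a path entropy of 2 bits (the entropy of a geometric distribution with parameter 0.5), while the deterministic chain has zero entropy: both reach `goal` with probability 1, but only the first makes the path hard to predict.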

Bibliographic details

Source
IEEE Transactions on Automatic Control, 2020, Issue 4, pp. 1552-1567 (16 pages)
Authors
Savas Yagiz; Ornik Melkior; Cubuktepe Murat; Karabag Mustafa O.; Topcu Ufuk
Author affiliations

Univ Texas Austin Dept Aerosp Engn Austin TX 78705 USA;

Univ Illinois Dept Aerosp Engn Urbana IL 61801 USA|Univ Illinois Coordinated Sci Lab Urbana IL 61801 USA;

Univ Texas Austin Dept Aerosp Engn Austin TX 78705 USA;

Univ Texas Austin Dept Elect & Comp Engn Austin TX 78705 USA;

Univ Texas Austin Dept Aerosp Engn Austin TX 78705 USA;

Original format: PDF
Language: English
Keywords
Entropy; Markov processes; Random variables; Convex functions; Planning; Task analysis; Temporal logic; Convexity
