IEEE Transactions on Automatic Control

Partial-Information State-Based Optimization of Partially Observable Markov Decision Processes and the Separation Principle



Abstract

We propose a partial-information-state based approach to the optimization of the long-run average performance in a partially observable Markov decision process (POMDP). In this approach, the information history is summarized, at least partially, by one (or a few) statistics, not necessarily sufficient, called a partial-information state, and actions depend on the partial-information state rather than on the system state. We first propose the "single-policy based comparison principle," under which we derive an HJB-type optimality equation and a policy iteration algorithm for the optimal policy in the partial-information-state based policy space. We then introduce Q-sufficient statistics and show that if the partial-information state is Q-sufficient, then the optimal policy in the partial-information-state based policy space is also optimal in the space of all feasible information-state based policies. We further show that, under some additional conditions, the well-known separation principle holds. The results are obtained by applying the direct-comparison based approach initially developed for discrete event dynamic systems.
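To make the setting concrete, the following is a minimal illustrative sketch, not the paper's algorithm: a small POMDP in which the partial-information state is taken to be the most recent observation (a simple, not necessarily sufficient, statistic of the information history), policies map this partial-information state to actions, and a policy-evaluation/policy-improvement loop is run for the long-run average reward. All model data (the transition kernel P, reward R, observation kernel O, and the choice of statistic) are hypothetical assumptions made only for this example.

```python
# Sketch only: partial-information-state based policy iteration for average reward.
# The improvement step below is a plausible aggregated-Q heuristic, not the
# paper's single-policy based comparison procedure.
import numpy as np

rng = np.random.default_rng(0)
n_s, n_o, n_a = 3, 2, 2                      # hidden states, observations, actions
P = rng.dirichlet(np.ones(n_s), size=(n_a, n_s))   # hypothetical P[a, s, s']
R = rng.uniform(0.0, 1.0, size=(n_a, n_s))         # hypothetical R[a, s]
O = rng.dirichlet(np.ones(n_o), size=n_s)          # hypothetical O[s, o]

# Partial-information state m = last observation; a policy d maps m -> action.

def joint_chain(d):
    """Markov chain over pairs (s, m) induced by policy d, with per-pair reward."""
    n = n_s * n_o
    Pj, rj = np.zeros((n, n)), np.zeros(n)
    for s in range(n_s):
        for m in range(n_o):
            a, i = d[m], s * n_o + m
            rj[i] = R[a, s]
            for s2 in range(n_s):
                for o2 in range(n_o):
                    Pj[i, s2 * n_o + o2] += P[a, s, s2] * O[s2, o2]
    return Pj, rj

def stationary(Pj):
    """Stationary distribution of the (assumed ergodic) joint chain."""
    w, v = np.linalg.eig(Pj.T)
    pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    return pi / pi.sum()

def evaluate(d):
    """Average reward eta and potential g: solve the Poisson equation (I-P)g = r - eta."""
    Pj, rj = joint_chain(d)
    pi = stationary(Pj)
    eta = pi @ rj
    A = np.vstack([np.eye(len(rj)) - Pj, pi])        # pin g down with pi @ g = 0
    g = np.linalg.lstsq(A, np.concatenate([rj - eta, [0.0]]), rcond=None)[0]
    return eta, g, pi

def improve(d, g, pi):
    """Greedy step: for each m, pick the action with the best Q-factor averaged
    over the conditional distribution of s given m under the current policy."""
    d_new = d.copy()
    for m in range(n_o):
        p_s = np.array([pi[s * n_o + m] for s in range(n_s)])
        if p_s.sum() == 0:
            continue
        p_s /= p_s.sum()
        q = np.zeros(n_a)
        for a in range(n_a):
            for s in range(n_s):
                nxt = sum(P[a, s, s2] * O[s2, o2] * g[s2 * n_o + o2]
                          for s2 in range(n_s) for o2 in range(n_o))
                q[a] += p_s[s] * (R[a, s] + nxt)
        d_new[m] = int(np.argmax(q))
    return d_new

d = np.zeros(n_o, dtype=int)                  # start with "always take action 0"
for _ in range(20):                           # bounded loop; convergence not guaranteed here
    eta, g, pi = evaluate(d)
    d_next = improve(d, g, pi)
    if np.array_equal(d_next, d):
        break
    d = d_next
print("policy (per last observation):", d, " average reward:", round(eta, 4))
```

The point of the sketch is only the structure: actions depend on the partial-information state m, not on the hidden system state s, and evaluation/improvement are carried out entirely in the partial-information-state based policy space, which is what the paper's comparison-based analysis formalizes.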
