IEEE Transactions on Automatic Control

Partial-Information State-Based Optimization of Partially Observable Markov Decision Processes and the Separation Principle



Abstract

We propose a partial-information-state based approach to the optimization of the long-run average performance in a partially observable Markov decision process (POMDP). In this approach, the information history is summarized, at least partially, by one (or a few) statistics, not necessarily sufficient, called a partial-information state, and actions depend on the partial-information state rather than on the system state. We first propose the "single-policy based comparison principle," under which we derive an HJB-type optimality equation and a policy iteration algorithm for the optimal policy in the partial-information-state based policy space. We then introduce Q-sufficient statistics and show that if the partial-information state is Q-sufficient, then the optimal policy in the partial-information-state based policy space is also optimal in the space of all feasible information-state based policies. We further show that, under some additional conditions, the well-known separation principle holds. The results are obtained by applying the direct-comparison based approach initially developed for discrete event dynamic systems.
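To make the setting concrete, the following is a minimal illustrative sketch, not the paper's algorithm: a small POMDP in which the partial-information state is taken to be the most recent observation (a simple, not necessarily sufficient, statistic of the information history), policies map this partial-information state to actions, and a policy-evaluation/policy-improvement loop is run for the long-run average reward. All model data (the transition kernel P, reward R, observation kernel O, and the choice of statistic) are hypothetical assumptions made only for this example.

```python
# Sketch only: partial-information-state based policy iteration for average reward.
# The improvement step below is a plausible aggregated-Q heuristic, not the
# paper's single-policy based comparison procedure.
import numpy as np

rng = np.random.default_rng(0)
n_s, n_o, n_a = 3, 2, 2                      # hidden states, observations, actions
P = rng.dirichlet(np.ones(n_s), size=(n_a, n_s))   # hypothetical P[a, s, s']
R = rng.uniform(0.0, 1.0, size=(n_a, n_s))         # hypothetical R[a, s]
O = rng.dirichlet(np.ones(n_o), size=n_s)          # hypothetical O[s, o]

# Partial-information state m = last observation; a policy d maps m -> action.

def joint_chain(d):
    """Markov chain over pairs (s, m) induced by policy d, with per-pair reward."""
    n = n_s * n_o
    Pj, rj = np.zeros((n, n)), np.zeros(n)
    for s in range(n_s):
        for m in range(n_o):
            a, i = d[m], s * n_o + m
            rj[i] = R[a, s]
            for s2 in range(n_s):
                for o2 in range(n_o):
                    Pj[i, s2 * n_o + o2] += P[a, s, s2] * O[s2, o2]
    return Pj, rj

def stationary(Pj):
    """Stationary distribution of the (assumed ergodic) joint chain."""
    w, v = np.linalg.eig(Pj.T)
    pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    return pi / pi.sum()

def evaluate(d):
    """Average reward eta and potential g: solve the Poisson equation (I-P)g = r - eta."""
    Pj, rj = joint_chain(d)
    pi = stationary(Pj)
    eta = pi @ rj
    A = np.vstack([np.eye(len(rj)) - Pj, pi])        # pin g down with pi @ g = 0
    g = np.linalg.lstsq(A, np.concatenate([rj - eta, [0.0]]), rcond=None)[0]
    return eta, g, pi

def improve(d, g, pi):
    """Greedy step: for each m, pick the action with the best Q-factor averaged
    over the conditional distribution of s given m under the current policy."""
    d_new = d.copy()
    for m in range(n_o):
        p_s = np.array([pi[s * n_o + m] for s in range(n_s)])
        if p_s.sum() == 0:
            continue
        p_s /= p_s.sum()
        q = np.zeros(n_a)
        for a in range(n_a):
            for s in range(n_s):
                nxt = sum(P[a, s, s2] * O[s2, o2] * g[s2 * n_o + o2]
                          for s2 in range(n_s) for o2 in range(n_o))
                q[a] += p_s[s] * (R[a, s] + nxt)
        d_new[m] = int(np.argmax(q))
    return d_new

d = np.zeros(n_o, dtype=int)                  # start with "always take action 0"
for _ in range(20):                           # bounded loop; convergence not guaranteed here
    eta, g, pi = evaluate(d)
    d_next = improve(d, g, pi)
    if np.array_equal(d_next, d):
        break
    d = d_next
print("policy (per last observation):", d, " average reward:", round(eta, 4))
```

The point of the sketch is only the structure: actions depend on the partial-information state m, not on the hidden system state s, and evaluation/improvement are carried out entirely in the partial-information-state based policy space, which is what the paper's comparison-based analysis formalizes.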
