European Journal of Operational Research

Finding optimal memoryless policies of POMDPs under the expected average reward criterion


Abstract

In this paper, partially observable Markov decision processes (POMDPs) with discrete state and action spaces under the average reward criterion are considered from a recently developed sensitivity-based point of view. By analyzing the average-reward performance difference formula, we propose a policy iteration algorithm with step sizes to obtain an optimal or locally optimal memoryless policy. The algorithm improves the policy along the same direction as standard policy iteration, and suitable step sizes guarantee its convergence. Moreover, the algorithm can be used in Markov decision processes (MDPs) with correlated actions. Two numerical examples are provided to illustrate the applicability of the algorithm.
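
The difference formula the abstract refers to has, in its classical sensitivity-based form for ergodic chains (the paper's own notation may differ), the following shape: with \eta, P, r, g the average reward, transition matrix, reward, and performance potential of the current policy, and \eta', P', r', \pi' the corresponding quantities and stationary distribution of a candidate policy,

    \eta' - \eta = \pi'\,\bigl[(P' - P)\,g + (r' - r)\bigr].

Because the right-hand side is weighted by the nonnegative stationary distribution \pi', any policy change that keeps the bracketed term componentwise nonnegative, and positive somewhere, strictly increases the average reward; this is what licenses moving "along the same direction as policy iteration" in small steps.

The abstract does not spell out the update rule, so the following is only a minimal sketch of one plausible step-size policy iteration for memoryless POMDP policies, written in Python/NumPy. All conventions here are illustrative assumptions rather than the paper's algorithm: P[a, s, s'] is the transition tensor, r[s, a] the reward, Om[s, o] the observation kernel, theta[o, a] the memoryless (observation-based) policy, and the update moves theta a fraction alpha toward the greedy policy-iteration improvement instead of jumping to it.

    import numpy as np

    def stationary_dist(P):
        # Stationary distribution of an ergodic chain: solve pi P = pi, sum(pi) = 1.
        n = P.shape[0]
        A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
        b = np.zeros(n + 1); b[-1] = 1.0
        return np.linalg.lstsq(A, b, rcond=None)[0]

    def poisson(P, r, pi):
        # Performance potential g: (I - P) g = r - eta*1, normalized by pi . g = 0.
        n = P.shape[0]
        eta = float(pi @ r)
        A = np.vstack([np.eye(n) - P, pi.reshape(1, -1)])
        b = np.append(r - eta, 0.0)
        g = np.linalg.lstsq(A, b, rcond=None)[0]
        return eta, g

    def improve(P, r, Om, theta, alpha=0.3, iters=60):
        # Step-size policy iteration over memoryless policies theta[o, a] (a sketch).
        nO, nA = theta.shape
        etas = []
        for _ in range(iters):
            mix = Om @ theta                        # mix[s, a] = sum_o Om[s, o] theta[o, a]
            P_th = np.einsum('sa,ast->st', mix, P)  # state chain induced by theta
            r_th = (mix * r).sum(axis=1)            # expected reward per state under theta
            pi = stationary_dist(P_th)
            eta, g = poisson(P_th, r_th, pi)
            etas.append(eta)
            q = r + np.einsum('ast,t->sa', P, g)    # q[s, a] = r(s, a) + E[g(s') | s, a]
            W = np.einsum('s,so,sa->oa', pi, Om, q) # improvement weight of action a at observation o
            greedy = np.zeros_like(theta)
            greedy[np.arange(nO), W.argmax(axis=1)] = 1.0
            theta = (1.0 - alpha) * theta + alpha * greedy  # step toward the improved policy
        return theta, etas

    # A small aliased example: two states emitting overlapping observations.
    P = np.array([[[0.9, 0.1], [0.2, 0.8]],
                  [[0.1, 0.9], [0.7, 0.3]]])        # P[a, s, s']
    r = np.array([[1.0, 0.0], [0.0, 2.0]])          # r[s, a]
    Om = np.array([[0.8, 0.2], [0.3, 0.7]])         # Om[s, o]
    theta, etas = improve(P, r, Om, np.full((2, 2), 0.5))
    print([round(e, 4) for e in etas[:3]], '->', round(etas[-1], 4))

A constant alpha is used above for simplicity; a diminishing schedule such as alpha_k = 1/k would play the role of the "suitable step sizes" that the abstract says guarantee convergence.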
