European Journal of Operational Research

Finding optimal memoryless policies of POMDPs under the expected average reward criterion


Abstract

In this paper, partially observable Markov decision processes (POMDPs) with discrete state and action spaces under the average reward criterion are considered from a recently developed sensitivity-based point of view. By analyzing the average-reward performance difference formula, we propose a policy iteration algorithm with step sizes to obtain an optimal or locally optimal memoryless policy. The algorithm improves the policy along the same direction as standard policy iteration, and suitable step sizes guarantee its convergence. Moreover, the algorithm can be used in Markov decision processes (MDPs) with correlated actions. Two numerical examples are provided to illustrate the applicability of the algorithm.
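
The difference formula the abstract refers to has, in its classical sensitivity-based form for ergodic chains (the paper's own notation may differ), the following shape: with \eta, P, r, g the average reward, transition matrix, reward, and performance potential of the current policy, and \eta', P', r', \pi' the corresponding quantities and stationary distribution of a candidate policy,

    \eta' - \eta = \pi'\,\bigl[(P' - P)\,g + (r' - r)\bigr].

Because the right-hand side is weighted by the nonnegative stationary distribution \pi', any policy change that keeps the bracketed term componentwise nonnegative, and positive somewhere, strictly increases the average reward; this is what licenses moving "along the same direction as policy iteration" in small steps.

The abstract does not spell out the update rule, so the following is only a minimal sketch of one plausible step-size policy iteration for memoryless POMDP policies, written in Python/NumPy. All conventions here are illustrative assumptions rather than the paper's algorithm: P[a, s, s'] is the transition tensor, r[s, a] the reward, Om[s, o] the observation kernel, theta[o, a] the memoryless (observation-based) policy, and the update moves theta a fraction alpha toward the greedy policy-iteration improvement instead of jumping to it.

    import numpy as np

    def stationary_dist(P):
        # Stationary distribution of an ergodic chain: solve pi P = pi, sum(pi) = 1.
        n = P.shape[0]
        A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
        b = np.zeros(n + 1); b[-1] = 1.0
        return np.linalg.lstsq(A, b, rcond=None)[0]

    def poisson(P, r, pi):
        # Performance potential g: (I - P) g = r - eta*1, normalized by pi . g = 0.
        n = P.shape[0]
        eta = float(pi @ r)
        A = np.vstack([np.eye(n) - P, pi.reshape(1, -1)])
        b = np.append(r - eta, 0.0)
        g = np.linalg.lstsq(A, b, rcond=None)[0]
        return eta, g

    def improve(P, r, Om, theta, alpha=0.3, iters=60):
        # Step-size policy iteration over memoryless policies theta[o, a] (a sketch).
        nO, nA = theta.shape
        etas = []
        for _ in range(iters):
            mix = Om @ theta                        # mix[s, a] = sum_o Om[s, o] theta[o, a]
            P_th = np.einsum('sa,ast->st', mix, P)  # state chain induced by theta
            r_th = (mix * r).sum(axis=1)            # expected reward per state under theta
            pi = stationary_dist(P_th)
            eta, g = poisson(P_th, r_th, pi)
            etas.append(eta)
            q = r + np.einsum('ast,t->sa', P, g)    # q[s, a] = r(s, a) + E[g(s') | s, a]
            W = np.einsum('s,so,sa->oa', pi, Om, q) # improvement weight of action a at observation o
            greedy = np.zeros_like(theta)
            greedy[np.arange(nO), W.argmax(axis=1)] = 1.0
            theta = (1.0 - alpha) * theta + alpha * greedy  # step toward the improved policy
        return theta, etas

    # A small aliased example: two states emitting overlapping observations.
    P = np.array([[[0.9, 0.1], [0.2, 0.8]],
                  [[0.1, 0.9], [0.7, 0.3]]])        # P[a, s, s']
    r = np.array([[1.0, 0.0], [0.0, 2.0]])          # r[s, a]
    Om = np.array([[0.8, 0.2], [0.3, 0.7]])         # Om[s, o]
    theta, etas = improve(P, r, Om, np.full((2, 2), 0.5))
    print([round(e, 4) for e in etas[:3]], '->', round(etas[-1], 4))

A constant alpha is used above for simplicity; a diminishing schedule such as alpha_k = 1/k would play the role of the "suitable step sizes" that the abstract says guarantee convergence.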
