Continuous Time Markov Decision Processes with Expected Discounted Total Rewards


Abstract

This paper discusses continuous-time Markov decision processes under the criterion of expected discounted total rewards, where the state space is countable, the reward rate function is extended real-valued, and the discount rate is any real number. Under the conditions necessary for the model to be well defined, the state space is partitioned into three subsets, on which the optimal value function is positive infinity, negative infinity, or finite, respectively. Correspondingly, the model is reduced to three submodels by generalizing policies and eliminating some of the worst actions. Then, for the submodel with finite optimal value, the validity of the optimality equation is shown and some of its properties are obtained.
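
For orientation, the criterion and the optimality equation named in the abstract can be sketched in standard continuous-time MDP notation. All symbols here are assumptions for illustration and are not taken from the paper: S is the countable state space, A(i) the set of actions available in state i, r(i,a) the reward rate, q_{ij}(a) the transition rates (with \sum_{j} q_{ij}(a) = 0), \alpha the discount rate, and \pi a policy.

\[
V_\alpha(\pi, i) = \mathbb{E}_{\pi, i}\left[ \int_0^\infty e^{-\alpha t}\, r(X_t, a_t)\, dt \right],
\qquad
V_\alpha^*(i) = \sup_{\pi} V_\alpha(\pi, i),
\]
\[
\alpha\, V_\alpha^*(i) = \sup_{a \in A(i)} \Big\{ r(i, a) + \sum_{j \in S} q_{ij}(a)\, V_\alpha^*(j) \Big\},
\qquad i \in S.
\]

The second display is the textbook form of the optimality equation, which would apply on the subset of states where V_\alpha^* is finite. The exact form and conditions used by the authors may differ, in particular because the abstract allows \alpha to be any real number and r to take infinite values.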


Bibliographic details

Source
International Conference on Computational Science - ICCA 2003, Pt. 2, Jun 2-4, 2003, Melbourne, Australia and St. Petersburg, Russia | 2003 | pp. 64-73 | 10 pages
Conference location: Melbourne (AU); St. Petersburg (RU)
Authors
Qiying Hu; Jianyong Liu; Wuyi Yue
Author affiliation

College of International Business Management, Shanghai University, Shanghai 201800, China;

Format: PDF
Language: English
CLC classification: Automation technology, computer technology

Similar literature

1. Optimally solving Markov decision processes with total expected discounted reward function: Linear programming revisited [J]. Oguzhan Alagoz, Mehmet U.S. Ayvaci, Jeffrey T. Linderoth. Computers & Industrial Engineering. 2015, issue sepa
2. Continuous-Time Markov Decision Processes with Unbounded Transition and Discounted-Reward Rates [J]. Hao Yan, Junyu Zhang, Xianping Guo. Stochastic Analysis and Applications. 2008, No. 2
3. Continuous-Time Markov Decision Processes with Discounted Rewards: The Case of Polish Spaces [J]. Xianping Guo. Mathematics of Operations Research. 2007, No. 1
4. Continuous Time Markov Decision Processes with Expected Discounted Total Rewards [C]. Qiying Hu, Jianyong Liu, Wuyi Yue. International Conference on Computational Science. 2003
5. Regret-based reward elicitation for Markov decision processes [D]. Kevin Regan. 2014
6. Learning to maximize reward rate: a model based on semi-Markov decision processes [O]. Arash Khodadadi, Pegah Fakhari, Jerome R. Busemeyer. 2014
7. Continuous Time Markov Decision Processes with Expected Discounted Total Rewards [O]. Qiying Hu, Jianyong Liu, Wuyi Yue. 2003
8. Countable State Discounted Markovian Decision Processes with Unbounded Rewards [R]. Harrison, J. M. 1970
