Continuous Time Markov Decision Processes with Expected Discounted Total Rewards

Abstract

This paper discusses continuous time Markov decision processes under the criterion of expected discounted total rewards, where the state space is countable, the reward rate function is extended real-valued, and the discount rate is a real number. Under conditions that are necessary for the model to be well defined, the state space is partitioned into three subsets, on which the optimal value function is positive infinity, negative infinity, or finite, respectively. Correspondingly, the model is reduced to three submodels by generalizing policies and eliminating some worst actions. Then, for the submodel with finite optimal value, the validity of the optimality equation is shown and some of its properties are obtained.
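
For orientation, the criterion and the optimality equation mentioned above are commonly written as follows. This is a sketch in textbook notation, not taken from the paper: the symbols $S$ (state space), $A(i)$ (actions available in state $i$), $q_{ij}(a)$ (transition rates), $r(i,a)$ (reward rate), $\alpha$ (discount rate), and $V^{*}$ (optimal value function) are assumptions, and the paper's exact form and conditions may differ.

```latex
% Expected discounted total reward of policy \pi from initial state i
% (assumed notation: X_t is the state and \Delta_t the action at time t)
V_{\alpha}(\pi, i) = \mathbb{E}_{\pi, i}\!\left[ \int_{0}^{\infty} e^{-\alpha t}\, r(X_t, \Delta_t)\, dt \right],
\qquad
V^{*}(i) = \sup_{\pi} V_{\alpha}(\pi, i).

% Optimality equation in its usual form, stated for states where V^{*} is finite
\alpha V^{*}(i) = \sup_{a \in A(i)} \Big\{ r(i,a) + \sum_{j \in S} q_{ij}(a)\, V^{*}(j) \Big\},
\qquad i \in S.
```

Because the reward rate may take infinite values and the discount rate is an arbitrary real number, the integral above need not be well defined or finite in general, which is why the abstract first partitions the state space before stating the equation on the finite part.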