Journal: Acta Automatica Sinica

A Simulation Optimization Algorithm for CTMDPs Based on Randomized Stationary Policies

Abstract

Building on the theory of Markov performance potentials and the neuro-dynamic programming (NDP) methodology, we study a simulation optimization algorithm for a class of continuous-time Markov decision processes (CTMDPs) under randomized stationary policies. The proposed algorithm estimates the gradient of the average-cost performance measure with respect to the policy parameters by transforming the continuous-time Markov process into a uniformized Markov chain and simulating a single sample path of that chain. The goal is to find a suboptimal randomized stationary policy. The algorithm is intended for performance optimization of difficult systems with large-scale state spaces. Finally, a numerical example for a controlled Markov process is provided.
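
The uniformization step mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's gradient estimator: it only shows, under stated assumptions, how a randomized stationary policy induces a generator, how that generator is turned into a discrete-time chain via P = I + Q/rate, and how the average cost can be estimated from a single sample path of that chain. The function names, the two-state model, the rates, the costs, and the policy probabilities are all hypothetical and chosen purely for illustration.

```python
import random

def policy_generator(Q_by_action, mu):
    """Mix per-action generators: q(i, j) = sum_a mu[i][a] * q_a(i, j).

    Assumption: mu[i][a] is the probability of choosing action a in state i.
    """
    n = len(mu)
    return [[sum(mu[i][a] * Q_by_action[a][i][j] for a in range(len(mu[i])))
             for j in range(n)] for i in range(n)]

def policy_cost(cost_by_action, mu):
    """Expected cost rate per state: f(i) = sum_a mu[i][a] * f(i, a)."""
    return [sum(p * c for p, c in zip(mu[i], cost_by_action[i]))
            for i in range(len(mu))]

def uniformize(Q, rate=None):
    """Return (P, rate) with P = I + Q / rate for a generator matrix Q."""
    n = len(Q)
    max_exit = max(-Q[i][i] for i in range(n))
    if rate is None:
        rate = max_exit
    if rate <= 0 or rate < max_exit:
        raise ValueError("rate must be positive and at least max_i |q_ii|")
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / rate for j in range(n)]
         for i in range(n)]
    return P, rate

def simulate_average_cost(P, cost, steps, seed=0):
    """Estimate the long-run average cost from one sample path of chain P."""
    rng = random.Random(seed)
    states = list(range(len(P)))
    state, total = 0, 0.0
    for _ in range(steps):
        total += cost[state]
        state = rng.choices(states, weights=P[state])[0]
    return total / steps

# Hypothetical two-state, two-action model (illustrative numbers only).
Q_by_action = [
    [[-2.0, 2.0], [1.0, -1.0]],   # generator under action 0
    [[-4.0, 4.0], [3.0, -3.0]],   # generator under action 1
]
cost_by_action = [[1.0, 3.0], [4.0, 6.0]]   # cost_by_action[state][action]
mu = [[0.5, 0.5], [0.5, 0.5]]               # randomized stationary policy

Q = policy_generator(Q_by_action, mu)
f = policy_cost(cost_by_action, mu)
P, rate = uniformize(Q)
estimate = simulate_average_cost(P, f, steps=100_000)
print(P, rate, estimate)
```

Because any distribution π with πQ = 0 also satisfies πP = π, the uniformized chain has the same stationary distribution as the continuous-time process, so the sample-path average above targets the same average cost. For this toy model the stationary distribution is (0.4, 0.6), giving an exact average cost of 3.8 against which the estimate can be compared.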
