
Multistage decisions and risk in Markov decision processes: Towards effective approximate dynamic programming architectures.



Abstract

The scientific domain of this thesis is optimization under uncertainty for discrete event stochastic systems. In particular, this thesis focuses on the practical implementation of the Dynamic Programming (DP) methodology for discrete event stochastic systems. Unfortunately, DP in its crude form suffers from three severe computational obstacles that make its application to such systems intractable. This thesis addresses these obstacles by developing and executing practical Approximate Dynamic Programming (ADP) techniques.

Specifically, for the purposes of this thesis we developed the following ADP techniques. The first is inspired by the Reinforcement Learning (RL) literature and is termed Real Time Approximate Dynamic Programming (RTADP). The RTADP algorithm is intended for active learning while operating the stochastic system: as the agent constantly interacts with the uncertain environment, it accumulates experience that enables it to react more optimally in future similar situations. The second is an off-line ADP procedure. Both approaches are developed for discrete event stochastic systems, and their main focus is the controlled exploration of the state space, thereby circumventing one of the severe computational obstacles of DP, namely the one related to the cardinality of the state space.

These ADP techniques are demonstrated on a variety of discrete event stochastic systems, such as: (i) a three-stage queuing manufacturing network with recycle, (ii) a supply chain for the light aromatics of a typical refinery, and (iii) several stochastic shortest path instances with a single starting and terminal state.

Moreover, this work addresses, in a systematic way, the issue of multistage risk within the DP framework by exploring the use of single-period and multi-period risk-sensitive utility functions.
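The RTADP idea sketched above — performing Bellman backups only at states the agent actually visits while operating the system, with controlled exploration — can be illustrated on a small stochastic shortest path instance with a single starting and terminal state. The MDP below (states, transition probabilities, costs, and the ε-greedy exploration scheme) is a hypothetical toy example for illustration, not one of the thesis's actual test instances.

```python
import random

# Illustrative stochastic shortest path: states 0..4, state 4 terminal.
# trans[s][a] = list of (probability, next_state, cost); numbers are made up.
trans = {
    0: {"a": [(0.8, 1, 1.0), (0.2, 0, 1.0)], "b": [(1.0, 2, 2.0)]},
    1: {"a": [(0.7, 3, 1.0), (0.3, 1, 1.0)], "b": [(1.0, 2, 1.5)]},
    2: {"a": [(0.9, 4, 1.0), (0.1, 2, 1.0)]},
    3: {"a": [(1.0, 4, 0.5)]},
}
TERMINAL = 4
V = {s: 0.0 for s in list(trans) + [TERMINAL]}  # value estimates, built online

def backup(s):
    """Bellman backup at a single visited state; returns the greedy action."""
    best_a, best_q = None, float("inf")
    for a, outcomes in trans[s].items():
        q = sum(p * (c + V[ns]) for p, ns, c in outcomes)
        if q < best_q:
            best_a, best_q = a, q
    V[s] = best_q
    return best_a

def run_episode(eps=0.1, max_steps=50):
    s = 0
    for _ in range(max_steps):
        if s == TERMINAL:
            break
        a = backup(s)                 # update only the state we are in
        if random.random() < eps:     # controlled exploration
            a = random.choice(list(trans[s]))
        # sample the next state from the chosen action's outcome distribution
        r, acc = random.random(), 0.0
        for p, ns, c in trans[s][a]:
            acc += p
            if r <= acc:
                break
        s = ns                        # falls back to the last outcome on float slack

random.seed(0)
for _ in range(200):
    run_episode()
```

Only states reachable under the learned policy (plus exploratory excursions) ever receive backups, which is what sidesteps enumerating the full state space.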
In this thesis we propose a special structure for a single-period utility and compare the derived policies on several multistage instances. Finally, we briefly attempt to integrate the developed ADP procedures with the proposed utility to yield risk-sensitive ADP policies.
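A standard way to make a single-period decision risk-sensitive — shown here only as a generic illustration, since the abstract does not reproduce the thesis's actual utility structure — is the exponential utility, whose certainty equivalent charges a lottery a premium for cost variance. The two cost lotteries below are hypothetical:

```python
import math

def certainty_equivalent(outcomes, gamma):
    """Certainty equivalent of a cost lottery under exponential disutility.

    outcomes: list of (probability, cost); gamma > 0 means risk-averse.
    CE = (1/gamma) * log E[exp(gamma * cost)]; as gamma -> 0 it tends to E[cost].
    """
    return math.log(sum(p * math.exp(gamma * c) for p, c in outcomes)) / gamma

# Two hypothetical actions with the same expected cost of 5.0:
safe  = [(1.0, 5.0)]                 # deterministic cost
risky = [(0.5, 0.0), (0.5, 10.0)]    # high-variance cost

# A risk-neutral agent is indifferent; under gamma = 0.5 the risky lottery
# is penalized for its variance, so the safe action is preferred.
ce_safe  = certainty_equivalent(safe, 0.5)   # exactly 5.0
ce_risky = certainty_equivalent(risky, 0.5)  # well above 5.0
```

Using such a certainty equivalent in place of the expected cost inside each Bellman backup is one common route to risk-sensitive DP policies.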

Record

  • Author

    Pratikakis, Nikolaos E.

  • Author affiliation

    Georgia Institute of Technology.

  • Degree grantor Georgia Institute of Technology.
  • Subject Engineering, Chemical; Engineering, Industrial; Operations Research.
  • Degree Ph.D.
  • Year 2009
  • Pagination 215 p.
  • Total pages 215
  • Format PDF
  • Language eng
