
Action-based representation discovery in Markov decision processes.



Abstract

This dissertation investigates the problem of representation discovery in discrete Markov decision processes (MDPs): how agents can simultaneously learn a representation and optimal control. Previous work on function approximation techniques for MDPs largely employed hand-engineered basis functions. In this dissertation, we explore approaches that construct these basis functions automatically and demonstrate that automatically constructed basis functions significantly outperform more traditional, hand-engineered approaches.

We specifically examine two problems: how to automatically build representations for action-value functions by explicitly incorporating actions into the representation, and how representations can be automatically constructed by exploiting a pre-specified task hierarchy. We first introduce a technique for learning basis functions directly in state-action space. The approach constructs basis functions through spectral analysis of a state-action graph that captures the underlying structure of the MDP's state-action space. We describe two approaches to constructing these graphs and evaluate the method on MDPs with discrete state and action spaces.

We then show how our approach can be used to approximate state-action value functions when the agent has access to macro-actions: actions that take more than one time step and have predefined policies. We describe how the state-action graphs can be modified to incorporate information about the macro-actions and experimentally evaluate this approach on SMDPs with discrete state and action spaces.

Finally, we describe how hierarchical reinforcement learning can be used to scale up automatic basis function construction. We extend automatic basis function construction techniques to multi-level task hierarchies and describe how basis function construction can exploit the value function decomposition given by a fixed task hierarchy. We demonstrate that combining task hierarchies with automatic basis function construction allows basis function techniques to scale to larger problems and leads to a significant speed-up in learning.
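As a concrete illustration of the spectral construction the abstract describes, the sketch below builds a state-action graph for a hypothetical five-state chain MDP and uses the low-order eigenvectors of its normalized graph Laplacian as basis functions over (state, action) pairs. This is a minimal sketch, not the dissertation's implementation: the chain dynamics, the edge rule (connect (s, a) to (s', a') whenever taking a in s reaches s'), the number of basis functions k, and all identifiers are illustrative assumptions.

```python
import numpy as np
from scipy.sparse.csgraph import laplacian
from scipy.linalg import eigh

# Hypothetical toy domain: a 5-state chain MDP with two actions.
# Nodes of the state-action graph are (state, action) pairs. These
# dynamics are an assumption for illustration, not the dissertation's
# experimental domains.
n_states, n_actions = 5, 2
nodes = [(s, a) for s in range(n_states) for a in range(n_actions)]
index = {sa: i for i, sa in enumerate(nodes)}

def step(s, a):
    """Deterministic chain dynamics: action 0 moves left, action 1 moves right."""
    return max(s - 1, 0) if a == 0 else min(s + 1, n_states - 1)

# Adjacency matrix: connect (s, a) to (s', a') whenever taking a in s
# reaches s' and a' is available in s' (one possible edge rule).
W = np.zeros((len(nodes), len(nodes)))
for s, a in nodes:
    s_next = step(s, a)
    for a_next in range(n_actions):
        W[index[(s, a)], index[(s_next, a_next)]] = 1.0

# Symmetrize so the graph Laplacian is well defined, then take the k
# smoothest eigenvectors of the normalized Laplacian as basis functions.
W = (W + W.T) / 2.0
L = laplacian(W, normed=True)
eigvals, eigvecs = eigh(L)  # eigenvalues returned in ascending order
k = 4
basis = eigvecs[:, :k]  # column j is the j-th basis function over (s, a) pairs

# A linear approximator then represents the action-value function as
# Q(s, a) ≈ basis[index[(s, a)], :] @ w for weights w learned by, e.g., LSPI.
print(basis.shape)  # (10, 4)
```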

Record details

  • Author: Osentoski, Sarah
  • Affiliation: University of Massachusetts Amherst
  • Degree-granting institution: University of Massachusetts Amherst
  • Subject: Computer Science
  • Degree: Ph.D.
  • Year: 2009
  • Pages: 155 p.
  • Total pages: 155
  • Original format: PDF
  • Language: eng
  • Date added: 2022-08-17 11:38:18
