
Action-based representation discovery in Markov decision processes.



Abstract

This dissertation investigates the problem of representation discovery in discrete Markov decision processes (MDPs): how agents can simultaneously learn a representation and optimal control. Previous work on function approximation techniques for MDPs largely employed hand-engineered basis functions. In this dissertation, we explore approaches that construct these basis functions automatically and demonstrate that automatically constructed basis functions significantly outperform more traditional, hand-engineered approaches.

We specifically examine two problems: how to automatically build representations for action-value functions by explicitly incorporating actions into the representation, and how representations can be automatically constructed by exploiting a pre-specified task hierarchy. We first introduce a technique for learning basis functions directly in state-action space. The approach constructs basis functions through spectral analysis of a state-action graph that captures the underlying structure of the MDP's state-action space. We describe two approaches to constructing these graphs and evaluate the method on MDPs with discrete state and action spaces.

We then show how our approach can be used to approximate state-action value functions when the agent has access to macro-actions: actions that take more than one time step and have predefined policies. We describe how the state-action graphs can be modified to incorporate information about the macro-actions and experimentally evaluate this approach on SMDPs with discrete state and action spaces.

Finally, we describe how hierarchical reinforcement learning can be used to scale up automatic basis function construction. We extend automatic basis function construction techniques to multi-level task hierarchies and describe how basis function construction can exploit the value function decomposition given by a fixed task hierarchy. We demonstrate that combining task hierarchies with automatic basis function construction allows basis function techniques to scale to larger problems and leads to a significant speed-up in learning.
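As a concrete illustration of the spectral construction the abstract describes, the sketch below builds a state-action graph for a hypothetical five-state chain MDP and uses the low-order eigenvectors of its normalized graph Laplacian as basis functions over (state, action) pairs. This is a minimal sketch, not the dissertation's implementation: the chain dynamics, the edge rule (connect (s, a) to (s', a') whenever taking a in s reaches s'), the number of basis functions k, and all identifiers are illustrative assumptions.

```python
import numpy as np
from scipy.sparse.csgraph import laplacian
from scipy.linalg import eigh

# Hypothetical toy domain: a 5-state chain MDP with two actions.
# Nodes of the state-action graph are (state, action) pairs. These
# dynamics are an assumption for illustration, not the dissertation's
# experimental domains.
n_states, n_actions = 5, 2
nodes = [(s, a) for s in range(n_states) for a in range(n_actions)]
index = {sa: i for i, sa in enumerate(nodes)}

def step(s, a):
    """Deterministic chain dynamics: action 0 moves left, action 1 moves right."""
    return max(s - 1, 0) if a == 0 else min(s + 1, n_states - 1)

# Adjacency matrix: connect (s, a) to (s', a') whenever taking a in s
# reaches s' and a' is available in s' (one possible edge rule).
W = np.zeros((len(nodes), len(nodes)))
for s, a in nodes:
    s_next = step(s, a)
    for a_next in range(n_actions):
        W[index[(s, a)], index[(s_next, a_next)]] = 1.0

# Symmetrize so the graph Laplacian is well defined, then take the k
# smoothest eigenvectors of the normalized Laplacian as basis functions.
W = (W + W.T) / 2.0
L = laplacian(W, normed=True)
eigvals, eigvecs = eigh(L)  # eigenvalues returned in ascending order
k = 4
basis = eigvecs[:, :k]  # column j is the j-th basis function over (s, a) pairs

# A linear approximator then represents the action-value function as
# Q(s, a) ≈ basis[index[(s, a)], :] @ w for weights w learned by, e.g., LSPI.
print(basis.shape)  # (10, 4)
```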

Record details

  • Author: Osentoski, Sarah
  • Affiliation: University of Massachusetts Amherst
  • Degree-granting institution: University of Massachusetts Amherst
  • Subject: Computer Science
  • Degree: Ph.D.
  • Year: 2009
  • Pages: 155 p.
  • Total pages: 155
  • Original format: PDF
  • Language: eng
  • Date added: 2022-08-17 11:38:18
