IEEE International Workshop on Machine Learning for Signal Processing

A model based approach to exploration of continuous-state MDPs using Divergence-to-Go



Abstract

In reinforcement learning, exploration is typically conducted by taking occasional random actions. The literature lacks an exploration method driven by uncertainty, in which exploratory actions explicitly seek to improve the learning process in a sequential decision problem. In this paper, we propose a framework called Divergence-to-Go, a model-based method that uses recursion, similarly to dynamic programming, to quantify the uncertainty associated with each state-action pair. Information-theoretic estimators of uncertainty allow our method to function even in large, continuous spaces. Performance is demonstrated on a maze task and the mountain-car task.
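The recursion described in the abstract can be sketched in miniature. The following is an illustrative sketch only, not the authors' implementation: it runs a dynamic-programming-style recursion D(s, a) = U(s, a) + γ · max_a' D(s', a') over a small discretized chain MDP, where the per-step uncertainty U(s, a) stands in for the paper's information-theoretic divergence estimate (here approximated by a simple visit-count surrogate). The names `divergence_to_go`, `U`, and the chain dynamics are all assumptions made for the example.

```python
import numpy as np

n_states, n_actions = 5, 2
gamma = 0.9

def step(s, a):
    """Deterministic chain dynamics: action 0 moves left, action 1 moves right."""
    return max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)

def divergence_to_go(visits, n_iters=50):
    """Value-iteration-style recursion over uncertainty:
        D(s, a) = U(s, a) + gamma * max_a' D(s', a').
    U(s, a) is a surrogate per-step uncertainty (1 / (1 + visit count));
    the paper would use an information-theoretic divergence estimator here."""
    U = 1.0 / (1.0 + visits)            # high uncertainty where data is scarce
    D = np.zeros((n_states, n_actions))
    for _ in range(n_iters):
        for s in range(n_states):
            for a in range(n_actions):
                D[s, a] = U[s, a] + gamma * D[step(s, a)].max()
    return D

# States near the left end have been visited often; the right end is unexplored.
visits = np.array([[10, 10], [8, 5], [3, 2], [1, 0], [0, 0]], dtype=float)
D = divergence_to_go(visits)
# An exploratory policy acts greedily with respect to divergence-to-go,
# which pulls the agent toward the unexplored right end of the chain.
print(D.argmax(axis=1))
```

Acting greedily on D rather than on a reward-based value function is what makes the exploration uncertainty-driven instead of random: the recursion propagates uncertainty backward, so states that merely lead toward unexplored regions also acquire high divergence-to-go.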
