IEEE International Workshop on Machine Learning for Signal Processing

A model based approach to exploration of continuous-state MDPs using Divergence-to-Go



Abstract

In reinforcement learning, exploration is typically conducted by taking occasional random actions. The literature lacks an exploration method driven by uncertainty, in which exploratory actions explicitly seek to improve the learning process in a sequential decision problem. In this paper, we propose a framework called Divergence-to-Go, a model-based method that uses recursion, similarly to dynamic programming, to quantify the uncertainty associated with each state-action pair. Information-theoretic estimators of uncertainty allow our method to function even in large, continuous spaces. Performance is demonstrated on a maze task and the mountain-car task.
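The recursion described in the abstract can be sketched in miniature. The following is an illustrative sketch only, not the authors' implementation: it runs a dynamic-programming-style recursion D(s, a) = U(s, a) + γ · max_a' D(s', a') over a small discretized chain MDP, where the per-step uncertainty U(s, a) stands in for the paper's information-theoretic divergence estimate (here approximated by a simple visit-count surrogate). The names `divergence_to_go`, `U`, and the chain dynamics are all assumptions made for the example.

```python
import numpy as np

n_states, n_actions = 5, 2
gamma = 0.9

def step(s, a):
    """Deterministic chain dynamics: action 0 moves left, action 1 moves right."""
    return max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)

def divergence_to_go(visits, n_iters=50):
    """Value-iteration-style recursion over uncertainty:
        D(s, a) = U(s, a) + gamma * max_a' D(s', a').
    U(s, a) is a surrogate per-step uncertainty (1 / (1 + visit count));
    the paper would use an information-theoretic divergence estimator here."""
    U = 1.0 / (1.0 + visits)            # high uncertainty where data is scarce
    D = np.zeros((n_states, n_actions))
    for _ in range(n_iters):
        for s in range(n_states):
            for a in range(n_actions):
                D[s, a] = U[s, a] + gamma * D[step(s, a)].max()
    return D

# States near the left end have been visited often; the right end is unexplored.
visits = np.array([[10, 10], [8, 5], [3, 2], [1, 0], [0, 0]], dtype=float)
D = divergence_to_go(visits)
# An exploratory policy acts greedily with respect to divergence-to-go,
# which pulls the agent toward the unexplored right end of the chain.
print(D.argmax(axis=1))
```

Acting greedily on D rather than on a reward-based value function is what makes the exploration uncertainty-driven instead of random: the recursion propagates uncertainty backward, so states that merely lead toward unexplored regions also acquire high divergence-to-go.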
