International Conference on Machine Learning

Active Exploration and Learning in Real-Valued Spaces using Multi-Armed Bandit Allocation Indices



Abstract

A method for active learning is introduced that uses Gittins multi-armed bandit allocation indices to select actions that optimally trade off exploration and exploitation so as to maximize expected payoff. We apply the Gittins method to continuous action spaces by using the C4.5 algorithm to learn a mapping from state (or perception of state) and action to the success or failure of the action when taken in that state. The leaves of the resulting tree form a finite set of alternatives over the continuous space of actions. The action selected comes from the leaf that, among the leaves consistent with the perceived state, has the highest Gittins index. We illustrate the technique with a simulated robot learning task of grasping objects, where each grasping trial can be lengthy and it is desirable to reduce unnecessary experiments. In the grasping simulation, the Gittins index approach demonstrates a statistically significant performance improvement over the Interval Estimation action-selection heuristic, with little increase in computational cost. The method also has the advantage of providing a principled way of choosing the exploration parameter based on the expected number of repetitions of the task.
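The selection rule the abstract describes can be sketched as follows. This is a minimal illustrative implementation, not the paper's code: the discount factor `GAMMA` (which the paper ties to the expected number of task repetitions), the truncation depth `DEPTH`, and the `{"s": …, "f": …}` leaf representation are all assumptions. Each tree leaf is treated as a Bernoulli bandit arm with a Beta posterior over its success probability, its Gittins index is approximated by the standard retirement-value calibration (bisection over the retirement reward), and the leaf with the highest index is chosen.

```python
GAMMA = 0.9   # discount factor: assumed here; the paper derives the exploration
              # parameter from the expected number of task repetitions
DEPTH = 30    # truncation depth of the dynamic program (assumption)

def gittins_index(successes, failures, tol=1e-4):
    """Approximate the Gittins index of a Bernoulli arm with a
    Beta(successes+1, failures+1) posterior, via bisection on the
    retirement reward lam at which continuing and retiring are tied."""
    start = successes + failures + 2  # total Beta pseudo-counts at the root

    def advantage(lam):
        # Value of playing the arm optimally minus value of retiring
        # on a constant reward lam per step, discounted by GAMMA.
        retire = lam / (1.0 - GAMMA)
        cache = {}

        def value(a, b):
            if (a, b) in cache:
                return cache[(a, b)]
            p = a / (a + b)  # posterior mean success probability
            if a + b - start >= DEPTH:
                # Truncate the recursion: act myopically from here on.
                v = max(retire, p / (1.0 - GAMMA))
            else:
                cont = p * (1.0 + GAMMA * value(a + 1, b)) \
                     + (1.0 - p) * GAMMA * value(a, b + 1)
                v = max(retire, cont)
            cache[(a, b)] = v
            return v

        return value(successes + 1, failures + 1) - retire

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        lam = 0.5 * (lo + hi)
        if advantage(lam) > 0.0:
            lo = lam   # continuing beats retiring: index lies above lam
        else:
            hi = lam
    return 0.5 * (lo + hi)

def select_leaf(leaves):
    """Pick the leaf (arm) with the highest index; each leaf carries the
    success/failure counts of past trials that reached it."""
    return max(leaves, key=lambda leaf: gittins_index(leaf["s"], leaf["f"]))
```

In use, `leaves` would be restricted to the C4.5 leaves consistent with the currently perceived state before calling `select_leaf`; the index computation itself is cheap enough to rerun after every trial's success/failure update.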
