A method for active learning is introduced which uses Gittins multi-armed bandit allocation indices to select actions that optimally trade off exploration and exploitation so as to maximize expected payoff. We apply the Gittins method to continuous action spaces by using the C4.5 algorithm to learn a mapping from state (or perception of state) and action to the success or failure of the action when taken in that state. The leaves of the resulting tree form a finite set of alternatives over the continuous space of actions. The action selected is drawn from the leaf that, among those consistent with the perceived state, has the highest Gittins index. We illustrate the technique with a simulated robot learning task for grasping objects, in which each grasping trial can be lengthy and it is desirable to reduce unnecessary experiments. In the grasping simulation, the Gittins index approach shows a statistically significant performance improvement over the Interval Estimation action selection heuristic, with little increase in computational cost. The method also has the advantage of providing a principled way to choose the exploration parameter based on the expected number of repetitions of the task.
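As a rough illustration only, and not the paper's implementation, the sketch below approximates the Gittins index of a Bernoulli arm with a Beta posterior via the retirement formulation (bisection on a retirement reward, with a finite-horizon dynamic program for the continuation value), then selects among a handful of hypothetical tree leaves. The discount factor, horizon, leaf names, and per-leaf success/failure counts are all assumptions; the learned C4.5 tree is stood in for by a plain dictionary of leaf statistics.

```python
from functools import lru_cache

def gittins_index(successes, failures, beta=0.9, horizon=60, tol=1e-4):
    """Approximate Gittins index of a Bernoulli arm with a
    Beta(successes, failures) posterior (generic approximation,
    not the paper's procedure).

    The index is the largest retirement reward lam at which continuing
    to sample the arm is still optimal; it is located by bisection,
    with the continuation value computed by dynamic programming
    truncated after `horizon` further pulls.
    """
    def continuing_beats_retiring(lam):
        retire = lam / (1.0 - beta)  # discounted value of retiring at reward lam

        @lru_cache(maxsize=None)
        def value(s, f):
            # s, f = additional successes/failures observed since the root.
            if s + f >= horizon:      # truncation: forced retirement
                return retire
            p = (successes + s) / (successes + s + failures + f)
            cont = (p * (1.0 + beta * value(s + 1, f))
                    + (1.0 - p) * beta * value(s, f + 1))
            return max(retire, cont)

        return value(0, 0) > retire + 1e-12

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if continuing_beats_retiring(mid):
            lo = mid   # index exceeds mid: continuing is still worthwhile
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical leaf statistics: (successes, failures) of grasp trials
# pooled in each leaf of a learned tree; names are purely illustrative.
leaves = {"leaf_a": (3, 1), "leaf_b": (1, 1), "leaf_c": (1, 4)}

# Select the leaf with the highest index under a Beta(1+s, 1+f) posterior.
best = max(leaves,
           key=lambda k: gittins_index(1 + leaves[k][0], 1 + leaves[k][1]))
```

Note how the index exceeds the posterior mean for poorly explored leaves, which is exactly the exploration bonus the abstract alludes to; unlike a hand-tuned Interval Estimation bound, the bonus here falls out of the discount factor, which can in turn be tied to the expected number of task repetitions.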