Journal: IEEE Transactions on Automatic Control
Satisficing in Multi-Armed Bandit Problems

Abstract

Satisficing is a relaxation of maximizing and allows for less risky decision making in the face of uncertainty. We propose two sets of satisficing objectives for the multi-armed bandit problem, where the objective is to achieve reward-based decision-making performance above a given threshold. We show that these new problems are equivalent to various standard multi-armed bandit problems with maximizing objectives and use the equivalence to find bounds on performance. The different objectives can result in qualitatively different behavior; for example, agents explore their options continually in one case and only a finite number of times in another. For the case of Gaussian rewards we show an additional equivalence between the two sets of satisficing objectives that allows algorithms developed for one set to be applied to the other. We then develop variants of the Upper Credible Limit (UCL) algorithm that solve the problems with satisficing objectives and show that these modified UCL algorithms achieve efficient satisficing performance.
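To make the idea of a threshold-based objective concrete, here is a minimal sketch of a satisficing arm-selection rule for Gaussian rewards. It is an illustration only, not the algorithm from the article: the function names, the credibility schedule `alpha = 1 / (sqrt(2*pi*e) * t)`, the uninformative prior with known noise level `sigma`, and the rule "among arms whose upper credible limit reaches the threshold, pick the one with the highest estimated mean" are all assumptions made for this example.

```python
import math
import random
from statistics import NormalDist


def upper_credible_limits(means, counts, sigma, t):
    """Upper credible limit per arm, assuming Gaussian rewards with known
    noise sigma and an uninformative prior (posterior std = sigma / sqrt(n))."""
    # Assumed schedule: the credibility level tightens as t grows.
    alpha = 1.0 / (math.sqrt(2 * math.pi * math.e) * t)
    z = NormalDist().inv_cdf(1.0 - alpha)
    return [m + z * sigma / math.sqrt(n) for m, n in zip(means, counts)]


def choose_arm(means, counts, sigma, t, threshold):
    """Hypothetical satisficing rule: prefer arms that plausibly meet the
    threshold; fall back to the most optimistic arm if none does."""
    limits = upper_credible_limits(means, counts, sigma, t)
    eligible = [i for i, q in enumerate(limits) if q >= threshold]
    if eligible:
        # Exploit among plausibly satisfying arms.
        return max(eligible, key=lambda i: means[i])
    # No arm looks satisfying: behave like a maximizing rule.
    return max(range(len(limits)), key=lambda i: limits[i])


def run(true_means, sigma, threshold, horizon, seed=0):
    """Simulate the rule on a Gaussian bandit; returns (choices, counts)."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms
    means = [0.0] * n_arms
    choices = []
    for t in range(1, horizon + 1):
        # Pull every arm once before applying the selection rule.
        arm = t - 1 if t <= n_arms else choose_arm(means, counts, sigma, t, threshold)
        reward = rng.gauss(true_means[arm], sigma)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running average
        choices.append(arm)
    return choices, counts


if __name__ == "__main__":
    choices, counts = run([0.2, 0.6, 0.9], sigma=1.0, threshold=0.5, horizon=500)
    print("pulls per arm:", counts)
```

With a threshold below the best arm's mean, a rule of this kind can settle on any arm that appears to satisfy the threshold rather than insisting on the single best one, which is the qualitative difference from a maximizing objective that the abstract describes.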