On-Line Learning with Imperfect Monitoring

Abstract

We study on-line play of repeated matrix games in which the observations of the other player's past actions and of the obtained reward are partial and stochastic. We define the Partial Observation Bayes Envelope (POBE) as the best reward against the worst-case stationary strategy of the opponent that agrees with past observations. Our goal is to keep the (unobserved) average reward above the POBE. For the case where the observations (but not necessarily the rewards) depend on the opponent's play alone, an algorithm for attaining the POBE is derived. This algorithm is based on an application of approachability theory combined with a worst-case view over the unobserved rewards. We also suggest a simplified solution concept for general signaling structures; this concept may fall short of the POBE.
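
The POBE definition above can be written down as a short formula. This is only a sketch: the symbols ($A$, $B$, $r$, $p$, $q$, $\sigma$, $Q(\sigma)$) are assumed here for illustration and are not taken from the paper, and the max-min ordering is one reading of "the best reward against the worst-case stationary strategy".

```latex
% Assumed notation (hypothetical, not from the paper):
%   A, B       : action sets of the learner and the opponent
%   r(a,b)     : reward matrix of the repeated game
%   \sigma     : empirical distribution of the signals observed so far
%   Q(\sigma)  : stationary mixed strategies of the opponent consistent with \sigma
\[
  r(p,q) \;=\; \sum_{a \in A} \sum_{b \in B} p(a)\, q(b)\, r(a,b),
  \qquad
  \mathrm{POBE}(\sigma) \;=\; \max_{p \in \Delta(A)} \; \min_{q \in Q(\sigma)} \; r(p,q).
\]
% Stated goal, under the same assumed notation: the average reward should
% asymptotically be no lower than the envelope evaluated at the observed signals.
\[
  \liminf_{T \to \infty}
  \left( \frac{1}{T} \sum_{t=1}^{T} r(a_t, b_t) \;-\; \mathrm{POBE}(\sigma_T) \right) \;\ge\; 0 .
\]
```

Under this reading, if the signals identified the opponent's mixed strategy exactly, $Q(\sigma)$ would shrink to a single point $q$ and the expression would reduce to the usual Bayes envelope $\max_{p} r(p,q)$.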