Source: JMLR: Workshop and Conference Proceedings

Exploration by Optimisation in Partial Monitoring

Abstract

We provide a novel algorithm for adversarial k-action d-outcome partial monitoring that is adaptive, intuitive and efficient. The highlight is that for the non-degenerate locally observable games, the n-round minimax regret is bounded by 6m k^(3/2) sqrt(n log(k)), where m is the number of signals. This matches the best known information-theoretic upper bound derived via Bayesian minimax duality. The same algorithm also achieves near-optimal regret for full information, bandit and globally observable games. High probability bounds and simple experiments are also provided.
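To give a feel for how the stated bound scales, the sketch below simply evaluates the expression 6m k^(3/2) sqrt(n log(k)) from the abstract. The function name `regret_upper_bound` is hypothetical, and the use of the natural logarithm is an assumption, since the abstract does not state the base. This is only arithmetic on the quoted formula, not an implementation of the algorithm.

```python
import math


def regret_upper_bound(n: int, k: int, m: int) -> float:
    """Evaluate 6 * m * k^(3/2) * sqrt(n * log(k)) as quoted in the abstract.

    n: number of rounds, k: number of actions, m: number of signals.
    Assumption: log is the natural logarithm (the abstract does not say).
    """
    if n < 0 or k < 1 or m < 0:
        raise ValueError("expected n >= 0, k >= 1, m >= 0")
    return 6.0 * m * k ** 1.5 * math.sqrt(n * math.log(k))


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    for n in (1_000, 10_000, 100_000):
        bound = regret_upper_bound(n, k=3, m=2)
        print(f"n={n:>7}  bound={bound:12.1f}  bound/n={bound / n:.4f}")
```

Because the expression grows like sqrt(n), quadrupling the number of rounds doubles the bound, so the bound divided by n shrinks as n grows.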
