First-Order Bayesian Regret Analysis of Thompson Sampling

Sébastien Bubeck; Mark Sellke

Abstract

We address online combinatorial optimization when the player has a prior over the adversary’s sequence of losses. In this setting, Russo and Van Roy proposed an information theoretic analysis of Thompson Sampling based on the information ratio, allowing for elegant proofs of Bayesian regret bounds. In this paper we introduce three novel ideas to this line of work. First we propose a new quantity, the scale-sensitive information ratio, which allows us to obtain more refined first-order regret bounds (i.e., bounds of the form $O(\sqrt{L^{*}})$ where $L^{*}$ is the loss of the best combinatorial action). Second we replace the entropy over combinatorial actions by a coordinate entropy, which allows us to obtain the first optimal worst-case bound for Thompson Sampling in the combinatorial setting. We additionally introduce a novel link between Bayesian agents and frequentist confidence intervals. Combining these ideas we show that the classical multi-armed bandit first-order regret bound $\widetilde{O}(\sqrt{d L^{*}})$ still holds true in the more challenging and more general semi-bandit scenario. This latter result improves the previous state of the art bound $\widetilde{O}(\sqrt{(d+m^{3})L^{*}})$ by Lykouris, Sridharan and Tardos. Moreover we sharpen these results with two technical ingredients. The first leverages a recent insight of Zimmert and Lattimore to replace Shannon entropy with more refined potential functions in the analysis. The second is a Thresholded Thompson Sampling algorithm, which slightly modifies the original algorithm by never playing low-probability actions. This thresholding results in fully $T$-independent regret bounds when $L^{*} \leq \overline{L}^{*}$ is almost surely upper-bounded, which we show does not hold for ordinary Thompson Sampling.
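
The thresholding idea mentioned in the abstract can be illustrated with a small sketch. This is not the paper's algorithm or its setting: it assumes a plain multi-armed bandit with Bernoulli losses and independent Beta(1, 1) priors, estimates each arm's posterior probability of being optimal by Monte Carlo, and uses a hypothetical threshold parameter `gamma`. All function names are invented for illustration.

```python
import numpy as np


def optimal_arm_probabilities(alpha, beta, rng, n_samples=2000):
    """Monte Carlo estimate of P(arm i has the lowest mean loss).

    Assumes independent Beta(alpha[i], beta[i]) posteriors over mean losses.
    """
    samples = rng.beta(alpha, beta, size=(n_samples, len(alpha)))
    best = samples.argmin(axis=1)  # index of the lowest sampled loss per draw
    return np.bincount(best, minlength=len(alpha)) / n_samples


def thresholded_probabilities(p, gamma):
    """Zero out arms with probability below gamma, then renormalize."""
    q = np.where(p >= gamma, p, 0.0)
    if q.sum() == 0.0:
        # Illustrative fallback (only reachable if gamma > 1/len(p)):
        # play the single most likely arm.
        q = np.zeros_like(p)
        q[p.argmax()] = 1.0
    return q / q.sum()


def run_thresholded_ts(mean_losses, horizon, gamma, seed=0):
    """Run the sketch on Bernoulli losses and return the total loss."""
    rng = np.random.default_rng(seed)
    d = len(mean_losses)
    alpha = np.ones(d)  # 1 + number of observed losses equal to 1
    beta = np.ones(d)   # 1 + number of observed losses equal to 0
    total = 0.0
    for _ in range(horizon):
        p = optimal_arm_probabilities(alpha, beta, rng)
        q = thresholded_probabilities(p, gamma)
        arm = rng.choice(d, p=q)  # never selects an arm with q[arm] == 0
        loss = float(rng.random() < mean_losses[arm])
        alpha[arm] += loss
        beta[arm] += 1.0 - loss
        total += loss
    return total


if __name__ == "__main__":
    print(run_thresholded_ts([0.1, 0.5, 0.6], horizon=200, gamma=0.05))
```

Ordinary Thompson Sampling would draw the arm directly from `p`; the only change in this sketch is that arms whose estimated probability falls below `gamma` are excluded before sampling.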

Bibliographic record

Source
IEEE Transactions on Information Theory, 2023, no. 3, pp. 1795-1823 (29 pages)
Authors
Sébastien Bubeck; Mark Sellke
Author affiliations

Department of Mathematics, Stanford University, Stanford, CA, USA; School of Mathematics, Institute for Advanced Study, Princeton, NJ, USA

Indexing information
Original format: PDF
Language of text: English
Chinese Library Classification: Communications
Keywords
Games; Entropy; Bayes methods; Mirrors; Loss measurement; Upper bound; Optimization