Motivated by situations where extreme values, rather than the expected values of the classical stochastic multi-armed bandit (MAB) setting, are of interest, we propose a distribution-free algorithm for $\textit{extreme bandits}$ and characterize its statistical properties. The proposed algorithm is index based, where the index is constructed non-parametrically using combinatorics and robust statistics. For distributions with "exponential-like tails" and "polynomial-like tails", we establish the following results: (i) the proposed algorithm is consistent, i.e., the index corresponding to the best arm asymptotically attains the largest value; (ii) the proposed algorithm achieves vanishing extremal regret under weaker conditions than existing algorithms. Numerical experiments on the classes of distributions commonly considered in the extreme bandits literature highlight the superior finite-sample performance of the proposed algorithm compared to the state of the art.