On the Suboptimality of Thompson Sampling in High Dimensions

Abstract
In this paper we consider Thompson Sampling (TS) for combinatorial
semi-bandits. We demonstrate that, perhaps surprisingly, TS is sub-optimal for
this problem in the sense that its regret scales exponentially in the ambient
dimension, and its minimax regret scales almost linearly for some well chosen
exemples that may not be that uncommon.
Type
Publication
In Neural Information Processing Systems 2021

Authors
PhD Student at CentraleSupélec
I was a PhD Student at the L2S lab at CentraleSupelec under the supervision of
Richard Combes and
Sheng Yang