On the Suboptimality of Thompson Sampling in High Dimensions


In this paper we consider Thompson Sampling (TS) for combinatorial semi-bandits. We demonstrate that, perhaps surprisingly, TS is sub-optimal for this problem in the sense that its regret scales exponentially in the ambient dimension, and its minimax regret scales almost linearly for some well chosen exemples that may not be that uncommon.

In Neural Information Processing Systems 2021
Raymond Zhang
Raymond Zhang
PhD Student at CentraleSupélec(L2S)

My current research interests include Bandits, Measure Concentration and Combinatorial Optimisation