On the Suboptimality of Thompson Sampling in High Dimensions

December 7, 2021·
Raymond Zhang
Raymond Zhang
,
Richard Combes
Abstract
In this paper we consider Thompson Sampling (TS) for combinatorial semi-bandits. We demonstrate that, perhaps surprisingly, TS is sub-optimal for this problem in the sense that its regret scales exponentially in the ambient dimension, and its minimax regret scales almost linearly for some well chosen exemples that may not be that uncommon.
Type
Publication
In Neural Information Processing Systems 2021
publications
Raymond Zhang
Authors
PhD Student at CentraleSupélec
I was a PhD Student at the L2S lab at CentraleSupelec under the supervision of Richard Combes and Sheng Yang
Authors
Maitre de Conférence