On the Suboptimality of Thompson Sampling in High Dimensions

Dec 7, 2021·

Raymond Zhang

Richard Combes

· 0 min read

Abstract

In this paper we consider Thompson Sampling (TS) for combinatorial semi-bandits. We demonstrate that, perhaps surprisingly, TS is sub-optimal for this problem in the sense that its regret scales exponentially in the ambient dimension, and its minimax regret scales almost linearly for some well chosen exemples that may not be that uncommon.

Type

Conference Paper

Publication

In Neural Information Processing Systems 2021

Last updated on Dec 6, 2024

Authors

Raymond Zhang

PhD Student at CentraleSupélec

← Thompson Sampling For Combinatorial Bandits: Polynomial Regret and Mismatched Sampling Paradox Dec 9, 2024