Conference Paper

Thompson Sampling For Combinatorial Bandits: Polynomial Regret and Mismatched Sampling Paradox featured image

Thompson Sampling For Combinatorial Bandits: Polynomial Regret and Mismatched Sampling Paradox

We show that the lower bound for Linear Bandit for ellipsoidal action set is $\Omega(\min(d \sigma \sqrt{T} + d \|\theta\|_{A}, \|\theta\|_{A} T))$ and that for any algorithm hard …

avatar
Raymond Zhang
Thompson Sampling For Combinatorial Bandits: Polynomial Regret and Mismatched Sampling Paradox featured image

Thompson Sampling For Combinatorial Bandits: Polynomial Regret and Mismatched Sampling Paradox

We propose the first known Thompson Sampling for combinatorial bandits whose finite-time regret does not scale exponentially with the dimension of the problem. Suprisingly, …

avatar
Raymond Zhang
On the Suboptimality of Thompson Sampling in High Dimensions featured image

On the Suboptimality of Thompson Sampling in High Dimensions

Thompson Sampling (TS) for combinatorial semi-bandits is sub-optimal in the sense that its regret scales exponentially in the ambient dimension for some well chosen exemples.

avatar
Raymond Zhang