Coordination without communication: optimal regret in two players multi-armed bandits

Sebastien Bubeck, Thomas Budzinski


Subject areas: Bandit problems

Presented in: Session 3B, Session 3D


Abstract: We consider two agents simultaneously playing the same stochastic three-armed bandit problem. The two agents cooperate but cannot communicate. Under the assumption that shared randomness is available, we propose a strategy with no collisions at all between the players (with very high probability), and with near-optimal regret $O(\sqrt{T \log(T)})$. We also argue that the extra logarithmic factor $\sqrt{\log(T)}$ should be necessary, by proving a lower bound for a full-information variant of the problem.
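To illustrate the role of shared randomness in avoiding collisions, here is a minimal, hypothetical sketch (not the paper's regret-optimal strategy): two players who share a random seed can each derive the same random permutation of the arms every round, and player $i$ pulls the $i$-th arm of that permutation. Since both players compute the identical permutation without exchanging any messages, their arm choices are always distinct. The function name and parameters below are illustrative assumptions, not from the paper.

```python
import random

def shared_arm_assignment(seed: int, t: int, n_arms: int = 3, n_players: int = 2):
    """Derive a common random permutation of the arms from the shared
    seed and the round index t; no communication is needed because both
    players run this exact same computation."""
    rng = random.Random(f"{seed}-{t}")
    perm = list(range(n_arms))
    rng.shuffle(perm)
    # Player i pulls arm perm[i]; distinct indices guarantee no collision.
    return perm[:n_players]

# Simulate 1000 rounds: the two players never pull the same arm.
if __name__ == "__main__":
    seed = 12345
    for t in range(1000):
        a0, a1 = shared_arm_assignment(seed, t)
        assert a0 != a1
    print("no collisions")
```

This sketch only demonstrates collision avoidance; the paper's contribution is achieving this while also keeping the regret near-optimal at $O(\sqrt{T \log(T)})$.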
