Tsallis-INF: Improved Robustness to Adversarial Corruptions inStochastic Multi-armed Bandits and Beyond

Saeed Masoudian , Yevgeny Seldin

[Proceedings link] [PDF]

Session: Bandits, RL and Control 2 (B)

Session Chair: Sattar Vakili

Poster: Poster Session 3

Abstract: We derive improved regret bounds for the Tsallis-INF algorithm of \citet{zimmert2020}. We show that in adversarial regimes with a $(\Delta,C,T)$ self-bounding constraint the algorithm achieves $\Ocal\lr{\lr{\sum_{i\neq i^*} \frac{1}{\Delta_i}}\log_+\lr{\frac{(K-1)T}{\lr{\sum_{i\neq i^*} \frac{1}{\Delta_i}}^2}}+\sqrt{C\lr{\sum_{i\neq i^*}\frac{1}{\Delta_i}}\log_+\lr{\frac{(K-1)T}{C\sum_{i\neq i^*}\frac{1}{\Delta_i}}}}}$ regret bound, where $T$ is the time horizon, $K$ is the number of arms, $\Delta_i$ are the suboptimality gaps, $i^*$ is the best arm, $C$ is the corruption magnitude, and $\log_+(x) = \max\lr{1,\log x}$. The regime includes stochastic bandits, stochastically constrained adversarial bandits, and stochastic bandits with adversarial corruptions as special cases. Additionally, we provide a general analysis, which allows to achieve the same kind of improvement for generalizations of Tsallis-INF to other settings beyond multiarmed bandits.

Summary presentation

Full presentation