Improved Regret for Zeroth-Order Stochastic Convex Bandits

Tor Lattimore, Andras Gyorgy

Session: Bandits, RL and Control 1 (A)

Session Chair: Yuxin Chen

Poster: Poster Session 2

Abstract: We present an efficient algorithm for stochastic bandit convex optimisation that requires no assumptions on smoothness or strong convexity and whose regret is bounded by O(d^(4.5) sqrt(n) polylog(n)), where n is the number of interactions and d is the dimension.
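
For readers unfamiliar with the setting, the regret in the abstract can be made concrete with the standard formulation of stochastic zeroth-order convex bandits. The block below is a sketch of that textbook setup, not a transcription of the paper's definitions: the symbols K, f, X_t, Y_t and \varepsilon_t are assumed here for illustration, and the precise noise model and the sense in which the bound holds should be taken from the paper itself.

```latex
% Assumed notation (not from the listing): K is a convex body in R^d,
% f : K -> R is an unknown convex loss, and \varepsilon_t is zero-mean noise.
% In round t the learner picks X_t \in K and observes only a noisy
% function value (zeroth-order feedback, no gradients):
\[
  Y_t = f(X_t) + \varepsilon_t, \qquad t = 1, \dots, n .
\]
% The regret compares the losses incurred with the best fixed point in K:
\[
  \mathrm{Reg}_n = \sum_{t=1}^{n} \Bigl( f(X_t) - \min_{x \in K} f(x) \Bigr).
\]
% In this notation the bound stated in the abstract reads:
\[
  \mathrm{Reg}_n = O\bigl( d^{4.5} \sqrt{n} \, \operatorname{polylog}(n) \bigr).
\]
```

Under this reading, the sqrt(n) dependence means the average per-round excess loss Reg_n / n shrinks at a rate of roughly d^(4.5) / sqrt(n), up to logarithmic factors.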
