Abstract: We provide a novel algorithm for adversarial k-action d-outcome partial monitoring that is adaptive, intuitive and efficient. The highlight is that for the non-degenerate locally observable games, the n-round minimax regret is bounded by 2mk^(3/2)sqrt(3n log(k)), where m is the number of signals. This matches the best known information-theoretic upper bound derived via Bayesian minimax duality. The same algorithm also achieves near-optimal regret for full information, bandit and globally observable games. High probability bounds and simple experiments are also provided.

Summary presentation

Full presentation