Abstract: A common technique in supervised learning with discrete losses, such as 0-1 loss, is to optimize a convex surrogate loss over $\mathbb{R}^d$, calibrated with respect to the original loss. In particular, recent work has investigated embedding the original predictions (e.g., labels) as points in $\mathbb{R}^d$, showing an equivalence to using polyhedral surrogates. In this work, we study the notion of the embedding dimension of a given discrete loss: the minimum dimension $d$ such that an embedding exists. We characterize $d$-embeddability for all $d$, with a particularly tight characterization for $d=1$ (embedding into the real line), and useful necessary conditions for $d>1$ in the form of a quadratic feasibility program. We illustrate our results with novel lower bounds for the abstain loss.
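As a concrete illustration of the setting (not taken from the paper itself): the classic hinge loss is a polyhedral, calibrated surrogate for 0-1 loss, and it embeds the two labels as the points $-1$ and $+1$ on the real line, i.e., a $d=1$ embedding. A minimal sketch, with the decoding map chosen as the sign function:

```python
# Illustrative sketch: hinge loss as a polyhedral surrogate for 0-1 loss,
# embedding the labels {-1, +1} as the points -1 and +1 in R (d = 1).

def zero_one_loss(prediction, label):
    """Original discrete loss on labels in {-1, +1}."""
    return 0 if prediction == label else 1

def hinge_loss(u, label):
    """Polyhedral (piecewise-linear, convex) surrogate on a report u in R."""
    return max(0.0, 1.0 - label * u)

def decode(u):
    """Link function: map a real-valued report back to a discrete label."""
    return 1 if u >= 0 else -1

# At each embedded point, the surrogate attains its minimum value of 0 for
# the matching label, and decoding the point recovers that label exactly.
for label in (-1, 1):
    assert hinge_loss(label, label) == 0.0
    assert zero_one_loss(decode(label), label) == 0
```

The general question the paper studies is the smallest such $d$ for an arbitrary discrete loss; the hinge example only shows that for 0-1 loss on two labels, $d=1$ suffices.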