Eva Tardos / Cornell University
Over the last two decades we have developed a good understanding of how to quantify the impact of strategic user behavior on outcomes in many games (including traffic routing and online auctions), and have shown that the resulting bounds extend to repeated games, assuming players use a form of no-regret learning to adapt to the environment. Unfortunately, these results do not apply when outcomes in one round affect the game in the future, as is the case in many applications. In this talk, we study this phenomenon in the context of a game modeling queuing systems: routers compete for servers, and packets that do not get served need to be resent, resulting in a system where the number of packets at each round depends on the routers' success in previous rounds. In joint work with Jason Gaitonde, we analyze the resulting highly dependent random process. We find that if the capacity of the servers is high enough to allow a centralized and knowledgeable scheduler to get all packets served even with double the packet arrival rate, then despite the selfish behavior of the queues, the expected number of packets in the queues remains bounded throughout time, assuming older packets have priority. Further, if queues are more patient in evaluating their outcomes, maximizing their long-run success rate, stability can be ensured with just 1.58 times extra capacity, strictly better than what is possible assuming only the no-regret property.
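The dynamic described above can be sketched in a few lines of Python. This is a toy simulation under assumptions of my own (Bernoulli packet arrivals, a crude multiplicative-weights rule for server choice, a single service success probability per server), not the model analyzed by Gaitonde and Tardos; it only illustrates the feedback loop in which unserved packets persist and make future rounds harder, with the oldest-packet priority rule.

```python
import random

def simulate(rounds=500, n_queues=2, n_servers=2,
             arrival=0.3, capacity=0.9, eta=0.1, seed=0):
    """Toy queuing game: each queue holds a FIFO buffer of packet
    timestamps, picks a server via a multiplicative-weights rule,
    and each server serves the oldest packet sent to it."""
    rng = random.Random(seed)
    buffers = [[] for _ in range(n_queues)]            # packet arrival times
    weights = [[1.0] * n_servers for _ in range(n_queues)]
    history = []                                       # total backlog per round
    for t in range(rounds):
        # new packets arrive
        for q in range(n_queues):
            if rng.random() < arrival:
                buffers[q].append(t)
        # each nonempty queue sends its oldest packet to a sampled server
        choices = {}
        for q in range(n_queues):
            if buffers[q]:
                total = sum(weights[q])
                r = rng.random() * total
                acc = 0.0
                for j, w in enumerate(weights[q]):
                    acc += w
                    if r <= acc:
                        choices[q] = j
                        break
        # each server succeeds with prob. `capacity` and, among the packets
        # sent to it, serves the OLDEST one (the priority rule in the talk)
        for s in range(n_servers):
            contenders = [q for q, c in choices.items() if c == s]
            if contenders and rng.random() < capacity:
                winner = min(contenders, key=lambda q: buffers[q][0])
                buffers[winner].pop(0)
                weights[winner][s] *= (1 + eta)        # reward the choice
        history.append(sum(len(b) for b in buffers))
    return history
```

Unserved packets stay in the buffers, so the state of one round shapes all later rounds, which is exactly the dependence that breaks the standard repeated-game analysis.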
David Silver / DeepMind & University College London
In this talk I will discuss the value equivalence principle for model-based reinforcement learning. This principle asserts that the environment may be substituted by any model that produces the same values. Value-equivalent models may be more efficient to learn because they ignore aspects of the environment that are irrelevant in terms of value and hence unnecessary for optimal planning. I will also explain how the value equivalence principle is used in MuZero, an algorithm that has achieved state-of-the-art results in chess, Go and Atari.
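For a fixed policy, the principle can be demonstrated in a few lines of NumPy. This is a minimal sketch of my own construction, not MuZero's learning procedure: starting from a true two-state model, we build a second model with completely different dynamics whose rewards are chosen so that its Bellman fixed point, and hence its value function, is identical.

```python
import numpy as np

# True model: a 2-state Markov reward process under a fixed policy.
gamma = 0.9
P = np.array([[0.5, 0.5],
              [0.2, 0.8]])
r = np.array([1.0, 0.0])

# Its value function solves v = r + gamma * P v.
v = np.linalg.solve(np.eye(2) - gamma * P, r)

# A model with *different* dynamics is value-equivalent (for this policy)
# if its rewards are chosen to leave the Bellman fixed point unchanged:
# r2 = v - gamma * P2 v.
P2 = np.eye(2)                       # trivial dynamics: states never move
r2 = v - gamma * (P2 @ v)

# The substitute model produces exactly the same values.
v2 = np.linalg.solve(np.eye(2) - gamma * P2, r2)
```

The second model ignores the true transition structure entirely, yet is indistinguishable from the environment as far as values are concerned; that is the sense in which value-equivalent models can discard irrelevant detail.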
Sara van de Geer / ETH Zürich
We consider a linear regression model with p variables and n observations, with p much larger than n. There are then (typically) several interpolators of the data. It is well known that the minimum l2-norm interpolator has a large bias. When the model is sparse, the minimum l1-norm interpolator can have l2-error about as small as the noise level. We will discuss this and then look at the case where one only observes the signs of the responses. We (re-)establish rates of convergence for two minimum l1-norm interpolators of the signs: the first under an l2-restriction, and the second the max-margin classifier related to the AdaBoost algorithm.
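The minimum l2-norm interpolator mentioned above has a closed form via the pseudoinverse, which makes its bias easy to see on synthetic data. The sketch below uses a sparse ground truth of my own choosing; the minimum l1-norm interpolator discussed in the talk would instead require solving a linear program (minimize ||w||_1 subject to Xw = y) and is not implemented here.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 10, 50                       # far more variables than observations
w_star = np.zeros(p)
w_star[0] = 5.0                     # sparse ground truth (hypothetical)
X = rng.standard_normal((n, p))
y = X @ w_star                      # noiseless observations

# Minimum l2-norm interpolator: w = X^T (X X^T)^{-1} y, via the pseudoinverse.
w_l2 = np.linalg.pinv(X) @ y
```

Since the system is underdetermined and consistent, `pinv` returns the interpolator of smallest l2 norm; it fits the data exactly, yet it spreads mass across all p coordinates and lands far from the sparse truth, which is the bias phenomenon the abstract refers to.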
Persi Diaconis / Stanford University
Despite hostility from the theoretical CS community, sequential importance sampling remains a widely used (and useful) family of algorithms. In joint work with Yeganeh Alimohammadi, Mohamad Roghani, and Amin Saberi, we prove polynomial-time guarantees for bipartite matching on dense graphs. Its utility is illustrated on three real-world counting and generation problems: generating Latin squares, card-guessing experiments with feedback, and stochastic block models. Our implementations illustrate the strengths and weaknesses of Sinkhorn scaling for evaluating the permanent.
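The basic sequential importance sampling idea for counting bipartite matchings fits in a few lines. This is a bare-bones sketch with a uniform proposal, not the scheme whose polynomial-time guarantees the paper proves (nor the Sinkhorn-scaled proposals it evaluates): a perfect matching is built column by column, and the product of the choice counts is an unbiased estimator of the permanent of the 0/1 biadjacency matrix, i.e. of the number of perfect matchings.

```python
import itertools
import math
import random

def permanent(A):
    """Exact permanent by summing over all permutations (tiny n only)."""
    n = len(A)
    return sum(
        math.prod(A[i][sigma[i]] for i in range(n))
        for sigma in itertools.permutations(range(n))
    )

def sis_estimate(A, samples=1000, seed=0):
    """Sequential importance sampling estimate of per(A) for a 0/1 matrix:
    for each column pick an allowed unused row uniformly at random; the
    product of the numbers of allowed rows is an unbiased sample weight."""
    rng = random.Random(seed)
    n = len(A)
    total = 0.0
    for _ in range(samples):
        free = set(range(n))
        weight = 1.0
        for j in range(n):
            allowed = [i for i in free if A[i][j] == 1]
            if not allowed:
                weight = 0.0          # proposal got stuck: zero-weight sample
                break
            weight *= len(allowed)
            free.remove(rng.choice(allowed))
        total += weight
    return total / samples
```

On the all-ones n-by-n matrix every sample has weight exactly n!, so the estimator is deterministic there; on sparser graphs the variance of the weights is the whole game, which is where better proposals (such as those derived from Sinkhorn scaling) come in.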