Value Equivalent Reinforcement Learning

David Silver / DeepMind & University College London

Abstract: In this talk I will discuss the value equivalence principle for model-based reinforcement learning. This principle asserts that the environment may be replaced by any model that produces the same values. Value-equivalent models may be more efficient to learn because they ignore aspects of the environment that are irrelevant to value prediction and hence unnecessary for optimal planning. I will also explain how the value equivalence principle is used in MuZero, an algorithm that has achieved state-of-the-art results in chess, Go and Atari.
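
As a rough formal sketch (following the standard statement of the principle, not spelled out in the abstract itself): a model $\tilde{m} = (\tilde{r}, \tilde{p})$ is value equivalent to the environment $m = (r, p)$ with respect to a set of policies $\Pi$ and a set of functions $\mathcal{V}$ if the Bellman operators they induce agree on $\mathcal{V}$:

$$\mathcal{T}^{m}_{\pi}\, v \;=\; \mathcal{T}^{\tilde{m}}_{\pi}\, v \quad \text{for all } \pi \in \Pi,\ v \in \mathcal{V}, \qquad \text{where } (\mathcal{T}^{m}_{\pi} v)(s) = \mathbb{E}_{a \sim \pi(\cdot\mid s)}\big[\, r(s,a) + \gamma\, \mathbb{E}_{s' \sim p(\cdot\mid s,a)}[v(s')] \,\big].$$

Restricting $\Pi$ and $\mathcal{V}$ shrinks the set of constraints the model must satisfy, which is why a value-equivalent model can be simpler than the true environment while still supporting optimal planning.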

Bio: David Silver is a principal research scientist at DeepMind and a professor at University College London. David’s work focuses on artificially intelligent agents based on reinforcement learning. David co-led the project that combined deep learning and reinforcement learning to play Atari games directly from pixels (Nature 2015). He also led the AlphaGo project, culminating in the first program to defeat a top professional player in the full-size game of Go (Nature 2016), and the AlphaZero project, which learned by itself to defeat the world’s strongest chess, shogi and Go programs (Nature 2017, Science 2018). Most recently he co-led the AlphaStar project, which produced the world’s first grandmaster-level StarCraft player (Nature 2019). His work has been recognised with the Marvin Minsky Award, the Mensa Foundation Prize, and the Royal Academy of Engineering Silver Medal.