Deep Reinforcement Learning

tomzahavy (at) gmail (dot) com


I am a research scientist at DeepMind in the field of Reinforcement Learning. Previously, I was a Ph.D. candidate at the Technion and interned at Microsoft, Walmart, Facebook, and Google.


My high-level research goal is to build artificial intelligence via Reinforcement Learning. My prior work focused on aspects of scalability, structure discovery, hierarchy, abstraction, and exploration.

Currently, I focus on Discovery:

  • Metagradients: building reinforcement learning algorithms that discover an internal knowledge base (hyperparameters, loss function, options, reward) in order to solve the original problem better. You can read more about it in my papers [1,2,3], in this blog post by Robert Lange, in this podcast (with Robert, Tim, Yannic and myself), or in this talk by David Silver.

  • Diversity: finding diverse policies that are also nearly optimal in terms of maximising reward. I am thinking about this problem through the lens of convex optimisation [1,2,3], that is, suggesting unsupervised convex objectives that lead to diversity, as well as RL algorithms that find solutions to these objectives via the maximisation of reward.


About me: I come from a small town in 🇮🇱 on the Mediterranean Sea. I am currently living in London 🇬🇧 and I spent some time in the 🇺🇸. My family comes from 🇩🇪🇮🇩🇱🇺 and by DNA I am 🇮🇩🇭🇺🇮🇷 (50/30/20). I am married to Gili, a singer-songwriter from 🇮🇩🇲🇦🇮🇱. I love spending my free time outdoors camping, hiking, 4X4 driving, mountaineering, skiing, and scuba diving. When I am at home, my hobbies are running, basketball, and reading science fiction.



  • Sep 21       Two papers [spotlight, poster] accepted at NeurIPS2021. 

  • May 21      Two papers [poster, poster] accepted at ICML2021. 

  • Feb 21       I was interviewed on the Machine Learning Street Talk podcast; you can listen to it here.

  • Jan 21       Two papers [spotlight, poster] accepted at ICLR2021. 

  • Dec 20      A blog post by Robert Lange covers my work on meta-gradients.

  • Sep 20      One paper accepted at NeurIPS2020. 

  • Nov 19      I joined DeepMind as a research scientist.

  • Oct 19       I finished my Ph.D. at the Technion, advised by Shie Mannor.

  • Oct 19       I finished a two-year internship at Google, working with Yishay Mansour.

  • Sep 19       I was chosen as one of the top 400 reviewers at NeurIPS.

  • Jul  19       I co-organized the third lifelong learning workshop at ICML.

Selected Publications

Discovering Diverse Nearly Optimal Policies with Successor Features


Tom Zahavy, Brendan O'Donoghue, Andre Barreto, Volodymyr Mnih, Sebastian Flennerhag, Satinder Singh


Reward is enough for convex MDPs 



Discovering a set of policies for the worst case reward

ICLR 2021 (spotlight)


A Self-Tuning Actor-Critic Algorithm


Tom Zahavy, Zhongwen Xu, Vivek Veeriah, Matteo Hessel, Junhyuk Oh, Hado van Hasselt, David Silver, Satinder Singh


A Deep Hierarchical Approach to Lifelong Learning in Minecraft

AAAI 2017

Chen Tessler, Shahar Givony, Tom Zahavy, Daniel J Mankowitz, Shie Mannor


Graying the black box: Understanding DQNs

ICML 2016

Tom Zahavy, Nir Ben Zrihem, Shie Mannor


Online Limited Memory Neural-Linear Bandits with Likelihood Matching

ICML 2021


Inverse Reinforcement Learning in Contextual MDPs

Springer, Machine Learning Journal 2021, Special Issue on RL for Real Life

Stav Belogolovsky, Philip Korsunsky, Shie Mannor, Chen Tessler, Tom Zahavy


Apprenticeship Learning via Frank-Wolfe

AAAI 2020

Tom Zahavy, Alon Cohen, Haim Kaplan, and Yishay Mansour


Learn What Not to Learn: Action Elimination with Deep Reinforcement Learning

NeurIPS 2018

Tom Zahavy, Matan Haroush, Nadav Merlis, Daniel J. Mankowitz, Shie Mannor


Balancing Constraints and Rewards with Meta-Gradient D4PG

ICLR 2021


Shallow Updates for Deep Reinforcement Learning

NeurIPS 2017


Planning in Hierarchical Reinforcement Learning: Guarantees for Using Local Policies

ALT 2020

Tom Zahavy, Avinatan Hasidim, Haim Kaplan, and Yishay Mansour