TOM ZAHAVY 

Deep Reinforcement Learning

tomzahavy (at) gmail (dot) com

          

I am a research scientist at DeepMind in the field of Reinforcement Learning. Previously, I was a Ph.D. candidate at the Technion and interned at Microsoft, Walmart, Facebook, and Google.

 


My high-level research goal is to build an artificial intelligence via Reinforcement Learning. My prior work focused on aspects of scalability, structure discovery, hierarchy, abstraction, and exploration.

Currently, I focus on Discovery:
 

  • Metagradients: building reinforcement learning algorithms that discover an internal knowledge base (hyperparameters, loss function, options, reward) in order to better solve the original problem. You can read more about it in my papers [1,2,3], in this blog post by Robert Lange, in this podcast (with Robert, Tim, Yanick, and myself), or in this talk by David Silver.
     

  • Diversity: finding diverse policies that are also nearly optimal in terms of maximising reward. I think about this problem through the lens of convex optimisation [1,2,3]: proposing unsupervised convex objectives that lead to diversity, as well as RL algorithms that solve these objectives by maximising reward.

 

About me: I come from a small town in 🇮🇱 on the Mediterranean Sea. I currently live in London 🇬🇧 and I spent some time in the 🇺🇸. My family comes from 🇩🇪🇮🇩🇱🇺 and by DNA I am 🇮🇩🇭🇺🇮🇷 (50/30/20). I am married to Gili, a singer-songwriter from 🇮🇩🇲🇦🇮🇱. I love spending my free time outdoors: camping, hiking, 4X4 driving, mountaineering, skiing, and scuba diving. When I am at home, my hobbies are running, basketball, and reading science fiction.

News


  • Feb 21        I was interviewed on the Machine Learning Street Talk podcast; you can listen to it here.

  • Dec 20      A blog post by Robert Lange covers my work on metagradients.

  • Nov 19      I joined DeepMind as a research scientist.

  • Oct 19       I finished my Ph.D. at the Technion, advised by Shie Mannor.

  • Oct 19       I finished a two-year internship at Google, working with Yishay Mansour.

  • Sep 19       I was chosen as one of the top 400 reviewers at NeurIPS.

  • Jul  19       I co-organized the third lifelong learning workshop at ICML.

Selected Publications

Discovering Diverse Nearly Optimal Policies with Successor Features

arXiv

Tom Zahavy, Brendan O'Donoghue, Andre Barreto, Volodymyr Mnih, Sebastian Flennerhag, Satinder Singh


Reward is enough for convex MDPs 

arXiv


Discovering a set of policies for the worst case reward

ICLR 2021 (spotlight)

A Self-Tuning Actor-Critic Algorithm

NeurIPS 2020

Tom Zahavy, Zhongwen Xu, Vivek Veeriah, Matteo Hessel, Junhyuk Oh, Hado van Hasselt, David Silver, Satinder Singh

A Deep Hierarchical Approach to Lifelong Learning in Minecraft

AAAI 2017

Chen Tessler, Shahar Givony, Tom Zahavy, Daniel J Mankowitz, Shie Mannor


Graying the black box: Understanding DQNs

ICML 2016

Tom Zahavy, Nir Ben Zrihem, Shie Mannor


Online Limited Memory Neural-Linear Bandits with Likelihood Matching

ICML 2021

Inverse Reinforcement Learning in Contextual MDPs

Springer, Machine Learning Journal 2021, Special Issue on RL for Real Life

Stav Belogolovsky, Philip Korsunsky, Shie Mannor, Chen Tessler, Tom Zahavy

Apprenticeship Learning via Frank-Wolfe

AAAI 2020

Tom Zahavy, Alon Cohen, Haim Kaplan, Yishay Mansour

Learn What Not to Learn: Action Elimination with Deep Reinforcement Learning

NeurIPS 2018

Tom Zahavy, Matan Haroush, Nadav Merlis, Daniel J. Mankowitz, Shie Mannor

Balancing Constraints and Rewards with Meta-Gradient D4PG

ICLR 2021

Shallow Updates for Deep Reinforcement Learning

NeurIPS 2017

Planning in Hierarchical Reinforcement Learning: Guarantees for Using Local Policies

ALT 2020

Tom Zahavy, Avinatan Hasidim, Haim Kaplan, Yishay Mansour