Accounting for Human Learning when Inferring Human Preferences

Abstract

Inverse reinforcement learning (IRL) is a powerful tool for learning reward functions from demonstrations. However, standard IRL assumes that the human demonstrator is stationary, that is, that their policy does not change during the demonstrations. In this paper, we study IRL when the human is still learning, so that their policy improves over the course of the demonstrations. We show that observing a learning human can be more informative about their preferences than observing a human with a fixed policy, and that standard IRL techniques perform poorly when the human is learning in an unfamiliar environment.
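To make the stationarity issue concrete, here is a minimal toy sketch, not the paper's method: a Boltzmann Q-learner in a two-armed bandit, with a grid-search maximum-likelihood fit under two observer models (one assuming a stationary demonstrator, one replaying the learner's updates). The bandit, the update rule, and all parameters (`true_theta`, `alpha`, `beta`, `T`) are illustrative assumptions.

```python
# Toy illustration (assumed setup, not the paper's algorithm): demonstrations
# come from a Q-learning human; we fit the reward gap two ways.
import numpy as np

rng = np.random.default_rng(0)
true_theta = np.array([0.0, 1.0])  # deterministic arm rewards (assumed)
alpha, beta, T = 0.2, 3.0, 200     # learning rate, rationality, horizon

def softmax(q):
    z = np.exp(beta * (q - q.max()))
    return z / z.sum()

def simulate_learner(theta):
    """Roll out a Boltzmann Q-learner; return its action sequence."""
    q, actions = np.zeros(2), []
    for _ in range(T):
        a = rng.choice(2, p=softmax(q))
        q[a] += alpha * (theta[a] - q[a])  # reward of arm a is theta[a]
        actions.append(a)
    return actions

def loglik_stationary(theta, actions):
    """Standard IRL: actions assumed Boltzmann w.r.t. the true rewards."""
    p = softmax(theta)
    return sum(np.log(p[a]) for a in actions)

def loglik_learning(theta, actions):
    """Learning-aware: replay the learner's (deterministic) Q trajectory."""
    q, ll = np.zeros(2), 0.0
    for a in actions:
        ll += np.log(softmax(q)[a])
        q[a] += alpha * (theta[a] - q[a])
    return ll

actions = simulate_learner(true_theta)
grid = [np.array([0.0, d]) for d in np.linspace(-2, 2, 81)]
best_stat = max(grid, key=lambda th: loglik_stationary(th, actions))
best_learn = max(grid, key=lambda th: loglik_learning(th, actions))
print("true reward gap:         ", true_theta[1])
print("stationary-IRL estimate: ", best_stat[1])
print("learning-aware estimate: ", best_learn[1])
```

Because the learner explores near-uniformly early on, the stationary model tends to misread those exploratory actions as indifference between the arms, while the learning-aware model attributes them to an unconverged policy.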

Publication
NeurIPS 2020 HAMLETS Workshop
Lawrence Chan