HCAST: Human-Calibrated Autonomy Software Tasks

We present HCAST, a benchmark of 189 tasks with over 1500 hours of human baselines. We find that current AI agents succeed 70-80% of the time on tasks that take humans less than one hour, but less than 20% of the time on tasks that take humans more than 4 hours.

David Rein, Joel Becker, Amy Deng, Seraphina Nix, Chris Canal, Daniel O'Connell, Pip Arnott, Ryan Bloom, Thomas Broadley, Katharyn Garcia, Brian Goodrich, Max Hasin, Sami Jawhar, Megan Kinniment, Thomas Kwa, Aron Lajko, Nate Rush, Lucas Jun Koba Sato, Sydney Von Arx, Ben West, Lawrence Chan, Elizabeth Barnes

Modular addition without black-boxes: Compressing explanations of MLPs that compute numerical integration

We present the first case study in rigorously compressing nonlinear feature-maps, discovering that the MLP in modular addition models can be understood as evaluating a quadrature scheme.

Chun Hei Yip, Rajashree Agrawal, Lawrence Chan, Jason Gross

Human irrationality: both bad and good for reward inference

We study how human irrationality affects reward inference. Depending on its type, irrationality can either help or hurt inference, and its effect is not monotonic in the degree of irrationality.

Lawrence Chan, Andrew Critch, Anca Dragan

Benefits of assistance over reward learning

We illustrate the benefits of agents that try to assist humans over agents that learn a reward during training and then maximize that reward after deployment.

Rohin Shah, Pedro Freire, Neel Alex, Rachel Freedman, Dmitrii Krasheninnikov, Lawrence Chan, Michael D Dennis, Pieter Abbeel, Anca Dragan, Stuart Russell

Accounting for Human Learning when Inferring Human Preferences

Inverse reinforcement learning (IRL) is a powerful tool for learning reward functions from demonstrations. However, standard IRL …

Harry Giles, Lawrence Chan