4

Evaluating Language-Model Agents on Realistic Autonomous Tasks

We create four agents from Claude and GPT-4 to investigate the ability of frontier language models to perform autonomous replication and adaptation.

Megan Kinniment, Lucas Jun Koba Sato, Haoxing Du, Brian Goodrich, Max Hasin, Lawrence Chan, Luke Harold Miles, Tao R Lin, Hjalmar Wijk, Joel Burget, Aaron Ho, Elizabeth Barnes, Paul Christiano

Evaluating Language-Model Agents on Realistic Autonomous Tasks