Evaluating Language-Model Agents on Realistic Autonomous Tasks
We create four agents from Claude and GPT-4 to investigate the ability of frontier language models to perform autonomous replication and adaptation.