Measuring AI Ability to Complete Long Software Tasks
We introduce a task-completion time horizon metric to benchmark frontier AI on software engineering tasks, finding that AI time horizons have been doubling approximately every seven months since 2019.
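
To make the doubling claim concrete, the following is a minimal sketch that assumes the time horizon follows a pure exponential trend with a fixed seven-month doubling time. The function names and the 10-minute baseline are hypothetical illustrations, not values or methods taken from the paper.

```python
def horizon_multiplier(months_elapsed: float, doubling_months: float = 7.0) -> float:
    """Growth factor of the time horizon after `months_elapsed` months,
    assuming a constant doubling time (seven months by default)."""
    return 2.0 ** (months_elapsed / doubling_months)


def projected_horizon(baseline_minutes: float, months_elapsed: float,
                      doubling_months: float = 7.0) -> float:
    """Projected time horizon in minutes, given a hypothetical baseline."""
    return baseline_minutes * horizon_multiplier(months_elapsed, doubling_months)


if __name__ == "__main__":
    baseline = 10.0  # hypothetical baseline horizon in minutes (assumption)
    for months in (7, 14, 21, 84):
        minutes = projected_horizon(baseline, months)
        print(f"after {months:3d} months: {minutes:10.1f} minutes")
```

Under this assumption, a seven-month doubling time corresponds to a factor of roughly 3.3 per year, since 2^(12/7) ≈ 3.28.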