Lawrence Chan

PhD Candidate

Berkeley AI Research

As of January 2023, I’m working at the Alignment Research Center doing evaluations of large language models. Previously, I was at Redwood Research, where I worked on adversarial training and neural network interpretability.

I’m also doing a PhD at UC Berkeley advised by Anca Dragan and Stuart Russell. Before that, I received a BAS in Computer Science and Logic and a BS in Economics from the University of Pennsylvania’s M&T Program, where I was fortunate to work with Philip Tetlock on using ML for forecasting.

My main research interests are mechanistic interpretability and scalable oversight. In the past, I’ve also done conceptual work on learning human values.

I also sometimes blog about AI alignment and other topics on LessWrong/the AI Alignment Forum.

Recent Publications

Evaluating Language-Model Agents on Realistic Autonomous Tasks

We build four agents on top of Claude and GPT-4 to investigate the ability of frontier language models to perform autonomous replication and adaptation.