Lawrence Chan
Publications
The alignment problem from a deep learning perspective
We argue that AGIs trained in ways similar to today's most capable models could learn to act deceptively to receive higher reward, learn internally represented goals that generalize beyond their training distributions, and pursue those goals using power-seeking strategies.
Richard Ngo, Lawrence Chan, Sören Mindermann
PDF · Cite · arXiv
Benefits of assistance over reward learning
We illustrate the benefits of agents that try to assist humans over agents that learn a reward during training and then maximize that reward after deployment.
Rohin Shah, Pedro Freire, Neel Alex, Rachel Freedman, Dmitrii Krasheninnikov, Lawrence Chan, Michael D Dennis, Pieter Abbeel, Anca Dragan, Stuart Russell
PDF · Cite · Code · OpenReview
Accounting for Human Learning when Inferring Human Preferences
Harry Giles, Lawrence Chan
PDF · Cite · arXiv