What Happens When They're Smarter Than Us?

If you have a question about this talk, please contact Suchir Salhan.

Abstract: We are building machine learning models that increasingly outperform humans on particular tasks. I argue that this creates an especially hard version of the principal-agent problem, in which we, the principal, have to supervise capable agents that lack our norms and incentives, while being able to monitor only a fraction of their outputs. Agents that are more capable than their principals can learn to exploit this capability gap, and that exploitation carries risks of its own. I present a promising oversight strategy, the debate protocol, in which matched AI debaters argue before a weaker human judge. I sketch an analysis showing that debate is vulnerable to exploitation when the judge makes systematic errors, because the debaters can steer arguments towards the judge’s cognitive blind spots. I propose a partial remedy, debate-by-jury, in which juries of human judges oversee debates. Juries help when jurors’ errors are relatively uncorrelated, but when biases are shared across jurors, aggregation can amplify rather than correct error.
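
The closing claim about juries can be illustrated with a standard Condorcet-style calculation. The sketch below is not the speaker's analysis; it is a minimal illustration under assumptions that do not come from the abstract: each juror votes independently, the jury decides by strict majority, and a "shared bias" is modelled as every juror having the same below-chance probability of being correct on a blind-spot question. The function name `majority_accuracy` and the example probabilities are hypothetical.

```python
from math import comb


def majority_accuracy(p: float, n: int) -> float:
    """Probability that a strict majority of n jurors is correct.

    Assumes each juror is correct independently with probability p.
    n must be odd so that ties cannot occur.
    """
    if n % 2 == 0:
        raise ValueError("use an odd jury size to avoid ties")
    k_min = n // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1)
    )


# Assumed scenario 1: jurors are individually better than chance.
# Majority voting is then more accurate than a single juror.
helpful = majority_accuracy(0.6, 11)

# Assumed scenario 2: a blind spot pulls every juror below chance.
# Majority voting is then less accurate than a single juror.
harmful = majority_accuracy(0.4, 11)

print(f"p=0.6, n=11: jury accuracy {helpful:.3f} (single juror 0.600)")
print(f"p=0.4, n=11: jury accuracy {harmful:.3f} (single juror 0.400)")
```

Under these assumptions, the same aggregation rule that corrects individual error when jurors are better than chance compounds it when they all lean the wrong way, which is one simple sense in which aggregation can amplify a shared bias.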

Speaker Bio: Dr Konstantinos Voudouris is the cognitive scientist on the Alignment Team at the UK AI Security Institute. He holds a PhD in psychology (2024) from the University of Cambridge. His research focuses on advancing the sciences of AI alignment, scalable oversight, and AI evaluation, using tools from the cognitive sciences. Combining these diverse fields allows us to build better, safer, and more human-like AI systems, as well as informed and sensible AI policy.

This talk is part of the NLIP Seminar Series.