What Happens When They're Smarter Than Us?
If you have a question about this talk, please contact Suchir Salhan.

Abstract: We are building machine learning models that increasingly outperform humans on particular tasks. I argue that this creates a particularly hard version of the principal-agent problem, in which we, the principal, have to supervise capable agents that lack our norms and incentives, while only being able to monitor a fraction of their outputs. Agents that are more capable than their principals can learn to exploit this capability gap, which creates attendant risks. I present a promising oversight strategy, the debate protocol, in which matched AI debaters argue before a weaker human judge. I sketch an analysis showing that debate is vulnerable to exploitation when the judge makes systematic errors, because the debaters can steer arguments towards the judge’s cognitive blind spots. I propose a partial remedy, debate-by-jury, in which juries of human judges oversee debates. Juries help when jurors’ errors are relatively uncorrelated, but when biases are shared across jurors, aggregation can amplify rather than correct error.

Speaker Bio: Dr Konstantinos Voudouris is the cognitive scientist on the Alignment Team at the UK AI Security Institute. He holds a PhD in psychology (2024) from the University of Cambridge. His research focuses on advancing the sciences of AI alignment, scalable oversight, and AI evaluation, using tools from the cognitive sciences. Combining these diverse fields allows us to build better, safer, and more human-like AI systems, as well as informed and sensible AI policy.

This talk is part of the NLIP Seminar Series.