Evaluating and Regulating Foundation Models



A Teams link is available upon request (it is sent out on our mailing list, eng-mlg-rcc [at] lists.cam.ac.uk). Sign up to the mailing list via lists.cam.ac.uk to receive reminders.

The emergence of foundation models and generalist AI systems has transformed the landscape of evaluation, introducing complex challenges that go far beyond the closed-domain settings of the past. This reading group aims to explore cutting-edge approaches for assessing these open-domain systems, with an emphasis on both technical evaluation strategies and evolving regulatory frameworks. We will begin by examining the unique difficulties of evaluating open-domain models, considering possible solutions and highlighting the risks of metric manipulation by resourceful actors. Next, we will discuss the methodologies employed by frontier labs for internal evaluation, as well as the interplay between technical validation and policy-driven oversight. Finally, we will explore evaluation in the context of human-machine collaboration, analyzing the challenges of measuring performance and alignment in systems with humans in the loop.

This talk is part of the Machine Learning Reading Group @ CUED series.
