Evaluating and Regulating Foundation Models



A Teams link is available upon request (it is sent out on our mailing list, eng-mlg-rcc [at] lists.cam.ac.uk). Sign up to the mailing list via lists.cam.ac.uk to receive reminders.

The emergence of foundation models and generalist AI systems has transformed the landscape of evaluation, introducing complex challenges that go far beyond the closed-domain settings of the past. This reading group aims to explore cutting-edge approaches for assessing these open-domain systems, with an emphasis on both technical evaluation strategies and evolving regulatory frameworks. We will begin by examining the unique difficulties of evaluating open-domain models, considering possible solutions and highlighting the risks of metric manipulation by resourceful actors. Next, we will discuss the methodologies employed by frontier labs for internal evaluation, as well as the interplay between technical validation and policy-driven oversight. Finally, we will explore evaluation in the context of human-machine collaboration, analyzing the challenges of measuring performance and alignment in systems with humans in the loop.

This talk is part of the Machine Learning Reading Group @ CUED series.
