Influence Functions

Adrian Goldwaser, Bruno Mlodozeniec, Runa Eschenhagen, University of Cambridge
Wednesday 05 March 2025, 11:00-12:30
Cambridge University Engineering Department, CBL Seminar room BE4-38.

Teams link available upon request (it is sent out on our mailing list, eng-mlg-rcc [at] lists.cam.ac.uk). Sign up to our mailing list for easier reminders via lists.cam.ac.uk.

When trying to understand the behaviour of a machine learning model, a common question is: how did the training examples contribute to a model output, and which examples contributed the most? This can also be framed as a counterfactual question: how would the final model outputs change if some examples were removed from the training set? Training data attribution (TDA) methods such as influence functions, the subject of this talk, aim to answer precisely this question.

We will give an introduction to influence functions, discuss challenges and approaches to scalability, and give examples of practical applications. We will show that solving the data attribution problem can be extremely useful. It can help identify pernicious data, ranging from mislabelled examples and data responsible for undesirable behaviours (e.g. profanity or explicit content) through to data poisoning attacks. Influence functions can also help in understanding memorisation in neural networks, informing mitigations for privacy and copyright concerns, and can support fair data valuation.

Influence functions can answer the TDA question efficiently without retraining, using only local information about the training loss function around the final model parameters. They have been used successfully for these tasks on models ranging from 50-billion-parameter large language models to modern diffusion models.
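To make the "without retraining" claim concrete, here is a minimal sketch that is not taken from the talk. It assumes a small ridge regression model instead of a neural network, so that the Hessian of the training loss can be formed and inverted exactly. The influence estimate for removing training example i is the first-order approximation (1/n) H^{-1} g_i, where H is the Hessian at the final parameters and g_i is that example's loss gradient. The data, the helper `fit`, and all variable names are illustrative assumptions. The sketch compares the estimate against actually retraining with each example left out.

```python
import numpy as np

# Synthetic data: purely illustrative, not from the talk.
rng = np.random.default_rng(0)
n, d, lam = 200, 5, 0.1
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)
x_test = rng.normal(size=d)


def fit(X_fit, y_fit):
    """Minimise (1/n) * sum_i 0.5 * (x_i @ theta - y_i)**2 + 0.5 * lam * ||theta||^2.

    n is kept fixed, so removing an example simply drops its term from the sum.
    """
    H_fit = X_fit.T @ X_fit / n + lam * np.eye(d)
    return np.linalg.solve(H_fit, X_fit.T @ y_fit / n)


theta = fit(X, y)

# Local information at the final parameters: Hessian and per-example gradients.
H = X.T @ X / n + lam * np.eye(d)
residuals = X @ theta - y
grads = X * residuals[:, None]  # row i is the gradient of 0.5 * (x_i @ theta - y_i)**2

# Influence estimate of removing example i on the test prediction:
#   x_test @ (theta_without_i - theta) ~= x_test @ H^{-1} g_i / n
v = np.linalg.solve(H, x_test)  # H^{-1} x_test (H is symmetric)
predicted = grads @ v / n

# Ground truth for comparison: retrain n times, leaving one example out each time.
theta_pred = x_test @ theta
actual = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    actual[i] = x_test @ fit(X[keep], y[keep]) - theta_pred

# Training examples ranked by estimated influence on this test prediction.
top = np.argsort(-np.abs(predicted))[:5]
print("most influential training examples:", top)
print("correlation with retraining:", np.corrcoef(predicted, actual)[0, 1])
```

For a quadratic loss like this one, the estimate differs from exact leave-one-out retraining only by a per-example leverage factor, so the two agree closely. For neural networks the loss is non-convex and the Hessian is far too large to invert directly, which is where the scalability challenges mentioned in the abstract arise.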

This talk is part of the Machine Learning Reading Group @ CUED series.
