Efficient Retrieval of Influential LLM Training Examples

If you have a question about this talk, please contact Lucas Resck.

This talk has been canceled/deleted

Abstract: Attributing LLM outputs to the training examples that causally influence their behavior can give us visibility into LLMs’ opaque reasoning and help us understand subtle persona changes. Unfortunately, finding training data attribution algorithms which are both accurate and scalable has remained an elusive goal. I argue for separately studying an Estimation Problem (accurately estimating the causal effect of a training example) and a Retrieval Problem (efficiently finding the highest-scoring training examples). I then present a generic retrieval method for influential sequences which can be paired with a wide range of influence estimators (including EKFAC) and for which one can obtain high confidence about recall. I discuss how causal training data attribution can be used as a tool to assure LLM alignment.

Bio: Roger is an Associate Professor of Computer Science at the University of Toronto, Schwartz Reisman Chair in Technology and Society, and a founding member of the Vector Institute. He is also a Member of Technical Staff on the Alignment Science Team at Anthropic, where his work focuses on training data attribution. He holds a Schmidt Sciences AI2050 Senior Fellowship, Sloan Fellowship, and Canada CIFAR AI Chair. His research has focused on better understanding neural net training dynamics, and using this understanding to improve training speed, generalization, uncertainty estimation, and automatic hyperparameter tuning. He’s now focusing on applying our understanding of deep learning to AI alignment. Given how fast AI is progressing, the problem of ensuring AIs are robustly aligned with human values seems like the most important thing we can be working on now.

This talk is part of the Language Technology Lab Seminars series.
