
Interpreting and Controlling Intermediate Representations in Large Language Models


  • Speaker: Nicola Cancedda, Meta's Fundamental AI Research (FAIR) team
  • Date and time: Tuesday 26 November 2024, 14:00-15:00
  • Venue: Computer Lab, LT1

If you have a question about this talk, please contact Sally Matthews.

Large Language Models (LLMs) have reshaped the AI landscape and become a dinner-table topic, yet we still do not really understand how they work. Propelled by this consideration, the field of AI interpretability is enjoying a revival. In this talk I will introduce some fundamental interpretability concepts and discuss how insights from studying the internal activations of models led us to develop a training framework that significantly increases the robustness of LLMs to ‘jailbreaking’ attacks. I will also illustrate some explorations of the internal workings of transformer-based autoregressive LLMs that unexpectedly led to an explanation of ‘attention sinking’, a mechanism necessary for their proper functioning. I will conclude by offering my perspective on interesting future directions.

Nicola Cancedda is a researcher with Meta’s Fundamental AI Research (FAIR) team. His current focus is on better understanding how Large Language Models realize complex behaviors, in order to make them more capable, safer, and more efficient. He is an alumnus of the University of Rome “La Sapienza”, and has held applied and fundamental research and management positions at Meta, Xerox, and Microsoft, pushing the state of the art in Machine Learning, Machine Translation, and Natural Language Processing, and leading the transfer of research results to large-scale production environments.

This talk is part of the Cambridge ML Systems Seminar Series.

