Interpreting and Controlling Intermediate Representations in Large Language Models
If you have a question about this talk, please contact Sally Matthews.

Large Language Models (LLMs) have reshaped the AI landscape and invited themselves as dinner-table topics, yet we do not really understand how they work. Propelled by this consideration, the field of AI interpretability is enjoying a revival. In this talk I will introduce some fundamental interpretability concepts and discuss how insights from studying the internal activations of models led us to develop a training framework that significantly increases the robustness of LLMs to ‘jailbreaking’ attacks. I will also illustrate some explorations of the internal workings of transformer-based autoregressive LLMs that unexpectedly led to explaining ‘attention sinking’, a necessary mechanism for their proper functioning. I will finally offer my perspective on interesting future directions.

Nicola Cancedda is a researcher with Meta’s Fundamental AI Research (FAIR) team. His current focus is on better understanding how Large Language Models realize complex behaviors to make them more capable, safer, and more efficient. He is an alumnus of the University of Rome “La Sapienza”, and has held applied and fundamental research and management positions at Meta, Xerox, and Microsoft, pushing the state of the art in Machine Learning, Machine Translation, and Natural Language Processing, and leading the transfer of research results to large-scale production environments.

This talk is part of the Cambridge ML Systems Seminar Series.