Circuits and Interpretability

Lauro Langosco, Elre Oldewage and Juyeon Heo (University of Cambridge)
Wednesday 16 February 2022, 11:00-12:30
Cambridge University Engineering Department, LR3A.

If you have a question about this talk, please contact Elre Oldewage.

In this talk we will look at methods that aim to make the internal computations of neural networks visible (‘interpretable’) to humans. This is useful both for a) making deep learning models robust, fair, and safe, and b) reaching an empirical, scientific understanding of why deep learning works. We will cover various methods from the literature, focusing in particular on the study of circuits, i.e. modular subnetworks that serve a particular function.

Recommended reading:

The Building Blocks of Interpretability (https://distill.pub/2018/building-blocks/)

Optional reading:

Adversarial Examples Are Not Bugs, They Are Features (https://arxiv.org/abs/1905.02175)
Are Neural Nets Modular? Inspecting Functional Modularity Through Differentiable Weight Masks (https://arxiv.org/abs/2010.02066)

Our reading groups are live-streamed via Zoom and recorded for our YouTube channel. The Zoom details are distributed via our weekly mailing list.

This talk is part of the Machine Learning Reading Group @ CUED series.
