Circuits and Interpretability

If you have a question about this talk, please contact Elre Oldewage.

In this talk we will look at methods that aim to make the internal computations of neural networks visible (‘interpretable’) to humans. This is useful both (a) for making deep learning models robust, fair, and safe, and (b) for building an empirical, scientific understanding of why deep learning works. We will cover various methods from the literature, focusing in particular on the study of circuits, i.e. modular subnetworks that each serve a particular function.

Recommended reading:

The Building Blocks of Interpretability (https://distill.pub/2018/building-blocks/)

Optional reading:

  1. Adversarial Examples Are Not Bugs, They Are Features (https://arxiv.org/abs/1905.02175)
  2. Are Neural Nets Modular? Inspecting Functional Modularity Through Differentiable Weight Masks (https://arxiv.org/abs/2010.02066)
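For a concrete flavour of what isolating a circuit can look like, here is a minimal sketch in the spirit of the differentiable weight-mask approach of optional reading 2 (not the paper's exact method). It assumes a trained PyTorch classifier (model) and a data loader for a single sub-task (task_loader); both names, and all hyperparameters, are illustrative.

    import torch
    import torch.nn.functional as F
    from torch.func import functional_call

    def find_circuit(model, task_loader, steps=1000, sparsity_coef=1e-4, lr=0.01):
        # Freeze the trained weights; we only optimise the mask logits.
        names = [n for n, _ in model.named_parameters()]
        weights = [p.detach() for _, p in model.named_parameters()]
        logits = [torch.zeros_like(w, requires_grad=True) for w in weights]
        opt = torch.optim.Adam(logits, lr=lr)

        data = iter(task_loader)
        for _ in range(steps):
            try:
                x, y = next(data)
            except StopIteration:
                data = iter(task_loader)
                x, y = next(data)

            # Sample hard 0/1 masks; the straight-through trick routes
            # gradients through the keep-probabilities p = sigmoid(logit).
            probs = [torch.sigmoid(lg) for lg in logits]
            masks = [((torch.rand_like(p) < p).float() - p).detach() + p
                     for p in probs]

            # Run the frozen model with masked weights via a functional call,
            # leaving the module's own parameters untouched.
            masked = {n: w * m for n, w, m in zip(names, weights, masks)}
            out = functional_call(model, masked, (x,))

            # Task loss plus a sparsity penalty pushing masks towards zero,
            # so only the weights this sub-task really needs survive.
            loss = (F.cross_entropy(out, y)
                    + sparsity_coef * sum(p.sum() for p in probs))
            opt.zero_grad()
            loss.backward()
            opt.step()

        # The "circuit": boolean masks marking weights kept with prob > 0.5.
        return {n: torch.sigmoid(lg) > 0.5 for n, lg in zip(names, logits)}

The resulting boolean masks can then be applied at test time to check whether the masked subnetwork alone reproduces the sub-task behaviour, which is one way to probe functional modularity.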

Our reading groups are live-streamed via Zoom and recorded for our YouTube channel. The Zoom details are distributed via our weekly mailing list.

This talk is part of the Machine Learning Reading Group @ CUED series.
