Interpretability - the myth, questions, and some answers

If you have a question about this talk, please contact Adrian Weller.


In this talk, I will provide an overview of my work on interpretability over the past couple of years. I will talk about 1) our studies of the factors that influence how humans understand explanations from machine learning models, 2) building inherently interpretable models with and without a human in the loop, 3) improving interpretability when you already have a model (post-training interpretability), and 4) our work on ways to test and evaluate interpretability methods.

Among these, I will take a deeper dive into one of my recent works, Testing with Concept Activation Vectors (TCAV), a post-training interpretability method for complex models such as neural networks. This method provides an interpretation of a neural net’s internal state in terms of human-friendly, high-level concepts instead of low-level input features. The key idea is to view the high-dimensional internal state of a neural net as an aid, not an obstacle. We show how to use concept activation vectors (CAVs) as part of TCAV, which uses directional derivatives to quantify the degree to which a user-defined concept is important to a classification result: for example, how sensitive a prediction of “zebra” is to the presence of stripes. Using the domain of image classification as a testing ground, we describe how CAVs may be used to explore hypotheses and generate insights for a standard image classification network as well as a medical application.
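
To make the directional-derivative idea concrete, here is a minimal NumPy sketch on a toy problem. It is an illustration under stated assumptions, not the implementation behind the talk. The 8-dimensional "layer activations", the `tanh` class logit, and the helper names `fit_cav` and `tcav_score` are all hypothetical. The CAV is taken as the unit normal of a linear separator between concept and random activations, here fitted with a small logistic regression as a stand-in for whatever linear classifier is actually used. The score is summarised as the fraction of class examples whose logit increases along the CAV.

```python
import numpy as np


def fit_cav(concept_acts, random_acts, steps=500, lr=0.1):
    """Fit a logistic-regression separator and return its unit normal as the CAV."""
    X = np.vstack([concept_acts, random_acts])
    y = np.concatenate([np.ones(len(concept_acts)), np.zeros(len(random_acts))])
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(concept)
        w -= lr * (X.T @ (p - y)) / len(y)       # gradient step on the weights
        b -= lr * np.mean(p - y)                 # gradient step on the bias
    return w / np.linalg.norm(w)


def tcav_score(logit_grad_fn, class_acts, cav):
    """Fraction of examples whose class logit increases along the CAV direction."""
    grads = np.stack([logit_grad_fn(a) for a in class_acts])
    directional = grads @ cav                    # directional derivatives
    return float(np.mean(directional > 0))


# Toy setup (all assumed): activations live in R^8 and the concept,
# think "stripes", shifts activations along the first axis.
rng = np.random.default_rng(0)
dim = 8
concept_axis = np.zeros(dim)
concept_axis[0] = 1.0

concept_acts = rng.normal(size=(100, dim)) + 3.0 * concept_axis
random_acts = rng.normal(size=(100, dim))
cav = fit_cav(concept_acts, random_acts)

# Hypothetical class logit h(a) = v . tanh(a), so grad h(a) = v * (1 - tanh(a)^2).
v_pos = 2.0 * concept_axis    # a class that relies on the concept
v_neg = -2.0 * concept_axis   # a class that is suppressed by the concept


def make_grad_fn(v):
    return lambda a: v * (1.0 - np.tanh(a) ** 2)


class_acts = rng.normal(size=(50, dim)) + 3.0 * concept_axis
score_pos = tcav_score(make_grad_fn(v_pos), class_acts, cav)
score_neg = tcav_score(make_grad_fn(v_neg), class_acts, cav)
print("score for concept-reliant class:", score_pos)
print("score for concept-suppressed class:", score_neg)
```

In a real network the gradient function would come from automatic differentiation of the class logit with respect to the chosen layer's activations, and the concept and random activations would come from running example images through the model.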

This talk is part of the Machine Learning @ CUED series.
