BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Mechanistic Interpretability - Progress and Limits - Arthur Conmy 
 (Google DeepMind)
DTSTART:20260303T160000Z
DTEND:20260303T170000Z
UID:TALK245191@talks.cam.ac.uk
CONTACT:Mateja Jamnik
DESCRIPTION:In the broadest sense\, mechanistic interpretability refers
  to explaining the behavior of neural networks in terms of their
  internal components. We cover early work on vision models\, transformer
  circuits\, and automated circuit discovery. We then turn to
  superposition (what it means mathematically and why we think it occurs
  in modern transformer language models)\, the linear representation
  hypothesis\, and sparse autoencoders. Finally\, we discuss recent
  applications in deployed AI systems and offer a balanced perspective on
  when mechanistic interpretability is the right tool and when other
  approaches may be more appropriate as AI systems become more
  capable.\n\n*Bio:* Arthur Conmy is a Senior Research Engineer at Google
  DeepMind. He has produced foundational mechanistic interpretability
  research\, including Interpretability in the Wild (ICLR 2023) and ACDC:
  Automated Circuit Discovery (NeurIPS 2023)\, and recently added
  activation probes to live Gemini deployments to detect misuse.
LOCATION:Lecture Theatre 2\, Computer Laboratory\, William Gates Building
END:VEVENT
END:VCALENDAR
