BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//talks.cam.ac.uk//v3//EN
BEGIN:VTIMEZONE
TZID:Europe/London
BEGIN:DAYLIGHT
TZOFFSETFROM:+0000
TZOFFSETTO:+0100
TZNAME:BST
DTSTART:19700329T010000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0100
TZOFFSETTO:+0000
TZNAME:GMT
DTSTART:19701025T020000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
CATEGORIES:NLIP Seminar Series
SUMMARY:Research Progress in Mechanistic Interpretability 
 - Arthur Conmy (Google DeepMind)
DTSTART;TZID=Europe/London:20250509T120000
DTEND;TZID=Europe/London:20250509T130000
UID:TALK229927AThttp://talks.cam.ac.uk
URL:http://talks.cam.ac.uk/talk/index/229927
DESCRIPTION:The goal of Mechanistic Interpretability research is to 
 explain how neural networks compute outputs in terms of their 
 internal components. But how much progress has been made towards 
 this goal? While a large amount of Mechanistic Interpretability 
 research has been produced in recent years by academia\, frontier 
 AI companies such as Google DeepMind\, and independent researchers\, 
 there are still large open problems in the field. In this talk\, I 
 will begin by discussing some background hypotheses and techniques 
 in Mechanistic Interpretability\, such as the Linear Representation 
 Hypothesis and common causal interventions. Then\, I’ll discuss how 
 this connects to research we’ve done at Google DeepMind in the past 
 year\, such as open-sourcing Gemma Scope\, the most comprehensive 
 set of Sparse Autoencoders\, which took over 20% of the compute 
 used to train GPT-3. Finally\, I’ll reflect on current priorities 
 and disagreements in Mechanistic Interpretability\, several of 
 which build on Gemma Scope. In short\, via circuits research\, 
 Mechanistic Interpretability can uncover factors influencing model 
 behavior that cannot naively be inferred from prompts and outputs\, 
 but it has thus far underperformed when benchmarked on well-defined 
 real-world tasks (such as probing for harmful intent in user 
 prompts).\n\nArthur Conmy is a Senior Research Engineer at Google 
 DeepMind who works on the Mechanistic Interpretability team.\n
LOCATION:Room FW26 (hybrid format). Here is the Zoom link for those 
 who wish to join online: https://cam-ac-uk.zoom.us/j/4751389294
 ?pwd=Z2ZOSDk0eG1wZldVWG1GVVhrTzFIZz09
CONTACT:Suchir Salhan
END:VEVENT
END:VCALENDAR
