Emergence of Linear Representations in LMs (NYU)
- 👤 Speaker: Dr. Shauli Ravfogel (NYU)
- 📅 Date & Time: Tuesday 28 October 2025, 11:00 - 12:00
- 📍 Venue: GR03, English Faculty Building, 9 West Road, Sidgwick Site and online https://cam-ac-uk.zoom.us/j/97599459216?pwd=QTRsOWZCOXRTREVnbTJBdXVpOXFvdz09
Abstract
Recent work suggests that language models (LMs) encode many human-interpretable concepts as approximately linear directions in representation space. I first survey evidence for this “linear concept” hypothesis and show how it motivates steering methods—targeted interventions that causally modify model behavior. I then focus on truthfulness, demonstrating that LMs allocate a direction separating true from false assertions. Using an analytically tractable toy transformer, I present a plausible mechanism for how such linear structure emerges and how models exploit it to solve a factuality-related task. Taken together, these results bring us closer to understanding why “simple” geometry arises in LM representations.
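For readers unfamiliar with the idea of steering along a concept direction, the toy sketch below illustrates the general recipe in a hedged way: estimate a direction from labelled activations (here, a simple difference of class means) and add a scaled copy of it to a hidden state. The dimensionality, data, variable names, and the difference-of-means estimator are illustrative assumptions for this sketch, not details of the speaker's method.

```python
# Illustrative sketch of linear-direction steering (not the speaker's method).
# Assumes we already have hidden-state vectors for true/false statements.
import numpy as np

rng = np.random.default_rng(0)
d = 64                                   # toy hidden-state dimensionality

# Toy activations: pretend true and false statements differ along one axis.
axis = np.eye(d)[0]
h_true = rng.normal(size=(100, d)) + axis
h_false = rng.normal(size=(100, d)) - axis

# Estimate a "truth direction" as the difference of class means
# (one common choice; probing classifiers are another).
direction = h_true.mean(axis=0) - h_false.mean(axis=0)
direction /= np.linalg.norm(direction)

def steer(hidden_state: np.ndarray, alpha: float = 2.0) -> np.ndarray:
    """Add a scaled copy of the concept direction to a hidden state."""
    return hidden_state + alpha * direction

# Projection onto the direction before and after steering a "false"-like vector.
h = h_false[0]
print("before:", float(h @ direction))
print("after: ", float(steer(h) @ direction))
```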
Bio: Dr Shauli Ravfogel is a Postdoctoral Researcher and Faculty Fellow at the NYU Center for Data Science. He earned his PhD from the Natural Language Processing Lab at Bar-Ilan University, supervised by Prof. Yoav Goldberg. His research focuses on analyzing and controlling the internal representations of generative models, particularly language models. He studies how neural networks encode structured information, use it to solve tasks, and represent interpretable concepts. He aims—sometimes even successfully—to develop mathematically principled approaches to interpretability. He is particularly interested in understanding how simple structures, such as concept-aligned linear subspaces, emerge as a byproduct of the language modeling objective, and how such structures can be used to steer and control models. During his PhD, he worked on techniques to selectively control information in neural representations, with some fun linguistic side tours. More recently, he has explored framing language models as causal models and tackling questions of learnability in a controlled setting.
Series
This talk is part of the Language Technology Lab Seminars series.
Included in Lists
- bld31
- Cambridge Centre for Data-Driven Discovery (C2D3)
- Cambridge Forum of Science and Humanities
- Cambridge Language Sciences
- Cambridge talks
- Chris Davis' list
- GR03, English Faculty Building, 9 West Road, Sidgwick Site and online https://cam-ac-uk.zoom.us/j/97599459216?pwd=QTRsOWZCOXRTREVnbTJBdXVpOXFvdz09
- Guy Emerson's list
- Interested Talks
- Language Sciences for Graduate Students
- Language Technology Lab Seminars
- ndk22's list
- ob366-ai4er
- rp587
- Simon Baker's List
- Trust & Technology Initiative - interesting events
- yk449
Note: Ex-directory lists are not shown.