
Emergence of Linear Representations in LMs (NYU)


If you have a question about this talk, please contact Shun Shao.

Abstract: Recent work suggests that language models (LMs) encode many human-interpretable concepts as approximately linear directions in representation space. I first survey evidence for this “linear concept” hypothesis and show how it motivates steering methods: targeted interventions that causally modify model behavior. I then focus on truthfulness, demonstrating that LMs learn a direction that separates true from false assertions. Using an analytically tractable toy transformer, I present a plausible mechanism for how such linear structure emerges and how models exploit it to solve a factuality-related task. Taken together, these results bring us closer to understanding why “simple” geometry arises in LM representations.
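The abstract's two ingredients, a linear concept direction and a steering intervention, can be sketched in a few lines. The code below is purely illustrative and is not the speaker's implementation: the activations are synthetic, the difference-of-means estimator is one common choice for finding a concept direction, and all names (`estimate_direction`, `steer`, `alpha`) are made up for this sketch.

```python
# Toy illustration of the "linear concept" hypothesis and steering.
# We fabricate hidden states in which a binary concept (say, true vs.
# false) is encoded along a single direction, recover that direction by
# difference of means, and steer by adding the direction back in.
import numpy as np

rng = np.random.default_rng(0)
dim = 64

# Hypothetical ground-truth concept direction, unit-normalized.
concept = rng.normal(size=dim)
concept /= np.linalg.norm(concept)

# Synthetic "activations": true statements are shifted +2 along the
# concept direction, false statements -2, plus isotropic noise.
def noise(n):
    return rng.normal(scale=0.5, size=(n, dim))

true_acts = 2.0 * concept + noise(200)
false_acts = -2.0 * concept + noise(200)

def estimate_direction(pos, neg):
    """Difference-of-means estimator for a linear concept direction."""
    d = pos.mean(axis=0) - neg.mean(axis=0)
    return d / np.linalg.norm(d)

def steer(hidden, direction, alpha):
    """Intervene on a hidden state by adding a scaled concept direction."""
    return hidden + alpha * direction

d_hat = estimate_direction(true_acts, false_acts)
print("alignment with ground truth:", float(d_hat @ concept))

# Steering a "false" activation toward "true" raises its projection
# onto the estimated direction by exactly alpha (d_hat is unit-norm).
h = false_acts[0]
print("before:", float(h @ d_hat), "after:", float(steer(h, d_hat, 4.0) @ d_hat))
```

In this idealized setting the estimated direction aligns almost perfectly with the planted one; the talk's toy-transformer analysis asks why comparably simple geometry arises in real LMs trained only on the language modeling objective.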

Bio: Dr Shauli Ravfogel is a Postdoctoral Researcher and Faculty Fellow at the NYU Center for Data Science. He earned his PhD from the Natural Language Processing Lab at Bar-Ilan University, supervised by Prof. Yoav Goldberg. His research focuses on analyzing and controlling the internal representations of generative models, particularly language models. He studies how neural networks encode structured information, use it to solve tasks, and represent interpretable concepts. He aims, sometimes even successfully, to develop mathematically principled approaches to interpretability. He is particularly interested in understanding how simple structures, such as concept-aligned linear subspaces, emerge as a byproduct of the language modeling objective, and how such structures can be used to steer and control models. During his PhD, he worked on techniques to selectively control information in neural representations, with some fun linguistic side tours. More recently, he has explored framing language models as causal models and tackling questions of learnability in a controlled setting.

This talk is part of the Language Technology Lab Seminars series.


© 2006-2026 Talks.cam, University of Cambridge. Contact Us | Help and Documentation | Privacy and Publicity