Emergence of Linear Representations in LMs (NYU)
If you have a question about this talk, please contact Shun Shao.

Abstract: Recent work suggests that language models (LMs) encode many human-interpretable concepts as approximately linear directions in representation space. I first survey evidence for this "linear concept" hypothesis and show how it motivates steering methods: targeted interventions that causally modify model behavior. I then focus on truthfulness, demonstrating that LMs allocate a direction separating true from false assertions. Using an analytically tractable toy transformer, I present a plausible mechanism for how such linear structure emerges and how models exploit it to solve a factuality-related task. Taken together, these results bring us closer to understanding why "simple" geometry arises in LM representations.

Bio: Dr Shauli Ravfogel is a Postdoctoral Researcher and Faculty Fellow at the NYU Center for Data Science. He earned his PhD from the Natural Language Processing Lab at Bar-Ilan University, supervised by Prof. Yoav Goldberg. His research focuses on analyzing and controlling the internal representations of generative models, particularly language models. He studies how neural networks encode structured information, use it to solve tasks, and represent interpretable concepts. He aims, sometimes even successfully, to develop mathematically principled approaches to interpretability. He is particularly interested in understanding how simple structures, such as concept-aligned linear subspaces, emerge as a byproduct of the language modeling objective, and how such structures can be used to steer and control models. During his PhD, he worked on techniques to selectively control information in neural representations, with some fun linguistic side tours. More recently, he has explored framing language models as causal models and has tackled questions of learnability in a controlled setting.

This talk is part of the Language Technology Lab Seminars series.
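The "linear concept" hypothesis and steering mentioned in the abstract can be illustrated with a minimal toy sketch. Everything below is hypothetical and not taken from the talk: a concept (say, truthfulness) is modeled as a unit direction `v` in a hidden space, a linear probe reads a hidden state's projection onto `v`, and a steering intervention simply shifts the hidden state along `v`.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # hidden dimension (illustrative)

# Hypothetical concept direction: a random unit vector.
v = rng.normal(size=d)
v /= np.linalg.norm(v)

def concept_score(h: np.ndarray) -> float:
    """Linear probe: projection of a hidden state onto the concept direction."""
    return float(h @ v)

def steer(h: np.ndarray, alpha: float) -> np.ndarray:
    """Steering intervention: shift the hidden state along v by alpha."""
    return h + alpha * v

# Construct a hidden state whose projection onto v is exactly -2.0,
# i.e. one the probe reads as "false".
h_raw = rng.normal(size=d)
h = h_raw - (h_raw @ v + 2.0) * v

# Steering adds alpha to the projection (v is a unit vector),
# flipping the probe's reading to the "true" side.
h_steered = steer(h, alpha=4.0)
```

Because `v` has unit norm, `concept_score(steer(h, alpha))` equals `concept_score(h) + alpha`, which is why a single additive intervention along one direction suffices to causally move the probe's output.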