
Human-Centered AI: Addressing the Ecological Fallacy in LLMs


If you have a question about this talk, please contact Shun Shao.

Abstract: Today’s foundation models – whether they process sequences of words (NLP), matrices of pixels (vision), or timelines of audio spectra (speech) – treat each observation in isolation, committing an ecological fallacy by disregarding the individuals and communities that generate the data. In this talk, I argue for reconceptualizing the core probabilistic tasks of foundation models to integrate the people behind the data, for instance by having LLMs estimate the probability of the next word not only from its preceding tokens but also from a higher-order representation of the data’s author. This “human language modeling” (HuLM) framework explicitly conditions on dynamic user states, drawing on theories of traits and states from psychology, to capture the structured dependencies among data and avoid the ecological fallacy. At the cost of added modeling complexity, we will show that these models improve performance on both traditional NLP tasks and health and psychological applications, more fundamentally aligning models of data with the realities of the human behavior that produced it.
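In rough terms, and with notation assumed here for illustration rather than taken from the abstract, the reconceptualization can be read as adding an author-state variable to the usual next-word conditional:

\[
P_{\mathrm{LM}}(w_t \mid w_{1:t-1})
\;\longrightarrow\;
P_{\mathrm{HuLM}}\bigl(w_t \mid w_{1:t-1},\, U_{a,t}\bigr),
\qquad
U_{a,t} = f\bigl(U_{a,t-1},\ \text{author } a\text{'s earlier text}\bigr)
\]

Here \(U_{a,t}\) stands for a higher-order, dynamically updated representation of author \(a\) (the “dynamic user state”); the symbol and the exact factorization are a sketch, not the speaker’s definitive formulation.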

Bio: H. Andrew Schwartz is the director of the Human Language Analysis Lab (HLAB), housed in the Computer Science Department at Stony Brook University (SUNY), and a PI and co-founder of the World Well-Being Project, a multidisciplinary consortium between the University of Pennsylvania, Stony Brook University, and Stanford University focused on developing large-scale language analyses that reveal differences in health, personality, and well-being. Andrew is an active contributor to the fields of AI/natural language processing, psychology, and health informatics, as well as a participant in tech-for-the-public-good efforts such as the UN Global Working Group on Big Data for Official Statistics. He was the 2020 recipient of a DARPA Young Faculty Award. Andrew is also the co-creator of the new R-Text package, which brings the language-model technology behind ChatGPT to R, and the maintainer of the well-established Python package Differential Language Analysis ToolKit (DLATK), used in over 100 studies and in industry. His research frequently attracts public interest, with coverage in publications such as The New York Times, USA Today, and The Washington Post.

This talk is part of the Language Technology Lab Seminars series.

