
Human-Centered AI: Addressing the Ecological Fallacy in LLMs


If you have a question about this talk, please contact Shun Shao.

Abstract: Today’s foundation models – whether they process sequences of words (NLP), matrices of pixels (vision), or timelines of audio spectra (speech) – treat each observation in isolation, committing an ecological fallacy by disregarding the individuals and communities that generate the data. In this talk, I argue for reconceptualizing the core probabilistic tasks of foundation models to integrate the people behind the data, for instance by having LLMs estimate the probability of the next word not only from its preceding tokens but also from a higher-order representation of the data’s author. This “human language modeling” (HuLM) framework explicitly conditions on dynamic user states, drawing on theories of traits and states from psychology, to capture the structured dependencies among data and avoid the ecological fallacy. At the cost of added modeling complexity, we will show that these models improve performance on both traditional NLP tasks and health and psychological applications, more fundamentally aligning models of data with the realities of the human behavior that produced it.
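In rough terms, and with notation assumed here for illustration rather than taken from the abstract, the reconceptualization can be read as adding an author-state variable to the usual next-word conditional:

\[
P_{\mathrm{LM}}(w_t \mid w_{1:t-1})
\;\longrightarrow\;
P_{\mathrm{HuLM}}\bigl(w_t \mid w_{1:t-1},\, U_{a,t}\bigr),
\qquad
U_{a,t} = f\bigl(U_{a,t-1},\ \text{author } a\text{'s earlier text}\bigr)
\]

Here \(U_{a,t}\) stands for a higher-order, dynamically updated representation of author \(a\) (the “dynamic user state”); the symbol and the exact factorization are a sketch, not the speaker’s definitive formulation.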

Bio: H. Andrew Schwartz is the director of the Human Language Analysis Lab (HLAB), housed in the Computer Science Department at Stony Brook University (SUNY), and a PI and co-founder of the World Well-Being Project, a multidisciplinary consortium between the University of Pennsylvania, Stony Brook University, and Stanford University focused on developing large-scale language analyses that reveal differences in health, personality, and well-being. Andrew is an active contributor to the fields of AI/natural language processing, psychology, and health informatics, as well as a participant in tech-for-the-public-good efforts such as the UN Global Working Group on Big Data for Official Statistics. He was the 2020 recipient of a DARPA Young Faculty Award. Andrew is also the co-creator of the new R-Text package, which brings the language-model technology behind ChatGPT to R, and the maintainer of the well-established Python package Differential Language Analysis ToolKit (DLATK), used in over 100 studies and in industry. His research frequently attracts public interest, with coverage in publications such as The New York Times, USA Today, and The Washington Post.

This talk is part of the Language Technology Lab Seminars series.

