Human-Centered AI: Addressing the Ecological Fallacy in LLMs
If you have a question about this talk, please contact Shun Shao.

Abstract: Today’s foundation models – whether they process sequences of words (NLP), matrices of pixels (vision), or timelines of audio spectra (speech) – treat each observation in isolation, committing a so-called ecological fallacy by disregarding the individuals and communities that generate the data. In this talk, I argue for reconceptualizing the core probabilistic tasks of foundation models to integrate the people behind the data, for instance by having LLMs estimate the probability of the next word not only from its preceding tokens but also from a higher-order representation of the data’s author. This “human language modeling” (HuLM) framework explicitly conditions on dynamic user states, drawing on theories of traits and states from psychology, to capture the structured dependencies among data and avoid the ecological fallacy. We will show that, in exchange for added modeling complexity, these models can improve performance on both traditional NLP tasks and health and psychological applications, more fundamentally aligning models of data with the realities of the human behavior that produced it.

Bio: H. Andrew Schwartz is the director of the Human Language Analysis Lab (HLAB), housed in the Computer Science Department at Stony Brook University (SUNY), and a PI/co-founder of the World Well-Being Project, a multidisciplinary consortium between the University of Pennsylvania, Stony Brook University, and Stanford University focused on developing large-scale language analyses that reveal differences in health, personality, and well-being. Andrew is an active contributor to the fields of AI and natural language processing, psychology, and health informatics, as well as a participant in tech for the public good, including the UN Global Working Group on Big Data for Official Statistics. He was the 2020 recipient of a DARPA Young Faculty Award. Andrew is also the co-creator of the new R-Text package, which brings the language model technology behind ChatGPT to R, and the maintainer of the well-established Python package Differential Language Analysis ToolKit (DLATK), used in over 100 studies and within the tech industry. His research frequently attracts public interest, with coverage in publications such as The New York Times, USA Today, and The Washington Post.

This talk is part of the Language Technology Lab Seminars series.
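The conditioning the abstract describes can be sketched roughly as follows; the notation below is an illustrative reading of the abstract, not necessarily the speaker's own formulation. A standard language model predicts the next word from preceding tokens only, while HuLM additionally conditions on a dynamic representation of the author's state.

```latex
% Illustrative sketch only; notation assumed, not taken from the talk.
% Standard language modeling predicts the next token from its history:
%   p(w_t | w_{1:t-1})
% Human language modeling (HuLM) also conditions on a dynamic user state
% u_{a,t} for author a, which may evolve across the author's documents:
\[
  p(w_t \mid w_{1:t-1})
  \;\longrightarrow\;
  p(w_t \mid w_{1:t-1},\, u_{a,t})
\]
```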