
Explanations as a Catalyst: Leveraging Large Language Models to Embrace Human Label Variation


If you have a question about this talk, please contact Shun Shao.

Abstract:

Human label variation (HLV)—the phenomenon where multiple annotators provide different yet valid labels for the same data—is a rich source of information often dismissed as noise. Capturing this variation is crucial for building robust NLP systems, but doing so is typically resource-intensive. This talk presents a series of studies on how Large Language Models (LLMs) can serve as a catalyst to embrace and model HLV, moving from scalable approximation to a deeper analysis of the reasoning process itself.
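
For context on the terminology: a human judgment distribution (HJD) for an item is simply the normalized distribution of labels that different annotators assign to it. The snippet below is a minimal illustrative sketch of that idea only; the label names and counts are hypothetical and not taken from the talk.

```python
from collections import Counter

def human_judgment_distribution(annotator_labels):
    """Turn the labels given by several annotators for one item into a soft label distribution."""
    counts = Counter(annotator_labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}

# Hypothetical NLI item annotated by five people: three say "entailment", two say "neutral".
print(human_judgment_distribution(
    ["entailment", "entailment", "entailment", "neutral", "neutral"]
))  # {'entailment': 0.6, 'neutral': 0.4}
```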

First, I will discuss how LLMs can approximate full Human Judgment Distributions (HJDs) from just a few human-provided explanations. Our work shows that this explanation-based approach significantly improves alignment with human judgments. This investigation also reveals the limitations of traditional, instance-level distribution metrics and highlights the importance of complementing them with global-level measures to more effectively evaluate alignment.
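
The talk's specific metrics are not reproduced here, but as a hedged illustration of the instance-level versus global-level distinction: an instance-level measure averages a per-item distance (total variation distance in the sketch below, an assumed choice), whereas a global-level measure compares label distributions pooled over the whole dataset. The two views can disagree, which is why they complement each other when evaluating alignment.

```python
import numpy as np

def instance_level_tvd(human_dists, model_dists):
    """Average per-item total variation distance between two sets of label distributions."""
    return float(np.mean([0.5 * np.abs(h - m).sum() for h, m in zip(human_dists, model_dists)]))

def global_level_tvd(human_dists, model_dists):
    """Total variation distance between the label distributions pooled over all items."""
    return float(0.5 * np.abs(human_dists.mean(axis=0) - model_dists.mean(axis=0)).sum())

# Hypothetical 3-class distributions for two items.
human = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]])
model = np.array([[0.3, 0.6, 0.1], [0.5, 0.2, 0.3]])
print(instance_level_tvd(human, model))  # 0.3: large per-item disagreement
print(global_level_tvd(human, model))    # 0.0: pooled distributions match exactly
```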

Building on this, the second part of the talk addresses the high cost of collecting human explanations by asking: can LLM-generated explanations serve as a viable proxy? We demonstrate that when guided by a few human labels, explanations generated by LLMs are indeed effective proxies, achieving comparable performance to human-written ones in approximating HJDs. This finding opens up a scalable and efficient pathway for modeling HLV, especially for datasets where human explanations are not available.

Finally, I will shift from post-hoc explanation (justifying a given answer) to a forward-reasoning paradigm. I will introduce CoT2EL, a novel pipeline that extracts explanation-label pairs directly from an LLM’s Chain-of-Thought (CoT) process before a final answer is selected. This method allows us to analyze the model’s reasoning across multiple plausible options. To better assess these nuanced judgments, I will also present a new rank-based evaluation framework that prioritizes the ordering of answers over exact distributional scores, showing a stronger alignment with human decision-making.
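
CoT2EL and the speaker's rank-based framework are not reproduced here; the fragment below only sketches the general idea of scoring the ordering of options rather than their exact probabilities, using Spearman rank correlation as one possible (assumed) choice of rank-based measure.

```python
from scipy.stats import spearmanr

def rank_agreement(human_dist, model_dist):
    """Spearman correlation between the option rankings induced by two judgment distributions."""
    rho, _ = spearmanr(human_dist, model_dist)
    return rho

# Hypothetical 4-option item: probabilities differ, but the ordering of options agrees perfectly.
print(rank_agreement([0.50, 0.30, 0.15, 0.05], [0.40, 0.35, 0.20, 0.05]))  # 1.0
```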

Bio

Beiduo Chen is a PhD student at the MaiNLP lab at LMU Munich, supervised by Prof. Barbara Plank. He is also a member of the European Laboratory for Learning and Intelligent Systems (ELLIS) PhD Program, co-supervised by Prof. Anna Korhonen at the University of Cambridge. He received his Master’s and Bachelor’s degrees from the University of Science and Technology of China. His research focuses on human-centered NLP, with a special emphasis on the uncertainty, trustworthiness, and evaluation of Large Language Models. He has published several papers in top-tier NLP conferences, including ACL and EMNLP.

This talk is part of the Language Technology Lab Seminars series.
