
Large language models for enabling constructive online conversations


If you have a question about this talk, please contact Panagiotis Fytas.

NLP systems promise to disrupt society through applications in high-stakes social domains. However, current evaluation and development focus on tasks that are not grounded in specific societal implications, which can lead to societal harm. There is a need to evaluate and mitigate these harms and, in doing so, bridge the gap between the realities of application and how models are currently developed. In this talk, I will present recent work addressing these issues in the domain of online content moderation.

In the first part, I will discuss online content moderation as a means of enabling constructive conversations about race. Content moderation practices on social media risk silencing the voices of historically marginalized groups. We find that both the most recent models and humans disproportionately flag posts in which users share personal experiences of racism. Not only does this censorship hinder the potential of social media to give voice to marginalized communities, but we also find that witnessing such censorship exacerbates feelings of isolation. We offer a path to reducing censorship through a psychologically informed reframing of moderation guidelines. These findings reveal how automated content moderation practices can help or hinder this effort in an increasingly diverse nation where online interactions are commonplace.

In the second part, I will discuss how the identified biases in models can be traced to the use-mention distinction: the difference between using words to convey a speaker's intent and mentioning words to quote what someone said or to point out properties of a word. Computationally modeling the use-mention distinction is crucial for enabling counterspeech to hate and misinformation, since counterspeech that refutes problematic content mentions harmful language but is not harmful itself. We show that even recent language models fail to distinguish use from mention and that this failure propagates to downstream tasks. We introduce prompting mitigations that teach the use-mention distinction and show that they reduce these errors.

Finally, I will discuss the big picture and other recent efforts to address these issues in domains beyond content moderation, including education, emotional support, and public discourse about AI. I will reflect on how, by doing so, we can minimize harms and develop and apply NLP systems for social good.
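To make the use-mention distinction concrete, below is a minimal, hypothetical sketch of a zero-shot prompting check in the spirit of the mitigations mentioned above. The prompt wording and the `query_model` callable are illustrative assumptions, not the speaker's actual method.

```python
# Illustrative sketch only: ask a model whether potentially harmful language
# in a post is USED (expresses the author's own intent) or MENTIONED
# (quoted, reported, or discussed, e.g. in counterspeech).
# The prompt text and `query_model` are hypothetical placeholders.

USE_MENTION_PROMPT = """\
You will see a social media post that contains potentially harmful language.
Decide whether the harmful language is:
- USE: the author employs the language to express their own intent, or
- MENTION: the author quotes, reports, or discusses the language
  (for example, to refute it or to describe being targeted by it).

Post: {post}

Answer with exactly one word: USE or MENTION."""


def classify_use_mention(post: str, query_model) -> str:
    """Return 'USE' or 'MENTION' for `post`.

    `query_model` is any caller-supplied function mapping a prompt string to
    the model's text response (e.g. a thin wrapper around a chat API).
    """
    response = query_model(USE_MENTION_PROMPT.format(post=post))
    answer = response.strip().upper()
    return "MENTION" if answer.startswith("MENTION") else "USE"
```

Under this sketch, a post that quotes a slur in order to condemn it should ideally be labeled MENTION rather than USE, which is the kind of error the talk reports current models and moderation pipelines still make.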

This talk is part of the Language Technology Lab Seminars series.
