BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//talks.cam.ac.uk//v3//EN
BEGIN:VTIMEZONE
TZID:Europe/London
BEGIN:DAYLIGHT
TZOFFSETFROM:+0000
TZOFFSETTO:+0100
TZNAME:BST
DTSTART:19700329T010000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0100
TZOFFSETTO:+0000
TZNAME:GMT
DTSTART:19701025T020000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
CATEGORIES:NLIP Seminar Series
SUMMARY:Exploring and Controlling Social Values in Large L
 anguage Models through Role-Playing - Paul Röttge
 r (Oxford University)
DTSTART;TZID=Europe/London:20230120T120000
DTEND;TZID=Europe/London:20230120T130000
UID:TALK195865AThttp://talks.cam.ac.uk
URL:http://talks.cam.ac.uk/talk/index/195865
DESCRIPTION:Abstract: \n\nSocial values are a key factor in hu
 man decision-making. Some people\, for example\, o
 ppose the death penalty while others support it\, 
 and there is no single objective truth. Large lang
 uage models are pre-trained on texts authored by m
 any different people with different social values.
  But when prompted to answer an ethical question o
 r complete a subjective task\, model responses wil
 l necessarily align with some social values\, and 
 not others. This leads to two questions that I wan
 t to answer in my research: 1) What social values 
 are reflected in model behaviour? 2) How can we co
 ntrol these values\, and by extension model behavi
 our? In my talk\, I will introduce role-playing as
  a framework for exploring these questions\, diffe
 rentiating between generic roles that models play 
 by default\, and specific roles that we ask them t
 o play\, for example based on sociodemographic att
 ributes. I will discuss requirements for successfu
 l role-playing\, including role stability\, intern
 al and external alignment\, as well as the limitat
 ions of role-playing. Lastly\, I will present init
 ial role-playing experiments for hate speech detec
 tion\, as a highly subjective task.\n\nBio: \n\nPa
 ul Röttger is a final-year DPhil student at the Un
 iversity of Oxford\, working on natural language p
 rocessing. In his thesis\, he focused on evaluatin
 g and improving hate speech detection models\, ada
 pting language models to language change\, and man
 aging subjectivity in data annotation. His main re
 search interest now is in exploring and controllin
 g the behaviour of large language models in relati
 on to social values\, as part of a larger goal to 
 make models more helpful and less harmful.
LOCATION:Computer Lab\, SS03
CONTACT:Michael Schlichtkrull
END:VEVENT
END:VCALENDAR
