Pluralistic Alignment through Personalized Reinforcement Learning from Human Feedback
If you have a question about this talk, please contact Tiancheng Hu.

Abstract: Reinforcement Learning from Human Feedback (RLHF) is a powerful paradigm for aligning foundation models to human values and preferences. However, current RLHF techniques cannot account for the naturally occurring differences in individual human preferences across a diverse population. When these differences arise, traditional RLHF frameworks simply average over them, leading to inaccurate rewards and poor performance for minority groups. To address the need for pluralistic alignment, we develop a novel multimodal RLHF method, which we term Variational Preference Learning (VPL). In this talk, I will first give an overview of past approaches to RLHF, and then show how VPL addresses issues of value monism. VPL uses a few preference labels to infer a user-specific latent variable, and learns reward models and policies conditioned on this latent without additional user-specific data. While conceptually simple, we show that in practice this reward modeling requires careful algorithmic considerations around model architecture and reward scaling. To empirically validate our proposed technique, we first show that it can provide a way to combat underspecification in simulated control problems, inferring and optimizing user-specific reward functions. Next, we conduct experiments on pluralistic language datasets representing diverse user preferences and demonstrate improved reward function accuracy. We additionally show the benefits of this probabilistic framework in terms of measuring uncertainty and actively learning user preferences. This work enables learning from diverse populations of users with divergent preferences, an important challenge that naturally occurs in problems from robot learning to foundation model alignment.

Bio: Natasha Jaques is an Assistant Professor of Computer Science and Engineering at the University of Washington, and a Senior Research Scientist at Google DeepMind. Her research focuses on Social Reinforcement Learning in multi-agent and human-AI interactions. During her PhD at MIT, she developed techniques for learning from human feedback signals to train language models, which were later built on by OpenAI’s series of work on Reinforcement Learning from Human Feedback (RLHF). In the multi-agent space, she has developed techniques for improving coordination through the optimization of social influence, and adversarial environment generation for improving the robustness of RL agents. Natasha’s work has received various awards, including Best Demo at NeurIPS, an honourable mention for Best Paper at ICML, and the Outstanding PhD Dissertation Award from the Association for the Advancement of Affective Computing. Her work has been featured in Science Magazine, MIT Technology Review, Quartz, IEEE Spectrum, Boston Magazine, and on CBC radio, among others. Natasha earned her Master's degree from the University of British Columbia, and undergraduate degrees in Computer Science and Psychology from the University of Regina.

This talk is part of the Language Technology Lab Seminars series.
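The abstract above describes conditioning a reward model on a user-specific latent variable inferred from a handful of preference labels. Below is a minimal illustrative sketch of that idea in PyTorch, written under stated assumptions rather than as the speaker's implementation: the architecture, the KL weighting, and all names (PreferenceEncoder, LatentConditionedReward, vpl_loss) are hypothetical.

    # Minimal sketch (illustrative, not the authors' code): a latent-conditioned
    # Bradley-Terry reward model in the spirit of Variational Preference Learning.
    import torch
    import torch.nn as nn

    class PreferenceEncoder(nn.Module):
        """Pools a user's few labelled comparisons into a latent z (mean and log-variance)."""
        def __init__(self, obs_dim, latent_dim, hidden=128):
            super().__init__()
            # Each comparison is (preferred segment features, rejected segment features).
            self.net = nn.Sequential(
                nn.Linear(2 * obs_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, 2 * latent_dim),
            )

        def forward(self, preferred, rejected):
            # preferred, rejected: (num_labels, obs_dim) for a single user
            h = self.net(torch.cat([preferred, rejected], dim=-1)).mean(dim=0)  # pool over labels
            mu, logvar = h.chunk(2, dim=-1)
            return mu, logvar

    class LatentConditionedReward(nn.Module):
        """Reward r(s, z): the same input can receive different rewards for different users."""
        def __init__(self, obs_dim, latent_dim, hidden=128):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(obs_dim + latent_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            )

        def forward(self, obs, z):
            z_rep = z.expand(obs.shape[0], -1)                      # broadcast z to each input
            return self.net(torch.cat([obs, z_rep], dim=-1)).squeeze(-1)

    def vpl_loss(encoder, reward_model, preferred, rejected, kl_weight=1e-3):
        """Bradley-Terry preference loss with a variational (ELBO-style) regulariser on z."""
        mu, logvar = encoder(preferred, rejected)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()        # reparameterised sample
        r_pref = reward_model(preferred, z)
        r_rej = reward_model(rejected, z)
        # Standard Bradley-Terry: P(preferred > rejected) = sigmoid(r_pref - r_rej)
        bt = -torch.nn.functional.logsigmoid(r_pref - r_rej).mean()
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum()   # KL(q(z) || N(0, I))
        return bt + kl_weight * kl

In this sketch, conditioning the reward on z is what lets one model represent divergent user preferences rather than averaging them away, and the KL term keeps the inferred latents regular enough that a new user can be handled from only a few labels.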