Reward Modelling
If you have a question about this talk, please contact Isaac Reid.
Zoom link available upon request (it is sent out on our mailing list, eng-mlg-rcc [at] lists.cam.ac.uk). Sign up to our mailing list for easier reminders via lists.cam.ac.uk.
Reward modelling broadly refers to the methods and practices for specifying the goals and objectives of a learning system and determining what constitutes a desirable outcome. Within reinforcement learning (RL), it refers to the process of designing and defining the reward, or reinforcement, signals. In this talk, I will give an overview of popular methods for reward modelling, distinguishing implicit methods, such as imitation learning and cooperative inverse reinforcement learning, from explicit methods, such as inverse RL and RL from human feedback. I will also highlight several theoretical challenges in reward modelling, discuss the use of reward modelling in language models such as GPT-4, and outline connections between the reward modelling problem and AI alignment.
This talk is part of the Machine Learning Reading Group @ CUED series.