Nash and Nemirovski walk into a bar: LLM alignment with Mirror Descent and Proximal Methods
SCLW01 - Bridging Stochastic Control And Reinforcement Learning: Theories and Applications

Traditional Reinforcement Learning from Human Feedback typically relies on reward models and preference structures such as the Bradley–Terry model. While effective in some cases, these assumptions fail to capture the richness of human preferences, which often exhibit phenomena such as intransitivity.

In this talk, we present Nash Learning from Human Feedback, a more direct alternative that frames the problem as finding a Nash equilibrium in a game induced by human preferences. This perspective provides a principled way to model complex, potentially non-transitive preferences without the need to introduce a reward model.

We will survey methods for approximating Nash equilibria in this setting, with a focus on fine-tuning large language models. In particular, we show how (approximate) proximal optimization methods, notably the NashMD and then the Mirror Prox algorithms, can be adapted to achieve fast and stable convergence in this setting. Finally, we discuss practical strategies for efficiently implementing these approximate proximal methods in large-scale training.

This talk is part of the Isaac Newton Institute Seminar Series.
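As a rough illustration of the game-theoretic framing in the abstract, the sketch below runs a textbook entropic Mirror Prox (extragradient) iteration in self-play on a tiny tabular preference game. It is not the speaker's NashMD or LLM-scale method. The 3x3 matrix `P`, the starting policy, the step size `eta`, and all function names are hypothetical choices made for illustration only.

```python
import math

# Hypothetical preference matrix: P[i][j] = probability that response i is
# preferred to response j. It is cyclic (0 beats 1, 1 beats 2, 2 beats 0),
# so no single scalar reward ordering of the responses can reproduce it.
P = [
    [0.5, 0.9, 0.1],
    [0.1, 0.5, 0.9],
    [0.9, 0.1, 0.5],
]

def win_rates(P, pi):
    """Win rate of each pure response against an opponent sampled from pi."""
    return [sum(p_ij * q for p_ij, q in zip(row, pi)) for row in P]

def md_step(pi, payoff, eta):
    """Entropic mirror-descent (multiplicative-weights) step on the simplex."""
    w = [p * math.exp(eta * g) for p, g in zip(pi, payoff)]
    total = sum(w)
    return [x / total for x in w]

def mirror_prox(P, pi0, eta=1.0, steps=2000):
    """Self-play Mirror Prox; returns the average of the look-ahead policies."""
    pi = list(pi0)
    avg = [0.0] * len(pi)
    for _ in range(steps):
        # Look-ahead step uses the payoff evaluated at the current policy.
        look_ahead = md_step(pi, win_rates(P, pi), eta)
        # Actual update restarts from pi but uses the payoff at the look-ahead.
        pi = md_step(pi, win_rates(P, look_ahead), eta)
        avg = [a + h / steps for a, h in zip(avg, look_ahead)]
    return avg

def exploitability(P, pi):
    """Best-response win rate against pi minus the game value of 0.5."""
    return max(win_rates(P, pi)) - 0.5

policy = mirror_prox(P, [0.8, 0.15, 0.05])
print([round(p, 3) for p in policy], exploitability(P, policy))
```

Because `P[i][j] + P[j][i] = 1`, the game is symmetric with value 0.5, so an exploitability near zero indicates that the averaged policy is close to a Nash equilibrium of this toy game.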