
Robust Alignment of Large Language Models


If you have a question about this talk, please contact Suchir Salhan.

The alignment of large language models (LLMs) is often brittle when faced with the complexities of real-world deployment. In this talk, I share our investigations into two scenarios where special care is required to ensure robust alignment.

The first scenario is multi-objective alignment, where balancing competing objectives is particularly challenging. Our recent work introduces Robust Multi-Objective Decoding (RMOD), an inference-time alignment algorithm that adaptively adjusts the weights of the different objectives during response generation so that none of them is neglected. RMOD provides principled robustness with minimal overhead and consistently outperforms existing methods across several alignment benchmarks.
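
The abstract does not give RMOD's update rule, so the following is only a toy illustration of the general idea of adaptive, worst-case-focused weighting, not the authors' algorithm. It assumes a small set of candidate continuations, each with a reference-model probability and one score per objective (for example from hypothetical per-objective value models). Robustness is cast as a max-min problem: exponentiated-gradient steps move weight toward whichever objective the current blended distribution serves worst, and the output distribution is the reference distribution tilted by the weighted scores. The names `tilted_policy`, `robust_weights`, `lam` (strength of the pull toward the reference) and `eta` (step size) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def tilted_policy(ref_probs: Sequence[float], scores: Sequence[Sequence[float]],
                  weights: Sequence[float], lam: float) -> List[float]:
    """Reference distribution tilted by weighted objective scores:
    q(j) is proportional to ref_probs[j] * exp(sum_i weights[i] * scores[j][i] / lam).
    Assumes every ref_probs[j] > 0."""
    logits = [math.log(p) + sum(w * s for w, s in zip(weights, row)) / lam
              for p, row in zip(ref_probs, scores)]
    m = max(logits)  # subtract the max for numerical stability
    unnorm = [math.exp(l - m) for l in logits]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def robust_weights(ref_probs: Sequence[float], scores: Sequence[Sequence[float]],
                   lam: float = 0.1, eta: float = 1.0,
                   steps: int = 200) -> Tuple[List[float], List[float]]:
    """Exponentiated-gradient descent on the convex function
    f(w) = lam * log sum_j p_j * exp(w . V_j / lam) over the probability simplex.
    Its gradient w.r.t. w_i is E_q[V_i], so objectives the current tilted policy
    already serves well lose weight and neglected objectives gain weight."""
    k = len(scores[0])
    w = [1.0 / k] * k  # start from equal weights
    for _ in range(steps):
        q = tilted_policy(ref_probs, scores, w, lam)
        expected = [sum(qj * row[i] for qj, row in zip(q, scores)) for i in range(k)]
        w = [wi * math.exp(-eta * e) for wi, e in zip(w, expected)]
        total = sum(w)
        w = [wi / total for wi in w]  # project back onto the simplex
    return w, tilted_policy(ref_probs, scores, w, lam)


# Toy example (hypothetical numbers): candidate 0 maximizes objective 0 but
# scores zero on objective 1; candidate 1 is balanced across both objectives.
ref = [0.5, 0.5]
V = [[1.0, 0.0], [0.45, 0.45]]
naive = tilted_policy(ref, V, [0.5, 0.5], lam=0.1)  # favors candidate 0
w, q = robust_weights(ref, V)                       # favors candidate 1
```

In this toy case, fixed equal weights favor candidate 0, which ignores objective 1 entirely; the adaptive weights shift almost all weight onto the neglected objective, so the balanced candidate 1 becomes the most likely choice.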

In the second part of the talk, I will address preference model misspecification in self-play alignment. While self-play is a promising alignment approach, naive implementations are vulnerable to inaccuracies in the preference model. To address this, our Regularized Self-Play Policy Optimization (RSPO) framework offers a versatile, modular way to regularize the self-play alignment process. RSPO's ability to combine various regularizers yields strong performance gains on multiple evaluation sets, including AlpacaEval-2 and Arena-Hard.
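
The abstract does not specify RSPO's objective or which regularizers it combines. Purely as an illustration of why regularization helps when the preference model is wrong, the sketch below runs multiplicative-weights self-play over a tiny response set with a known, deliberately biased preference matrix, and optionally adds two standard regularizers: a reverse-KL pull toward a reference policy and an entropy bonus. The function name `self_play`, the `kl_coef` and `entropy_coef` parameters, and the update rule are assumptions for this sketch, not RSPO itself.

```python
import math
from typing import List, Sequence


def self_play(pref: Sequence[Sequence[float]], ref: Sequence[float],
              eta: float = 1.0, steps: int = 200,
              kl_coef: float = 0.0, entropy_coef: float = 0.0) -> List[float]:
    """Multiplicative-weights self-play over a finite response set.

    pref[a][b] is the (possibly misspecified) probability that response a is
    preferred to response b. Each step moves the policy toward responses that
    win against the current policy, minus optional penalties from a reverse-KL
    term anchored at `ref` and an entropy bonus. Assumes every ref[y] > 0.
    """
    pi = list(ref)
    n = len(pi)
    for _ in range(steps):
        # Win rate of each response against the current policy: P(y beats pi).
        win = [sum(pi[b] * pref[y][b] for b in range(n)) for y in range(n)]
        logits = [
            math.log(pi[y])
            + eta * (win[y]
                     - kl_coef * math.log(pi[y] / ref[y])   # penalizes drifting above ref
                     - entropy_coef * math.log(pi[y]))      # penalizes concentrated mass
            for y in range(n)
        ]
        m = max(logits)  # subtract the max for numerical stability
        unnorm = [math.exp(l - m) for l in logits]
        z = sum(unnorm)
        pi = [u / z for u in unnorm]
    return pi


# Toy example (hypothetical numbers): a biased preference model that always
# favors response 0 over responses 1 and 2 with probability 0.8.
P = [[0.5, 0.8, 0.8],
     [0.2, 0.5, 0.5],
     [0.2, 0.5, 0.5]]
ref = [1 / 3, 1 / 3, 1 / 3]
naive = self_play(P, ref)                                  # collapses onto response 0
reg = self_play(P, ref, kl_coef=0.5)                       # stays near the reference
both = self_play(P, ref, kl_coef=0.5, entropy_coef=0.5)    # combined regularizers
```

With this biased matrix, unregularized self-play drives nearly all probability onto response 0, while adding the KL term settles at roughly 0.48 on response 0, and combining it with the entropy bonus spreads the policy further (roughly 0.40 on response 0).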

As a bonus, I will briefly introduce our recent investigation into the robustness of Mixture-of-Agents (MoA) systems, a popular multi-agent paradigm. We show that introducing even a single malicious agent into the mixture can nullify the benefits of the entire system.

This talk is part of the NLIP Seminar Series at the University of Cambridge.

