
Extrapolating model performance across training horizons


If you have a question about this talk, please contact Suchir Salhan.

Abstract: Modern large-scale language model training runs often consume several orders of magnitude more tokens than the experiments carried out in the lead-up to the run. How can we confidently extrapolate the results of these small-scale experiments to predict the likely final outcome of our large-scale run?

In this talk I will present two methods I’ve worked on to solve this problem. The first assumes that model performance on binary-outcome tasks can be modelled as a sigmoid regression on the model’s loss over some validation data; long-horizon performance can then be read off the parameters of this regression. This method also lets us vet the validation sets we use to measure model progress, by checking how accurately they predict long-horizon outcomes. The second addresses the mismatch between the infinite-data regime of small-scale experiments and the finite-data regime of full runs: by subsampling the training data, we artificially induce finite-data effects at small token horizons. We show that such ‘repeat-aware’ experiments help us more accurately determine the optimal data mixture for long-horizon experiments based on short-horizon runs.
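To make the first method concrete, here is a minimal sketch in Python, assuming hypothetical small-scale measurements and a simple two-parameter logistic fit of task accuracy against validation loss; the exact parameterisation and fitting procedure used in the talk may differ.

    import numpy as np
    from scipy.optimize import curve_fit

    def sigmoid(loss, a, b):
        # Task accuracy modelled as a logistic function of validation loss.
        return 1.0 / (1.0 + np.exp(a * loss + b))

    # Hypothetical small-scale measurements: validation loss and binary-task
    # accuracy recorded at increasing token horizons.
    val_losses = np.array([3.2, 2.9, 2.7, 2.5, 2.4])
    accuracies = np.array([0.28, 0.35, 0.43, 0.52, 0.57])

    # Fit the sigmoid regression to the small-scale points.
    (a, b), _ = curve_fit(sigmoid, val_losses, accuracies, p0=(1.0, 0.0))

    # Read long-horizon performance off the fitted parameters: predict task
    # accuracy at the validation loss the large-scale run is expected to
    # reach (a hypothetical target of 2.0 here).
    print(f"predicted accuracy at loss 2.0: {sigmoid(2.0, a, b):.3f}")

The same fit also gives a way to vet a validation set: if long-horizon measurements fall far from the curve fitted on short-horizon points, that set is a poor predictor of final performance.

The second, repeat-aware idea might be sketched as follows, under assumed conventions (documents carried as dicts with a 'tokens' list; repeat_factor chosen to match the number of passes the data source would see in the full run); the helper name and interface are hypothetical.

    import random

    def repeat_aware_subsample(documents, token_budget, repeat_factor, seed=0):
        """Subsample a data source so a short-horizon run sees the same
        data-repetition rate expected in the long-horizon run: draw roughly
        token_budget / repeat_factor unique tokens and cycle through them
        repeat_factor times, artificially inducing finite-data effects."""
        rng = random.Random(seed)
        unique_budget = token_budget // repeat_factor

        # Shuffle, then take documents until the unique-token budget is filled.
        pool, tokens = [], 0
        for doc in rng.sample(documents, len(documents)):
            pool.append(doc)
            tokens += len(doc["tokens"])
            if tokens >= unique_budget:
                break

        # Repeat the subsampled pool to fill the full short-horizon budget.
        return pool * repeat_factor

The idea is that diminishing returns from data repetition then show up at small scale, so a short-horizon mixture sweep better reflects the long-horizon optimum.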

Speaker Biography: Kris Cao is a Member of Technical Staff at Cohere, working on model pretraining, tokenization, trustworthy evals, signal at small scales, model optimization, and data infrastructure. Kris completed his undergraduate and postgraduate studies at the University of Cambridge before taking up a position as a researcher at Google DeepMind. His PhD was in the Natural Language & Information Processing (NLIP) Group, with a thesis on “Learning meaning representations for text generation with deep generative models”, supervised by Dr Stephen Clark.

This talk is part of the NLIP Seminar Series.


 
