Extrapolating model performance across training horizons
If you have a question about this talk, please contact Suchir Salhan.

Abstract: Modern large-scale language model training runs often use several orders of magnitude more tokens in the final training run than in the experiments leading up to it. How can we confidently extrapolate the results of these small-scale experiments to predict the likely outcome of the large-scale run? In this talk I will present two methods I’ve worked on to solve this problem.

The first assumes that model performance on binary-outcome tasks can be modelled as a sigmoid regression on the model's loss on some validation data. Long-horizon performance can then be read off the parameters of this regression. This method also lets us verify the validation sets we use to measure model progress, based on how accurately they predict long-horizon outcomes.

The second aims to resolve the mismatch between infinite- and finite-data regimes by artificially inducing finite-data effects at small token horizons through subsampling the training data. We show that such ‘repeat-aware’ experiments help us more accurately determine the optimal data mixture for long-horizon experiments based on short-horizon runs.

Speaker Biography: Kris Cao is a member of the technical staff at Cohere, working on model pretraining, tokenization, trustworthy evals, signal at small scales, model optimization, and data infrastructure. Kris previously completed his undergraduate and postgraduate studies at the University of Cambridge, before taking up a position as a researcher at Google DeepMind. Kris completed his PhD in the Natural Language & Information Processing (NLIP) Group, with a thesis on “Learning meaning representations for text generation with deep generative models”, supervised by Dr Stephen Clark.

This talk is part of the NLIP Seminar Series.
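
As a rough illustration of the first idea in the abstract (task accuracy modelled as a sigmoid of validation loss), the following Python sketch fits accuracy ≈ sigmoid(a + b · loss) to a handful of checkpoints and then evaluates the fitted curve at a lower, hypothetical long-horizon loss. This is not the speaker's implementation: the function names, the plain gradient-descent fitting procedure, and the data points are all assumptions made for illustration, and the data is synthetic.

```python
import math


def sigmoid(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_sigmoid(losses, accuracies, steps=5000, lr=1.0):
    """Fit accuracy ~ sigmoid(a + b * loss) by gradient descent on cross-entropy.

    `losses` are validation losses at several checkpoints and `accuracies`
    are the matching fractions of task examples answered correctly.
    The losses are centred internally to make the optimisation better behaved.
    Returns (a, b) on the original (uncentred) loss scale.
    """
    n = len(losses)
    mean = sum(losses) / n
    a_c, b = 0.0, 0.0  # intercept on the centred scale, and slope
    for _ in range(steps):
        grad_a = 0.0
        grad_b = 0.0
        for x, y in zip(losses, accuracies):
            err = sigmoid(a_c + b * (x - mean)) - y
            grad_a += err
            grad_b += err * (x - mean)
        a_c -= lr * grad_a / n
        b -= lr * grad_b / n
    return a_c - b * mean, b


def predict_accuracy(a, b, loss):
    """Accuracy predicted by the fitted sigmoid at a given validation loss."""
    return sigmoid(a + b * loss)


# Synthetic short-horizon checkpoints (hypothetical numbers, generated from a
# known sigmoid so that the fit can be checked against the true parameters).
TRUE_A, TRUE_B = 10.0, -4.0
losses = [3.0, 2.8, 2.6, 2.4, 2.2]
accuracies = [sigmoid(TRUE_A + TRUE_B * x) for x in losses]

a, b = fit_sigmoid(losses, accuracies)

# Extrapolate to a lower loss than any observed in the short runs.
long_horizon_accuracy = predict_accuracy(a, b, 2.0)
```

In practice the long-horizon loss would itself have to be estimated (for example from a separate loss-extrapolation fit), and real accuracies are noisy, so the quality of such a fit on held-out long runs is what would indicate whether a given validation set is a useful predictor.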