BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Extrapolating model performance across training horizons - Kris Ca
 o (Cohere)
DTSTART:20251114T120000Z
DTEND:20251114T130000Z
UID:TALK235297@talks.cam.ac.uk
CONTACT:Suchir Salhan
DESCRIPTION:**Abstract:** Modern large-scale language model training runs
  often use several orders of magnitude more tokens in the final training
  run than in the experiments in the lead-up to the run. How can we
  confidently extrapolate the results of these small-scale experiments to
  predict the likely final outcome of our large-scale run? \n\nIn this talk
  I will present two methods I've worked on to solve this problem. The
  first assumes that model performance on binary-outcome tasks can be
  modelled as a sigmoid regression on the model's loss on some validation
  data. Long-horizon performance can then be read off the parameters of
  this regression. This method also lets us assess the validation sets we
  use to measure model progress by how accurately they predict
  long-horizon outcomes. The second addresses the mismatch between
  infinite- and finite-data regimes by artificially inducing finite-data
  effects at small token horizons by subsampling the training data. We
  show that such 'repeat-aware' experiments help us more accurately
  determine the optimal data mixture for long-horizon experiments based on
  short-horizon runs.\n\n**Speaker Biography:** Kris Cao is a Member of
  Technical Staff at Cohere\, working on model pretraining\, tokenization\,
  trustworthy evals\, signal at small scales\, model optimization\, and
  data infrastructure. Kris previously completed his undergraduate and
  postgraduate studies at the University of Cambridge\, before taking up a
  position as a researcher at Google DeepMind. Kris completed his PhD in
  the Natural Language & Information Processing (NLIP) Group\, with a
  thesis on "Learning meaning representations for text generation with
  deep generative models"\, supervised by Dr Stephen Clark.
LOCATION:FW26 Hybrid (In-Person + Online). Google Meet:
  https://meet.google.com/yeu-pqce-rsn
END:VEVENT
END:VCALENDAR
