
Mean-field dynamics and training of deep transformers



SCLW01 - Bridging Stochastic Control And Reinforcement Learning: Theories and Applications

In this talk, we will examine continuous limits of transformer architectures, which form the basis of common generative models. There is a rich literature on the limiting behaviour of neural networks, including the large-width limit of single-layer networks (mean-field analysis) and the large-depth limit of residual networks (neural ODE and SDE analysis). Here, we consider limits of transformers with attention, with scaling in the number of layers, tokens, and attention heads. The analysis reveals that, for plausible training outputs, the dynamics converge to a McKean–Vlasov limit, with or without diffusive common noise. Joint work with William Gibson.
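As an illustrative sketch (not taken from the talk itself, and assuming the standard softmax self-attention parametrisation with query, key and value matrices Q, K and V), the continuous-depth dynamics of n tokens x_1(t), ..., x_n(t) under self-attention can be written as the interacting particle system

\[
\dot x_i(t) \;=\; \frac{\sum_{j=1}^{n} e^{\langle Q x_i(t),\, K x_j(t)\rangle}\, V x_j(t)}{\sum_{j=1}^{n} e^{\langle Q x_i(t),\, K x_j(t)\rangle}}, \qquad i = 1, \dots, n.
\]

Heuristically, as n → ∞ the tokens decouple and each follows the McKean–Vlasov equation

\[
\dot x(t) \;=\; \frac{\int e^{\langle Q x(t),\, K y\rangle}\, V y \,\mu_t(\mathrm{d}y)}{\int e^{\langle Q x(t),\, K y\rangle}\, \mu_t(\mathrm{d}y)}, \qquad \mu_t = \operatorname{Law}(x(t)),
\]

in which the drift of a single token depends on the law μ_t of the token population; a diffusive common-noise term would add a Brownian forcing shared by all tokens. The talk's precise scaling over layers, tokens and attention heads may differ from this simplified picture.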

This talk is part of the Isaac Newton Institute Seminar Series.


 
