The unreasonable effectiveness of mathematics in large scale deep learning

Greg Yang, Microsoft Research
Wednesday 06 July 2022, 11:00-12:30
Cambridge University Engineering Department, CBL Seminar room BE4-38.

If you have a question about this talk, please contact James Allingham.

Recently, the theory of infinite-width neural networks led to the first technology, muTransfer, for tuning enormous neural networks that are too expensive to train more than once. For example, this allowed us to tune the 6.7 billion parameter version of GPT-3 using only 7% of its pretraining compute budget, and with some asterisks, we get a performance comparable to the original GPT-3 model with twice the parameter count. In this talk, I will explain the core insight behind this theory. In fact, this is an instance of what I call the Optimal Scaling Thesis, which connects infinite-size limits for general notions of “size” to the optimal design of large models in practice, illustrating a way for theory to reliably guide the future of AI. I’ll end with several concrete key mathematical research questions whose resolutions will have incredible impact on how practitioners scale up their NNs.

There’s no required reading for the talk but folks can look at my homepage for an overview of Tensor Programs.

This talk is part of the Machine Learning Reading Group @ CUED series.
