Metrized Deep Learning: Fast & Scalable Training

If you have a question about this talk, please contact Suchir Salhan.

We build neural networks in a modular, programmatic way using software libraries like PyTorch and JAX. But optimization theory has not caught up with the flexibility of this paradigm, and practical advances in neural net optimization remain largely heuristics-driven. In this talk we argue that, if we are to treat deep learning rigorously, we must build our optimization theory programmatically and in lockstep with the neural network itself. To instantiate this idea, we propose the “modular norm”, a norm on the weight space of general neural architectures, constructed by stitching together norms on the individual tensor spaces as the architecture is assembled. The modular norm has several applications: automatic Lipschitz certificates for general architectures, in both the weights and the inputs; automatic learning rate transfer across scale; and, most recently, a “duality theory” that yields dualized optimizers such as Muon, which have set speed records for training transformers. We are building the theory of the modular norm into a software library called Modula to ease the development and deployment of rigorous deep learning algorithms; you can find out more at https://modula.systems/.
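
For readers unfamiliar with dualized optimizers, the sketch below shows the core step publicly associated with Muon: the matrix-shaped momentum update is replaced by its nearest semi-orthogonal matrix, approximated with a Newton-Schulz iteration. This is a minimal illustration only, using the quintic coefficients from the public Muon write-up; the names orthogonalize and muon_step and the hyperparameter values are placeholders, not the Modula library's API.

    import torch

    def orthogonalize(G: torch.Tensor, steps: int = 5) -> torch.Tensor:
        # Approximate U @ V.T from the SVD G = U S V.T with a Newton-Schulz
        # iteration (quintic coefficients from the public Muon write-up).
        a, b, c = 3.4445, -4.7750, 2.0315
        X = G / (G.norm() + 1e-7)   # normalise so every singular value is <= 1
        transpose = G.shape[0] > G.shape[1]
        if transpose:               # iterate on the wide orientation
            X = X.T
        for _ in range(steps):
            A = X @ X.T
            X = a * X + (b * A + c * A @ A) @ X
        return X.T if transpose else X

    def muon_step(W, G, momentum, lr=0.02, beta=0.95):
        # One dualized step for a 2-D weight: accumulate momentum, then
        # descend along the orthogonalized (dualized) direction.
        momentum.mul_(beta).add_(G)
        W.add_(orthogonalize(momentum), alpha=-lr)

In published usage, updates of this kind are applied only to the hidden matrix-shaped parameters, with a standard optimizer handling embeddings, gains and biases.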

This talk is part of the NLIP Seminar Series.
