Metrized Deep Learning: Fast & Scalable Training

If you have a question about this talk, please contact Suchir Salhan.

We build neural networks in a modular, programmatic way using software libraries like PyTorch and JAX. But optimization theory has not caught up with the flexibility of this paradigm, and practical advances in neural net optimization remain largely heuristics-driven. In this talk we argue that, if we are to treat deep learning rigorously, we must build our optimization theory programmatically and in lockstep with the neural network itself. To instantiate this idea, we propose the “modular norm”, a norm on the weight space of general neural architectures, constructed by stitching together norms on the individual tensor spaces as the architecture is assembled. The modular norm has several applications: automatic Lipschitz certificates for general architectures, in both the weights and the inputs; automatic learning rate transfer across scale; and, most recently, a “duality theory” that yields dualized optimizers such as Muon, which have set speed records for training transformers. We are building the theory of the modular norm into a software library called Modula to ease the development and deployment of rigorous deep learning algorithms; you can find out more at https://modula.systems/.
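
For readers unfamiliar with dualized optimizers, the sketch below shows the core step publicly associated with Muon: the matrix-shaped momentum update is replaced by its nearest semi-orthogonal matrix, approximated with a Newton-Schulz iteration. This is a minimal illustration only, using the quintic coefficients from the public Muon write-up; the names orthogonalize and muon_step and the hyperparameter values are placeholders, not the Modula library's API.

    import torch

    def orthogonalize(G: torch.Tensor, steps: int = 5) -> torch.Tensor:
        # Approximate U @ V.T from the SVD G = U S V.T with a Newton-Schulz
        # iteration (quintic coefficients from the public Muon write-up).
        a, b, c = 3.4445, -4.7750, 2.0315
        X = G / (G.norm() + 1e-7)   # normalise so every singular value is <= 1
        transpose = G.shape[0] > G.shape[1]
        if transpose:               # iterate on the wide orientation
            X = X.T
        for _ in range(steps):
            A = X @ X.T
            X = a * X + (b * A + c * A @ A) @ X
        return X.T if transpose else X

    def muon_step(W, G, momentum, lr=0.02, beta=0.95):
        # One dualized step for a 2-D weight: accumulate momentum, then
        # descend along the orthogonalized (dualized) direction.
        momentum.mul_(beta).add_(G)
        W.add_(orthogonalize(momentum), alpha=-lr)

In published usage, updates of this kind are applied only to the hidden matrix-shaped parameters, with a standard optimizer handling embeddings, gains and biases.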

This talk is part of the NLIP Seminar Series.
