Machine Learning is Linear Algebra

Andrew Gordon Wilson - New York University
Thursday 13 February 2025, 16:00-17:00
https://cam-ac-uk.zoom.us/j/81897609356?pwd=HqbUQWnASjpBBZdaZo9r43M9Gj4N3Q.1.

This talk has been canceled/deleted

I will talk about how modelling assumptions manifest themselves as algebraic structure in a variety of settings, including optimization, attention, and network parameters, and how we can algorithmically exploit that structure for better scaling laws with transformers. As part of this effort, I will present a unifying framework that enables searching among all linear operators expressible via an Einstein summation. This framework encompasses previously proposed structures, such as low-rank, Kronecker, Tensor-Train, and Monarch, along with many novel structures. We develop a taxonomy of all such operators based on their computational and algebraic properties, which provides insights into their compute-optimal scaling laws. Combining these insights with empirical evaluation, we identify a subset of structures that achieve better performance than dense layers as a function of training compute, which we then develop into a high-performance sparse mixture-of-experts layer.
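As a rough illustration of the idea that structured linear layers can be written as Einstein summations, the sketch below expresses a dense layer, a low-rank layer, and a Kronecker-structured layer with NumPy's `einsum`. It is not taken from the talk: the dimensions, variable names, and the particular index strings are assumptions chosen for this example, and the speaker's framework may parameterize these structures differently.

```python
import numpy as np

# Illustrative sizes only (assumed for this sketch, not from the talk).
rng = np.random.default_rng(0)
d_in, d_out, rank = 6, 6, 2
x = rng.standard_normal(d_in)

# Dense layer: y[o] = sum_i W[o, i] * x[i]
W = rng.standard_normal((d_out, d_in))
y_dense = np.einsum("oi,i->o", W, x)

# Low-rank layer: W = U @ V, so y[o] = sum_{r,i} U[o, r] * V[r, i] * x[i]
U = rng.standard_normal((d_out, rank))
V = rng.standard_normal((rank, d_in))
y_low_rank = np.einsum("or,ri,i->o", U, V, x)

# Kronecker layer: W = kron(A, B). Reshaping x to a 2 x 3 grid lets the
# product be written as y[a, b] = sum_{c,d} A[a, c] * B[b, d] * X[c, d].
A = rng.standard_normal((2, 2))
B = rng.standard_normal((3, 3))
X = x.reshape(2, 3)
y_kron = np.einsum("ac,bd,cd->ab", A, B, X).reshape(-1)

# Each einsum agrees with the explicit matrix it stands for.
assert np.allclose(y_low_rank, (U @ V) @ x)
assert np.allclose(y_kron, np.kron(A, B) @ x)
```

In this sketch the structured layers use fewer parameters than the dense one (24 for low-rank and 13 for Kronecker, versus 36), which is the kind of trade-off between parameters, compute, and expressiveness that the abstract describes searching over.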

This talk is part of the Machine Learning is Linear Algebra series.