Machine Learning is Linear Algebra
I will talk about how modelling assumptions manifest themselves as algebraic structure in a variety of settings, including optimization, attention, and network parameters, and how we can algorithmically exploit that structure for better scaling laws with transformers. As part of this effort, I will present a unifying framework that enables searching among all linear operators expressible via an Einstein summation. This framework encompasses previously proposed structures, such as low-rank, Kronecker, Tensor-Train, and Monarch, along with many novel structures. We develop a taxonomy of all such operators based on their computational and algebraic properties, which provides insights into their compute-optimal scaling laws. Combining these insights with empirical evaluation, we identify a subset of structures that achieve better performance than dense layers as a function of training compute, which we then develop into a high-performance sparse mixture-of-experts layer.

This talk is part of the Cambridge Ellis Unit series.
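To make the idea of "linear operators expressible via an Einstein summation" concrete, here is a minimal NumPy sketch. The shapes, variable names, and the specific structures shown (dense, low-rank, Kronecker) are illustrative assumptions, not the talk's actual framework or code:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r = 64, 64, 8  # r << d_in gives a low-rank structure
x = rng.standard_normal(d_in)

# Dense layer y = W x, written as an einsum over one weight tensor.
W = rng.standard_normal((d_out, d_in))
y_dense = np.einsum("oi,i->o", W, x)

# Low-rank layer W = U V: the same map, but as an einsum over two
# smaller factors, costing O(r(d_in + d_out)) instead of O(d_in * d_out).
U = rng.standard_normal((d_out, r))
V = rng.standard_normal((r, d_in))
y_lowrank = np.einsum("or,ri,i->o", U, V, x)

# Kronecker-structured layer W = A kron B: reshape x to a matrix and
# contract each axis with its own small factor.
a, b = 8, 8  # assumes d_in = d_out = a * b
A = rng.standard_normal((a, a))
B = rng.standard_normal((b, b))
y_kron = np.einsum("pa,qb,ab->pq", A, B, x.reshape(a, b)).reshape(-1)

# Sanity checks: the structured einsums match the explicit dense matrices.
assert np.allclose(y_lowrank, (U @ V) @ x)
assert np.allclose(y_kron, np.kron(A, B) @ x)
```

Each structure is just a different einsum string over a different set of factor tensors, which is what makes a search over such strings a search over a space of structured linear layers with differing compute costs.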