Compositional mathematics and automatic gradient descent
If you have a question about this talk, please contact Dr R.E. Turner.
At its heart, deep learning involves composing operators and iteratively perturbing their weights. But we lack the mathematical tools needed to understand how compound operators behave under perturbation. As a consequence, our state-of-the-art training algorithms can be brittle and require manual tuning to work well. In this talk, we propose a new suite of mathematical tools for dealing with compound operators. This includes a new chain rule for understanding how “smoothness” or “linearisation error” behaves under composition, and also perturbation bounds for compound operators. We assemble these tools and apply the majorise-minimise principle to derive “automatic gradient descent”. AGD is a hyperparameter-free training algorithm for deep neural networks that has been validated at ImageNet scale and has the potential to further automate machine learning workflows.
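To make the majorise-minimise idea concrete, here is a minimal sketch (not the actual AGD algorithm, whose bounds are specific to compound operators): assuming the objective is L-smooth with a known constant L, minimising the standard quadratic majorant f(w) + ∇f(w)·(w'−w) + (L/2)‖w'−w‖² yields gradient descent with step size 1/L, i.e. a step size derived from a bound rather than tuned by hand. The function and constants below are illustrative, not taken from the talk.

```python
# Toy illustration of the majorise-minimise principle for an L-smooth objective.
# This is NOT automatic gradient descent; it only shows how a perturbation bound
# turns into a step size without manual tuning.

import numpy as np

def majorise_minimise_gd(grad_f, w0, L, num_steps=100):
    """Gradient descent whose step size 1/L is the exact minimiser of the
    quadratic majorant of an L-smooth objective at each iterate."""
    w = np.asarray(w0, dtype=float)
    for _ in range(num_steps):
        w = w - (1.0 / L) * grad_f(w)  # minimise the majorant in closed form
    return w

# Usage: f(w) = 0.5 * w^T A w is L-smooth with L = largest eigenvalue of A.
A = np.diag([1.0, 4.0, 9.0])
grad_f = lambda w: A @ w
L = np.linalg.eigvalsh(A).max()
w_star = majorise_minimise_gd(grad_f, w0=np.ones(3), L=L)
print(w_star)  # converges towards the minimiser at the origin
```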
This talk is part of the Machine Learning @ CUED series.