Gradient-based Hyperparameter Optimisation
If you have a question about this talk, please contact Elre Oldewage.
We know the ultimate performance of our machine learning systems depends crucially on our training hyperparameters, motivating work to automate their selection. But traditional methods require many repeated runs and scale poorly to large numbers of hyperparameters (including variable schedules and per-parameter optimiser settings). New gradient-based methods aim to address these issues by updating hyperparameters during training itself, providing a more direct update signal than the black-box models used previously. In this talk, we explore the evolution of these methods and some recent developments, culminating in algorithms which can feasibly optimise millions of hyperparameters in parallel with network weights.
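As a rough illustration of the basic idea (this sketch is not taken from the talk or the paper below, and all names and the toy data are hypothetical), the following JAX snippet unrolls a single SGD step on a training loss and differentiates a validation loss with respect to the learning rate, so the learning rate is updated by gradient descent in parallel with the network weights:

# A minimal sketch of gradient-based hyperparameter optimisation:
# unroll one inner SGD step on the training loss, then differentiate
# the validation loss with respect to the learning rate.
import jax
import jax.numpy as jnp

def train_loss(w, x, y):
    # Simple least-squares training objective on a linear model.
    return jnp.mean((x @ w - y) ** 2)

def val_loss(w, x_val, y_val):
    # Validation objective used to update the hyperparameter.
    return jnp.mean((x_val @ w - y_val) ** 2)

def inner_step(w, lr, x, y):
    # One SGD step on the training loss; differentiable in lr.
    g = jax.grad(train_loss)(w, x, y)
    return w - lr * g

def hyper_objective(lr, w, x, y, x_val, y_val):
    # Validation loss after one unrolled training step.
    w_new = inner_step(w, lr, x, y)
    return val_loss(w_new, x_val, y_val)

# Hypothetical toy data: 64 training and 32 validation points in 5 dims.
key = jax.random.PRNGKey(0)
k1, k2, k3, k4 = jax.random.split(key, 4)
x, x_val = jax.random.normal(k1, (64, 5)), jax.random.normal(k2, (32, 5))
w_true = jax.random.normal(k3, (5,))
y, y_val = x @ w_true, x_val @ w_true
w = jax.random.normal(k4, (5,))

lr, hyper_lr = 0.1, 0.01
for step in range(100):
    # Hypergradient: d(validation loss)/d(lr) through the unrolled inner step.
    hyper_grad = jax.grad(hyper_objective)(lr, w, x, y, x_val, y_val)
    lr = lr - hyper_lr * hyper_grad   # update the hyperparameter
    w = inner_step(w, lr, x, y)       # update the weights in parallel

Unrolling more than one inner step, or replacing the unroll with implicit differentiation as in the paper below, scales the same idea to many more hyperparameters.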
Optional Reading:
Our exposition will not assume any pre-reading. However, the following recent paper closely matches our notation and draws on much of the relevant literature, so it would provide useful familiarisation for anyone who wishes to prepare:
Jonathan Lorraine, Paul Vicol, David Duvenaud,
Optimizing Millions of Hyperparameters by Implicit Differentiation,
AISTATS 2020
http://proceedings.mlr.press/v108/lorraine20a.html (avoid the out-of-date arXiv version)
This talk is part of the Machine Learning Reading Group @ CUED series.