Implicit Regularization in Deep Learning


If you have a question about this talk, please contact Elre Oldewage.

Empirically, overparameterized neural networks trained by stochastic gradient descent (SGD) generalize well, even in the absence of any explicit regularization. Because of overparameterization, the training loss has minima that generalize poorly, yet such bad minima are not typically found in practice. A growing body of recent work suggests that the optimizer (SGD or similar) implicitly regularizes training and steers it towards minima that generalize well. In this presentation, we review three (non-exclusive) theories that aim to quantify this effect:
1) Minibatch noise in SGD steers training away from sharp minima, which tend to generalize poorly.
2) Gradient descent finds minimum-norm solutions.
3) SGD approximately follows gradient flow on a regularized loss.
These theories may improve our understanding of optimization and generalization in overparameterized models.
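The second theory is easiest to see in its simplest setting: linear least squares with more parameters than data points. The sketch below is an illustration chosen here, not a method from the talk; the function name `gd_least_squares` and all sizes, step sizes, and seeds are hypothetical. Started from zero, full-batch gradient descent only ever moves within the row space of `X`, so among the infinitely many interpolating solutions it converges to the one with minimum L2 norm (the pseudoinverse solution). For deep, nonlinear models the implicit bias is more subtle, which is part of what the readings examine.

```python
import numpy as np

def gd_least_squares(X, y, lr=0.02, steps=10000):
    """Full-batch gradient descent on L(w) = ||Xw - y||^2 / (2n), started at w = 0."""
    n, d = X.shape
    w = np.zeros(d)  # zero init: every update X.T @ (...) stays in the row space of X
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / n
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
n, d = 10, 50  # fewer data points than parameters: infinitely many interpolants
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

w_gd = gd_least_squares(X, y)
w_min_norm = np.linalg.pinv(X) @ y  # minimum-L2-norm interpolating solution

# Another interpolant: add a component from the null space of X.
v = rng.standard_normal(d)
v -= np.linalg.pinv(X) @ (X @ v)  # remove the row-space part, so X @ v is ~0
w_other = w_min_norm + v

print(np.max(np.abs(X @ w_gd - y)))       # training residual, close to 0
print(np.linalg.norm(w_gd - w_min_norm))  # distance to the min-norm solution, close to 0
print(np.linalg.norm(w_gd), np.linalg.norm(w_other))  # the GD solution has the smaller norm
```

Both `w_gd` and `w_other` fit the training data exactly, but only the minimum-norm one is reached by gradient descent from zero; a nonzero initialization would add its own null-space component to the final solution.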

Readings:

https://arxiv.org/abs/1611.03530

https://arxiv.org/abs/1710.06451

https://arxiv.org/abs/2002.09277

https://arxiv.org/abs/1905.13655

https://arxiv.org/abs/2101.12176
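For the third theory, the last reading above uses backward error analysis to argue that, for a small learning rate, SGD with random shuffling follows on average the gradient flow of a modified loss. The form below is recalled from that paper in notation chosen here, so check the paper for its exact assumptions: $\epsilon$ is the learning rate, $m$ the number of minibatches per epoch, $C(\omega)$ the full-batch loss, and $\hat{C}_k(\omega)$ the minibatch losses, which average to $C(\omega)$. The second line follows because $\frac{1}{m}\sum_k \nabla\hat{C}_k = \nabla C$, so the cross term in the expanded square vanishes.

```latex
\begin{aligned}
\tilde{C}_{\mathrm{SGD}}(\omega)
  &= C(\omega) + \frac{\epsilon}{4m}\sum_{k=0}^{m-1}\bigl\|\nabla\hat{C}_k(\omega)\bigr\|^2 \\
  &= \underbrace{C(\omega) + \frac{\epsilon}{4}\bigl\|\nabla C(\omega)\bigr\|^2}_{\text{modified loss of full-batch GD}}
   + \frac{\epsilon}{4m}\sum_{k=0}^{m-1}\bigl\|\nabla\hat{C}_k(\omega) - \nabla C(\omega)\bigr\|^2
\end{aligned}
```

The first two terms penalize large full-batch gradients; the last term penalizes the spread of minibatch gradients around the full-batch gradient, which is one way to make precise how minibatch noise acts as an implicit regularizer.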

Zoom link: https://eng-cam.zoom.us/j/82019956685?pwd=WUNSVVcrdC9IZGxQOHFhSThjUjd2dz09

This talk is part of the Machine Learning Reading Group @ CUED series.


© 2006-2022 Talks.cam, University of Cambridge.