If you have a question about this talk, please contact Elre Oldewage.
Empirically, it has been observed that overparameterized neural networks trained by stochastic gradient descent (SGD) generalize well, even in the absence of any explicit regularization. Because of overparameterization, there exist minima of the training loss that generalize poorly, but such bad minima are never encountered in practice. In recent years, a growing body of work suggests that the optimizer (SGD or a variant) implicitly regularizes the training process and steers it towards good minima that generalize well. In this presentation, we review three (non-exclusive) theories that aim to quantify this effect: 1) Minibatch noise in SGD avoids sharp minima that generalize poorly, 2) Gradient descent finds solutions with minimum norm, 3) SGD is equivalent to a regularized gradient flow. These theories may improve our understanding of optimization and generalization in overparameterized models.
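As a concrete illustration of the second theory, the following minimal sketch (NumPy, with a hypothetical random-data setup not taken from the talk) runs plain gradient descent from a zero initialization on an overparameterized linear regression problem and checks that the iterate approaches the minimum-l2-norm interpolating solution given by the pseudoinverse.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 100                      # fewer samples than parameters: overparameterized
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# Full-batch gradient descent on the squared loss, started at zero.
# Starting at zero keeps the iterates in the row space of X, which is
# why the limit is the minimum-norm interpolating solution.
w = np.zeros(d)
lr = 0.01
for _ in range(50_000):
    grad = X.T @ (X @ w - y) / n
    w -= lr * grad

w_min_norm = np.linalg.pinv(X) @ y  # minimum-l2-norm solution of X w = y

print("training residual:        ", np.linalg.norm(X @ w - y))
print("distance to min-norm sol.:", np.linalg.norm(w - w_min_norm))
```

For the third theory, one commonly cited form of the result (an assumption here, not a statement from the talk itself) is that gradient descent with step size eta approximately follows the gradient flow of a modified loss equal to the original loss plus (eta/4) times the squared norm of its gradient, i.e. an implicit penalty on sharp directions.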