Natural gradient in deep neural networks
If you have a question about this talk, please contact Robert Pinsler.
We introduce the natural gradient method for stochastic optimization and discuss whether and how it can be applied to deep neural networks. We motivate the natural gradient by showing that the performance of stochastic gradient descent depends heavily on the choice of parameterization and ignores the information geometry of the model. We show that this geometry is described by the Fisher information metric, and that steepest descent on the loss with respect to this metric yields the natural gradient, which is invariant to reparameterization of the model. We connect the natural gradient to second-order optimization methods and discuss possible applications to deep neural networks. In particular, we present K-FAC, a method that approximates the Fisher information matrix as block-diagonal across layers, with each block Kronecker-factorized, so that its inverse can be computed efficiently. This perspective connects a variety of different methods under a unified framework (e.g. adaptive gradients, batch normalization, whitening). We describe applications of K-FAC to both standard and convolutional neural networks, and compare it with state-of-the-art methods.
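As a rough illustration of the update discussed above (not material from the talk; all shapes, names, and damping values below are illustrative assumptions), the following NumPy sketch contrasts a plain SGD step with a natural gradient step using an explicit damped Fisher matrix, and then shows the K-FAC idea of preconditioning one layer's gradient with the two small Kronecker factors instead of the full block.

import numpy as np

rng = np.random.default_rng(0)

# --- Natural gradient with an explicit (damped) Fisher matrix ---
# theta: flattened parameters, grad: loss gradient,
# fisher: Fisher information matrix estimated from model samples.
d = 10
theta = rng.normal(size=d)
grad = rng.normal(size=d)
fisher = rng.normal(size=(d, d))
fisher = fisher @ fisher.T / d + 1e-3 * np.eye(d)  # symmetric PSD plus damping

lr = 0.1
theta_sgd = theta - lr * grad                          # plain SGD step
theta_ng = theta - lr * np.linalg.solve(fisher, grad)  # natural gradient step

# --- K-FAC-style update for one fully connected layer ---
# W has shape (out, in). K-FAC approximates this layer's Fisher block as
# A kron G, where A = E[a a^T] (layer inputs) and G = E[g g^T]
# (backpropagated pre-activation gradients). The Kronecker structure gives
# F^{-1} vec(dW) = vec(G^{-1} dW A^{-1}), so the large block is never formed.
n_out, n_in = 4, 6
dW = rng.normal(size=(n_out, n_in))   # gradient of the loss w.r.t. W
a = rng.normal(size=(128, n_in))      # batch of layer inputs
g = rng.normal(size=(128, n_out))     # batch of pre-activation gradients
A = a.T @ a / len(a) + 1e-3 * np.eye(n_in)
G = g.T @ g / len(g) + 1e-3 * np.eye(n_out)
dW_nat = np.linalg.solve(G, dW) @ np.linalg.inv(A)  # K-FAC preconditioned gradient

The point of the sketch is the cost difference: the explicit Fisher solve scales with the full parameter count, while the K-FAC step only ever inverts matrices of the layer's input and output dimensions.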
This talk is part of the Machine Learning Reading Group @ CUED series.