
Natural gradient in deep neural networks


If you have a question about this talk, please contact Robert Pinsler.

We introduce the natural gradient method for stochastic optimization and discuss whether and how it can be applied to deep neural networks. We motivate the natural gradient by showing that the performance of stochastic gradient descent depends heavily on the choice of parameterization, because it does not take into account the information geometry of the model. We show that this geometry is described by the Fisher information metric, and that the direction of steepest descent of the loss under this metric is given by the natural gradient, which is invariant to reparameterization. We connect the natural gradient with second-order optimization methods and discuss possible applications to deep neural networks. In particular, we present K-FAC, a method based on approximating the inverse Fisher information matrix using Kronecker-factored blocks, with layers treated as independent. This allows a variety of methods (e.g. adaptive gradients, batch normalization, whitening) to be connected under a unified framework. We describe applications of K-FAC to both standard and convolutional neural networks, and compare with state-of-the-art methods.
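
The invariance claim in the abstract can be checked numerically on a toy problem. The following is a minimal sketch, not material from the talk: it assumes a small logistic regression model, full-batch gradients rather than stochastic ones, and a hypothetical linear reparameterization theta = M @ phi chosen purely for illustration. It compares the update direction computed in the original parameters with the one computed in the new parameters and mapped back.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_and_fisher(theta, X, y):
    """Gradient of the mean negative log-likelihood and the Fisher
    information matrix for logistic regression with weights theta."""
    p = sigmoid(X @ theta)
    grad = X.T @ (p - y) / len(y)
    # For a Bernoulli output the Fisher matrix, averaged over the inputs,
    # is X^T diag(p * (1 - p)) X / n.
    fisher = (X * (p * (1.0 - p))[:, None]).T @ X / len(y)
    return grad, fisher

# Synthetic data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = (rng.uniform(size=200) < sigmoid(X @ true_w)).astype(float)

theta = np.zeros(3)

# Hypothetical invertible reparameterization: theta = M @ phi.
M = np.array([[2.0, 0.0, 0.0],
              [1.0, 0.5, 0.0],
              [0.0, 3.0, 1.0]])

g_theta, F_theta = grad_and_fisher(theta, X, y)

# Chain rule: gradient and Fisher matrix expressed in the phi coordinates.
g_phi = M.T @ g_theta
F_phi = M.T @ F_theta @ M

# Plain gradient directions, both expressed in theta space.
sgd_theta = g_theta
sgd_phi_mapped = M @ g_phi            # equals M @ M.T @ g_theta

# Natural gradient directions, both expressed in theta space.
nat_theta = np.linalg.solve(F_theta, g_theta)
nat_phi_mapped = M @ np.linalg.solve(F_phi, g_phi)

print("plain gradient agrees:  ", np.allclose(sgd_theta, sgd_phi_mapped))
print("natural gradient agrees:", np.allclose(nat_theta, nat_phi_mapped))
```

For a linear change of variables the agreement of the natural gradient directions is exact; for a nonlinear reparameterization it holds only to first order in the step size. Forming and solving with the full Fisher matrix, as done here, is feasible only for tiny models, which is the cost that Kronecker-factored approximations such as K-FAC are meant to avoid.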

This talk is part of the Machine Learning Reading Group @ CUED series.


