Randomized Automatic Differentiation

The successes of deep learning, variational inference, and many other fields have been aided by specialized implementations of reverse-mode automatic differentiation (AD) to compute gradients of mega-dimensional objectives. The AD techniques underlying these tools were designed to compute exact gradients to numerical precision, but modern machine learning models are almost always trained with stochastic gradient descent. Why spend computation and memory on exact (minibatch) gradients only to use them for stochastic optimization? In this talk, I give a quick overview of basic concepts in modern AD and discuss our work on Randomized Automatic Differentiation (RAD), a framework that computes unbiased gradient estimates with reduced memory in return for increased variance. In this work, we introduce a general approach for RAD, examine limitations of the general case, and develop specialized RAD strategies that exploit problem structure in case studies. We develop RAD techniques for a variety of simple neural network architectures, and show that, for a fixed memory budget, RAD converges in fewer iterations than using a small batch size for feedforward networks, and in a similar number of iterations for recurrent networks. We also show that RAD can be applied to scientific computing, and use it to develop a low-memory stochastic gradient method for optimizing the control parameters of a linear reaction-diffusion PDE representing a fission reactor.
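To make the memory-for-variance trade concrete, here is a minimal NumPy sketch of one way an unbiased, lower-memory gradient estimate could be formed for a single linear layer. It is an illustration only and not the speaker's implementation: the abstract does not specify the sampling scheme, so the random-masking approach, the function names (sparsify, randomized_weight_grad) and the keep_prob parameter are all assumptions made for this example. For a layer y = xW, the exact weight gradient is x^T g, where g is the gradient arriving from the layer above. Since it is linear in x, replacing x with a randomly sparsified copy whose expectation is x gives an unbiased estimate.

```python
import numpy as np


def exact_weight_grad(x, g):
    """Exact gradient of the loss w.r.t. W for y = x @ W, given g = dL/dy."""
    return x.T @ g


def sparsify(x, keep_prob, rng):
    """Keep each entry with probability keep_prob and rescale by 1/keep_prob.

    The expectation of the result equals x. A dense array is returned here for
    simplicity; a memory-saving implementation would store only the kept
    entries (an assumption of this sketch, not shown).
    """
    mask = rng.random(x.shape) < keep_prob
    return np.where(mask, x / keep_prob, 0.0)


def randomized_weight_grad(x, g, keep_prob, rng):
    """Unbiased estimate of x.T @ g computed from a sparsified copy of x."""
    x_hat = sparsify(x, keep_prob, rng)
    return x_hat.T @ g


# Hypothetical toy data: 8 examples, 5 inputs, 3 outputs.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 5))
g = rng.normal(size=(8, 3))

exact = exact_weight_grad(x, g)

# Averaging many independent estimates should approach the exact gradient.
samples = np.stack(
    [randomized_weight_grad(x, g, 0.25, rng) for _ in range(40000)]
)
mean_estimate = samples.mean(axis=0)
max_abs_error = np.abs(mean_estimate - exact).max()

# Per-entry spread of a single estimate: the price paid for storing less.
per_entry_std = samples.std(axis=0)
```

With keep_prob = 0.25, roughly a quarter of the activations would need to be stored for the backward pass, while per_entry_std shows the extra noise each single estimate carries. Setting keep_prob = 1.0 recovers the exact gradient.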

This talk is part of the ML@CL Seminar Series.

© 2006-2021, University of Cambridge.