Distributed stochastic optimization for deep learning
If you have a question about this talk, please contact Louise Segar.

Via Skype

We study the problem of how to distribute the training of large-scale deep learning models in a parallel computing environment. We propose a new distributed stochastic optimization method called Elastic Averaging SGD (EASGD). We analyze the convergence rate of the EASGD method in the synchronous scenario and compare its stability condition with that of the existing ADMM method in the round-robin scheme. An asynchronous and momentum variant of the EASGD method is applied to train deep convolutional neural networks for image classification on the CIFAR and ImageNet datasets. Our approach accelerates training and also achieves better test accuracy, while requiring far less communication than common baseline approaches such as the DOWNPOUR method.

We then investigate the limits on the speedup in the initial and asymptotic phases of mini-batch SGD, momentum SGD, and EASGD. We find that the spread of the input data distribution has a large impact on their initial convergence rate and stability region. We also find a surprising connection between momentum SGD and the EASGD method with a negative moving-average rate. A non-convex case is also studied to understand when EASGD can get trapped near a saddle point. Finally, we scale up the EASGD method by using a tree-structured network topology, and we show empirically its advantages and challenges. We also relate the EASGD and DOWNPOUR methods to the classical Jacobi and Gauss-Seidel methods, respectively, thus unifying a class of distributed stochastic optimization methods. (See https://arxiv.org/abs/1605.02216)

This talk is part of the Machine Learning @ CUED series.
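The abstract does not spell out the update rule, but the core idea is that each local worker takes SGD steps while being elastically pulled toward a shared center variable, which in turn moves toward the average of the workers. Below is a minimal sketch of a synchronous EASGD-style loop on a toy least-squares problem, assuming illustrative names and settings (toy_grad, workers, center, eta, rho, the toy data) that are not from the talk; it is an illustration of the general scheme, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 10))          # toy design matrix (assumption)
b = rng.normal(size=100)                # toy targets (assumption)

def toy_grad(x, idx):
    """Mini-batch gradient of 0.5 * ||A x - b||^2 on rows idx."""
    return A[idx].T @ (A[idx] @ x - b[idx])

p = 4                  # number of local workers
eta = 0.01             # learning rate
rho = 0.1              # elastic penalty linking workers to the center
alpha = eta * rho      # elastic averaging rate

workers = [np.zeros(10) for _ in range(p)]   # local variables x_i
center = np.zeros(10)                        # center variable x_tilde

for t in range(1000):
    new_workers = []
    for x in workers:
        idx = rng.choice(100, size=16, replace=False)  # each worker draws its own mini-batch
        g = toy_grad(x, idx)
        # local step: gradient plus elastic pull toward the center variable
        new_workers.append(x - eta * g - alpha * (x - center))
    # center step: moves toward the average of the local variables
    center = center + alpha * sum(x - center for x in workers)
    workers = new_workers

print("final loss:", 0.5 * np.linalg.norm(A @ center - b) ** 2)
```

In the asynchronous and momentum variants discussed in the talk, the workers communicate with the center at their own pace and add a momentum term to the local step; the synchronous loop above only illustrates the elastic coupling itself.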