
How to climb a billion dimensional hill using a coin and a compass and count the steps before departure


If you have a question about this talk, please contact Carola-Bibiane Schoenlieb.

Subtitle: Parallel block coordinate descent methods for huge-scale partially separable problems

With the growing digitization of the world it is increasingly easy to collect monstrous amounts of data. This data is often analysed with optimization algorithms, leading to difficult huge-scale problems with millions or billions of variables.

Existing optimization algorithms that are perfectly suited to problems of medium size, such as polynomial-time interior-point methods, are often not useful in this new setting because of the poor dependence of their complexity on the problem dimension. Hence, there is a pressing need to devise and analyze new methods, or adapt classical ones, capable of working in the emerging huge-scale setting. An entirely different approach to the scale problem is the acceleration of existing methods on parallel computing architectures such as many-core computers and clusters thereof, or systems based on graphics processing units (GPUs).

In this talk we describe a new method that combines the two approaches outlined above. Our method (i) has a good dependence on the problem dimension and (ii) is parallel in nature, and hence is well suited for solving certain structured optimization problems of huge dimension. In particular, we show that randomized block coordinate descent methods, such as those developed in [1, 2], can be accelerated by parallelization when applied to the problem of minimizing the sum of a partially block separable smooth convex function and a simple block separable convex function.
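The problem class described above can be written as follows (a sketch in our own notation, not necessarily the speaker's):

$$
\min_{x \in \mathbb{R}^N} \; F(x) := f(x) + \Psi(x),
$$

where $x$ is partitioned into $n$ blocks $x^{(1)}, \dots, x^{(n)}$; the smooth convex part is partially block separable, $f(x) = \sum_{J \in \mathcal{J}} f_J(x)$, with each $f_J$ depending on at most $\omega$ blocks ($\omega$ is the degree of partial separability); and $\Psi$ is block separable, $\Psi(x) = \sum_{i=1}^{n} \Psi_i(x^{(i)})$, e.g. $\Psi(x) = \lambda \|x\|_1$ in the LASSO case.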

Many problems of current interest in diverse communities (statistics, optimal experimental design, machine learning, mechanical engineering) can be cast in this form, including least-squares, L1-regularized least-squares (LASSO), group and sparse group LASSO, computing c-optimal and A-optimal designs of statistical experiments, training (sparse) linear support vector machines (SVM), and truss topology design (TTD).

We describe a generic parallel randomized block coordinate descent algorithm (PR-BCD) and several variants thereof based on the way parallelization is performed. In all cases we prove iteration complexity results, i.e., we give bounds on the number of iterations sufficient to approximately solve the problem with high probability. Our results generalize the intuitive observation that in the separable case the theoretical speedup from parallelization should equal the number of processors. We show that the speedup increases with the number of processors and with the degree of partial separability of the smooth component of the objective function. Our analysis also covers the regime where the number of blocks updated at each iteration is random, which allows us to model situations with a variable number of (busy or unreliable) processors.
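As an illustration of the flavour of such a method, the following is a minimal sketch of parallel randomized coordinate descent applied to LASSO, in the spirit of (but not identical to) the PR-BCD algorithm of the talk. The function name, parameter choices, and the simulation of parallel updates by a serial loop are our own; the stepsize scaling `beta = 1 + (tau-1)(omega-1)/(n-1)`, which grows with the degree of partial separability `omega`, reflects the kind of correction the parallel theory requires.

```python
import numpy as np

def parallel_rcd_lasso(A, b, lam, tau=4, iters=1000, seed=0):
    """Sketch: min_x 0.5*||Ax - b||^2 + lam*||x||_1 by updating
    tau randomly chosen coordinates per iteration (simulated serially;
    on real hardware the tau updates would run on tau processors)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x = np.zeros(n)
    r = -b.astype(float).copy()                 # residual r = Ax - b (x = 0)
    L = np.maximum((A ** 2).sum(axis=0), 1e-12) # coordinate Lipschitz constants
    omega = int(np.max((A != 0).sum(axis=1)))   # degree of partial separability
    beta = 1.0 + (tau - 1) * (omega - 1) / max(n - 1, 1)
    for _ in range(iters):
        S = rng.choice(n, size=tau, replace=False)  # random subset of coords
        grad = A[:, S].T @ r                        # partial gradients
        # proximal (soft-thresholding) step with beta-damped stepsize
        t = x[S] - grad / (beta * L[S])
        x_new = np.sign(t) * np.maximum(np.abs(t) - lam / (beta * L[S]), 0.0)
        r += A[:, S] @ (x_new - x[S])               # keep residual up to date
        x[S] = x_new
    return x
```

The residual update costs only `O(m * tau)` per iteration, which is what makes coordinate methods attractive when `n` is enormous.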

We conclude with some encouraging computational results on huge-scale LASSO, SVM and TTD instances.

All results are based on joint work with Martin Takáč (Edinburgh).

[1] Yurii Nesterov. Efficiency of coordinate descent methods on huge-scale optimization problems. CORE Discussion Paper #2010/2, February 2010.

[2] Peter Richtárik and Martin Takáč. Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function. After 1st revision in Mathematical Programming A, 2011.

This talk is part of the Applied and Computational Analysis series.


