
Convex and non-convex worlds in machine learning


If you have a question about this talk, please contact Dr Jes Frellsen.

Abstract:

The talk will focus on modern challenges in machine learning: designing good, efficient problem-specific solvers, designing good problem-specific objectives, and building an understanding of non-convex deep learning optimization. In machine learning there is a plethora of approaches in which convexity is desired when solving a given problem, since a convex problem has a unique global minimum, admits theoretical guarantees, and can typically be solved efficiently. The first part of the talk discusses an example of a recently developed convex approach that comes with strong theoretical guarantees, where learning is done via reduction to a convex problem. First, we show the construction of a new solver for partition function-based optimization, which reduces the problem to quadratic optimization (a toy sketch of this kind of quadratic bound appears below). Various applications of this variational bound will be discussed. The experimental results will show the advantages of the proposed method over state-of-the-art optimization techniques and, furthermore, will run counter to the conventional wisdom that machine learning problems are best handled via generic optimization tools. The next part of the talk extends this setting by showing how to apply efficient solvers to a more general class of problems, focusing on the multi-class setting. A reduction of this problem to a set of binary classification problems organized in a tree structure will be discussed, and a new top-down criterion for purification of labels will be presented which guarantees train and test running times that are logarithmic in the label complexity (a second sketch below illustrates the tree reduction).
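The talk's quadratic reduction of partition-function optimization is not spelled out in the abstract. As an illustration of the general idea only, here is a minimal Python sketch using Bohning's classical quadratic upper bound on the log-partition (log-sum-exp) function; the bound presented in the talk is a different, tighter construction, and all names below are mine.

import numpy as np

def log_partition(theta):
    """log sum_k exp(theta_k), computed stably."""
    m = theta.max()
    return m + np.log(np.exp(theta - m).sum())

def quadratic_bound(theta, psi):
    """Quadratic majorizer of log_partition(theta), tight at theta = psi.
    Uses Bohning's curvature matrix B = 0.5 * (I - 11^T / K), which dominates
    the Hessian of log-sum-exp everywhere, so the bound holds globally."""
    K = len(theta)
    g = np.exp(psi - log_partition(psi))          # gradient = softmax(psi)
    B = 0.5 * (np.eye(K) - np.ones((K, K)) / K)   # Bohning's curvature bound
    d = theta - psi
    return log_partition(psi) + g @ d + 0.5 * d @ B @ d

# Numerical check: the bound dominates the true value and touches it at psi.
rng = np.random.default_rng(0)
psi = rng.normal(size=5)
for _ in range(1000):
    theta = rng.normal(size=5, scale=3.0)
    assert quadratic_bound(theta, psi) >= log_partition(theta) - 1e-9
print("gap at psi:", quadratic_bound(psi, psi) - log_partition(psi))  # ~0

Minimizing such a majorizer repeatedly (a majorize-minimize loop) reduces each step of partition-function optimization to a quadratic problem, which is the flavor of reduction the abstract refers to.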

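For the multi-class part, here is a toy Python sketch of the reduction of K-class classification to a balanced tree of binary classifiers, so that prediction makes only O(log K) binary decisions. The split at each node below is a naive fixed halving of the label set; the method in the talk instead learns the splits with a top-down label-purity criterion, which this sketch deliberately omits.

import numpy as np
from sklearn.linear_model import LogisticRegression

class LabelTreeNode:
    def __init__(self, labels):
        self.labels = labels          # label subset reaching this node
        self.clf = None               # binary router (internal nodes only)
        self.left = self.right = None

def build(X, y, labels):
    node = LabelTreeNode(labels)
    if len(labels) == 1:
        return node                   # leaf: a single class remains
    half = len(labels) // 2
    left_set = set(labels[:half])     # naive split; learned in practice
    mask = np.isin(y, labels)
    Xn, yn = X[mask], y[mask]
    routed = np.array([0 if c in left_set else 1 for c in yn])
    node.clf = LogisticRegression(max_iter=1000).fit(Xn, routed)
    node.left = build(X, y, labels[:half])
    node.right = build(X, y, labels[half:])
    return node

def predict(node, x):
    while node.clf is not None:       # descend ~log2(K) binary routers
        go_right = node.clf.predict(x.reshape(1, -1))[0]
        node = node.right if go_right else node.left
    return node.labels[0]

# Synthetic demo: 8 Gaussian classes routed by a tree of depth 3.
rng = np.random.default_rng(0)
K, n = 8, 200
centers = rng.normal(scale=5.0, size=(K, 2))
X = np.vstack([centers[k] + rng.normal(size=(n, 2)) for k in range(K)])
y = np.repeat(np.arange(K), n)
tree = build(X, y, list(range(K)))
acc = np.mean([predict(tree, xi) == yi for xi, yi in zip(X, y)])
print(f"train accuracy: {acc:.2f}")

On this synthetic data the tree has depth log2(8) = 3, so classifying a point costs three binary predictions regardless of the number of training examples per class; the purity criterion from the talk is what makes such splits work well on real labels.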
The approaches discussed so far either live in the world of convex optimization or come with theoretical guarantees (or both). Despite the success of convex methods, deep learning methods, whose objectives are inherently highly non-convex, have enjoyed a resurgence of interest in the last few years and achieve state-of-the-art performance. The last part of the talk moves to the world of non-convex optimization, where recent findings suggest that we might eventually be able to describe these approaches theoretically. A connection will be established between the highly non-convex loss function of a simple model of the fully-connected feed-forward neural network and the Hamiltonian of the spherical spin-glass model. It will be shown that (i) for large-size networks, most local minima are equivalent and yield similar performance on a test set; (ii) the probability of finding a “bad” (high-value) local minimum is non-zero for small-size networks and decreases quickly with network size; and (iii) struggling to find the global minimum on the training set (as opposed to one of the many good local minima) is not useful in practice and may lead to overfitting.
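For reference, a standard form of the spherical spin-glass Hamiltonian that this line of work connects to (following the loss-surface analysis literature; the normalization is the conventional one and the notation is mine):

\[
  \mathcal{H}_{N,H}(w) \;=\; \frac{1}{N^{(H-1)/2}}
  \sum_{i_1,\dots,i_H=1}^{N} X_{i_1,\dots,i_H}\, w_{i_1} w_{i_2} \cdots w_{i_H},
  \qquad \text{subject to } \frac{1}{N}\sum_{i=1}^{N} w_i^2 = 1,
\]

where $H$ plays the role of network depth, the couplings $X_{i_1,\dots,i_H}$ are i.i.d. standard Gaussians, and the weights $w$ are constrained to the sphere. In this model, local minima concentrate in a narrow energy band just above the global minimum as $N$ grows, which is the structure behind claims (i) and (ii) above.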

Bio:

Anna Choromanska is a Post-Doctoral Associate in the Computer Science Department at the Courant Institute of Mathematical Sciences, New York University. She works in the Computational and Biological Learning Lab, part of the Computational Intelligence, Learning, Vision, and Robotics Lab of Prof. Yann LeCun. She graduated with her PhD from the Department of Electrical Engineering at Columbia University, where she held The Fu Foundation School of Engineering and Applied Science Presidential Fellowship and was advised by Prof. Tony Jebara. She completed her MSc with distinction in the Department of Electronics and Information Technology, Warsaw University of Technology, with a double specialization in Electronics and Computer Engineering and Electronics and Informatics in Medicine. She has worked with various industrial institutions, including AT&T Shannon Research Laboratories, IBM T. J. Watson Research Center, and Microsoft Research New York. Her research interests are in machine learning, optimization, and statistics, with applications in biomedicine and neurobiology. She also holds a music degree from the Mieczyslaw Karlowicz Music School in Warsaw, Department of Piano Performance. She is an avid salsa dancer performing with the Ache Performance Group; her other hobbies are painting and photography.

This talk is part of the Machine Learning @ CUED series.
