The loss surface of deep neural networks has recently attracted interest in the optimization and machine learning communities as a prime example of a high-dimensional non-convex problem. Some insights were recently gained using spin glass models and mean-field approximations, but at the expense of simplifying the nonlinear nature of the model.

In this work, we do not make any such assumption and study conditions on the data distribution and model architecture that prevent the existence of bad local minima. We first take a topological approach and characterize the absence of bad local minima by studying the connectedness of the loss surface level sets. Our theoretical work quantifies and formalizes two important facts: (i) the landscape of deep linear networks has a radically different topology from that of deep half-rectified ones, and (ii) the energy landscape in the non-linear case is fundamentally controlled by the interplay between the smoothness of the data distribution and model over-parametrization. Our main theoretical contribution is to prove that half-rectified single-layer networks are asymptotically connected, and we provide explicit bounds that reveal the aforementioned interplay.

The conditioning of gradient descent is the next challenge we address. We study this question through the geometry of the level sets, and we introduce an algorithm to efficiently estimate the regularity of such sets on large-scale networks. Our empirical results show that these level sets remain connected throughout the learning phase, suggesting near-convex behavior, but that they become exponentially more curvy as the energy level decays, in accordance with the very low-curvature attractors observed in practice. Joint work with Daniel Freeman (UC Berkeley).

Location: Seminar Room 1, Newton Institute. Contact: INI IT.
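As a minimal illustration (not the algorithm presented in the talk) of why connectedness of level sets is nontrivial for half-rectified networks, the sketch below constructs two exact global minima of a single-hidden-layer ReLU network that are related by a hidden-unit permutation, then measures the loss barrier along the straight line between them in parameter space; the teacher network, its sizes, and the sampled data are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))

# Teacher network: the targets are exactly realizable, so the teacher
# weights form a global minimum with zero loss (an illustrative assumption).
W1 = rng.normal(size=(3, 4))
W2 = rng.normal(size=(4, 1))
y = np.maximum(X @ W1, 0.0) @ W2

def loss(w1, w2):
    """Mean squared error of the half-rectified (ReLU) network (w1, w2)."""
    pred = np.maximum(X @ w1, 0.0) @ w2
    return float(np.mean((pred - y) ** 2))

# Two distinct zero-loss solutions: the teacher, and the teacher with its
# hidden units permuted (permutation symmetry of the architecture).
perm = [1, 2, 3, 0]
A = (W1, W2)
B = (W1[:, perm], W2[perm, :])

# Maximum loss along the straight segment between the two solutions:
# a positive barrier means the linear path leaves the low-loss level set,
# even though both endpoints lie in it.
ts = np.linspace(0.0, 1.0, 51)
barrier = max(
    loss((1 - t) * A[0] + t * B[0], (1 - t) * A[1] + t * B[1]) for t in ts
)
print(f"endpoint losses: {loss(*A):.2e}, {loss(*B):.2e}; path barrier: {barrier:.4f}")
```

Connectedness of a level set asks whether such pairs of solutions can instead be joined by some (possibly curved) path that stays below the energy level, which is what the estimation procedure in the talk probes on large-scale networks.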