
Learning shallow neural networks in high dimensions: SGD dynamics and scaling laws


If you have a question about this talk, please contact Fernando Ruiz Mazo.

Abstract: We study the sample and time complexity of online stochastic gradient descent (SGD) for learning a two-layer neural network with M orthogonal neurons on isotropic Gaussian data. We focus on the challenging "extensive-width" regime M≫1 and allow for a large condition number in the second-layer parameters, covering the power-law scaling a_m = m^{-β} as a special case. We characterize the SGD dynamics during the training of a student two-layer neural network and identify sharp transition times for the recovery of each signal direction. In the power-law setting, our analysis shows that while the learning of individual teacher neurons exhibits abrupt phase transitions, the juxtaposition of emergent learning curves at different timescales results in a smooth scaling law in the cumulative objective.
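The setting described in the abstract can be illustrated with a minimal teacher-student simulation. The sketch below is an assumption-laden toy, not the speaker's method: it fixes a ReLU activation, a small width M and dimension d, takes the identity rows as the orthonormal teacher directions, and runs plain online SGD on fresh Gaussian samples with power-law second-layer weights a_m = m^{-β}. It then reports, for each teacher direction, the best overlap achieved by any student neuron, which is the quantity whose recovery times the talk characterizes.

```python
import numpy as np

rng = np.random.default_rng(0)

d, M, beta = 64, 8, 1.5           # input dim, width, power-law exponent (illustrative choices)
a = np.arange(1, M + 1) ** -beta  # teacher second-layer weights a_m = m^{-beta}
W_teacher = np.eye(d)[:M]         # M orthonormal teacher directions (rows)

def f(W, a, x):
    """Two-layer network with ReLU activation (the activation choice is an assumption)."""
    return a @ np.maximum(W @ x, 0.0)

W = rng.normal(size=(M, d)) / np.sqrt(d)  # student first-layer initialization
lr, steps = 0.05, 20000

for t in range(steps):
    x = rng.normal(size=d)                     # fresh isotropic Gaussian sample (online SGD)
    err = f(W, a, x) - f(W_teacher, a, x)      # one-sample residual, squared loss
    grad = err * np.outer(a * (W @ x > 0), x)  # dL/dW: row m is err * a_m * 1[w_m.x > 0] * x
    W -= lr * grad

# Overlap matrix between unit-normalized student rows and teacher directions;
# recovery[m] is the best alignment any student neuron achieves with teacher neuron m.
P = (W / np.linalg.norm(W, axis=1, keepdims=True)) @ W_teacher.T
recovery = np.abs(P).max(axis=0)
print(np.round(recovery, 2))
```

In this toy run one would expect the large-weight directions (small m) to be recovered first, mirroring the sharp per-direction transition times discussed in the abstract, while the cumulative loss averages these staggered transitions into a smoother curve.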

This talk is co-hosted by the Computer Laboratory AI Research Group and the Informed-AI Hub.

This talk is part of the Machine learning theory series.
