Learning shallow neural networks in high dimensions: SGD dynamics and scaling laws
If you have a question about this talk, please contact Fernando Ruiz Mazo.

Abstract: We study the sample and time complexity of online stochastic gradient descent (SGD) for learning a two-layer neural network with M orthogonal neurons on isotropic Gaussian data. We focus on the challenging "extensive-width" regime M ≫ 1 and allow a large condition number in the second-layer parameters, covering the power-law scaling a_m = m^{-β} as a special case. We characterize the SGD dynamics of the student two-layer network and identify sharp transition times for the recovery of each signal direction. In the power-law setting, our analysis shows that while the learning of individual teacher neurons exhibits abrupt phase transitions, the juxtaposition of emergent learning curves at different timescales produces a smooth scaling law in the cumulative objective.

This talk is co-hosted by the Computer Laboratory AI Research Group and the Informed-AI Hub. It is part of the Machine learning theory series.
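The setting described in the abstract can be illustrated with a small simulation. Below is a minimal, hypothetical sketch of one-pass (online) SGD for a student two-layer network trained on fresh isotropic Gaussian samples against a teacher with orthonormal first-layer directions and power-law second-layer weights a_m = m^{-β}. The activation, step size, dimensions, and the choice to fix the student's second layer are illustrative assumptions, not the talk's exact setting.

```python
import numpy as np

# Illustrative online-SGD sketch; hyperparameters and activation are
# assumptions, not taken from the talk.
rng = np.random.default_rng(0)

d, M = 256, 16    # input dimension and number of neurons (d >> M)
beta = 1.5        # power-law exponent: a_m = m^{-beta}
eta = 0.05        # learning rate
steps = 20_000

sigma = np.tanh                            # placeholder activation (assumption)
dsigma = lambda z: 1.0 - np.tanh(z) ** 2   # its derivative

# Teacher: M orthonormal first-layer directions (here, basis vectors)
# with power-law second-layer coefficients.
W_star = np.eye(d)[:M]                              # (M, d)
a = np.arange(1, M + 1, dtype=float) ** (-beta)     # (M,)

# Student: random first layer; second layer fixed to the teacher's,
# a common simplification when studying first-layer dynamics.
W = rng.standard_normal((M, d)) / np.sqrt(d)

for t in range(steps):
    x = rng.standard_normal(d)        # fresh Gaussian sample each step
    y = a @ sigma(W_star @ x)         # teacher label
    pre = W @ x
    err = a @ sigma(pre) - y          # residual on this sample
    # One-pass SGD step on the squared loss, first layer only.
    W -= eta * err * (a * dsigma(pre))[:, None] * x[None, :]

    if t % 5_000 == 0:
        # Cosine overlaps of student rows with teacher directions;
        # recovery of direction m appears as its max overlap -> 1,
        # with low-index (large a_m) neurons recovered first.
        overlaps = W @ W_star.T / np.linalg.norm(W, axis=1, keepdims=True)
        print(t, np.round(np.max(np.abs(overlaps), axis=1), 2))
```

Tracking the per-direction overlaps over time is one way to visualize the sharp transition times mentioned in the abstract: each direction stays near zero overlap for a while and then escapes abruptly, while the cumulative loss averaged over directions decays smoothly.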