High-dimensional dynamics of generalization error in neural networks: implications for experience replay

Dr. Andrew Saxe, University of Oxford
Friday 01 February 2019, 12:30-13:30
Cambridge University Engineering Department, CBL, BE4-38 (http://learning.eng.cam.ac.uk/Public/Directions).

If you have a question about this talk, please contact Alberto Bernacchia.

Learning even a simple task can engage huge numbers of neurons across the cortical hierarchy. How do neuronal networks manage to generalize from a small number of examples, despite having large numbers of tunable synapses? And how does depth, the serial propagation of signals through a layered structure, impact a learning system? I will describe results emerging from the analysis of deep linear neural networks. Deep linear networks are a simple model class that retains many features of the full nonlinear setting, including a nonconvex error surface and nonlinear learning trajectories. In this talk I will focus on their generalization error, using random matrix theory to analyze the cognitively relevant “high-dimensional” regime, where the number of training examples is on the order of, or even less than, the number of adjustable synapses. Consistent with the striking performance of very large deep network models in practice, I show that good generalization is possible in overcomplete networks due to implicit regularization in the dynamics of gradient descent. Overtraining is worst at intermediate network sizes, when the effective number of free parameters equals the number of samples, and can be reduced by making a network either smaller or larger. I identify two novel phenomena underlying this behavior in linear networks: first, there is a frozen subspace of the weights in which no learning occurs under gradient descent; and second, the statistical properties of the high-dimensional regime yield better-conditioned input correlations, which protect against overtraining. Turning to the impact of depth, the theory reveals a trade-off between training speed and generalization performance in deep neural networks, and I confirm this speed-accuracy trade-off through simulations. Finally, I will describe an application of these results to experience replay during sleep.
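The frozen-subspace phenomenon can be illustrated in the simplest possible case. This is a minimal sketch, assuming a single-layer linear student trained by full-batch gradient descent on squared error (a simplification of the deep linear networks in the talk), a hypothetical noiseless linear teacher, and illustrative sizes. With fewer examples P than weights N, every gradient is a combination of the training inputs, so the weight component orthogonal to their span never moves. The helper names `train_linear_gd` and `frozen_subspace_projector` are hypothetical.

```python
import numpy as np

def train_linear_gd(X, y, w0, steps=2000):
    """Full-batch gradient descent on the squared error 0.5 * ||X @ w - y||^2."""
    w = w0.copy()
    lr = 1.0 / np.linalg.norm(X, 2) ** 2   # stable step: below 2 / sigma_max^2
    for _ in range(steps):
        grad = X.T @ (X @ w - y)           # a linear combination of the rows of X
        w -= lr * grad
    return w

def frozen_subspace_projector(X, tol=1e-10):
    """Orthogonal projector onto the complement of the row space of X."""
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    rank = int(np.sum(s > tol * s.max()))
    V = Vt[:rank].T                        # orthonormal basis of the row space
    return np.eye(X.shape[1]) - V @ V.T

rng = np.random.default_rng(0)
N, P = 100, 30                             # more weights than examples
X = rng.standard_normal((P, N)) / np.sqrt(N)
w_teacher = rng.standard_normal(N)         # hypothetical noiseless teacher
y = X @ w_teacher
w0 = 0.5 * rng.standard_normal(N)          # random, nonzero initialization

w_final = train_linear_gd(X, y, w0)
Pi = frozen_subspace_projector(X)

frozen_drift = np.linalg.norm(Pi @ (w_final - w0))   # change inside the frozen subspace
train_mse = np.mean((X @ w_final - y) ** 2)
print(f"drift in frozen subspace: {frozen_drift:.2e}, training MSE: {train_mse:.2e}")
```

The training error goes to essentially zero while the drift stays at floating-point noise: in this toy setting, whatever the initialization placed in the frozen subspace is carried into the final weights unchanged.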
The consolidation of learning during sleep is thought to arise from the replay of stored experiences between the hippocampus and neocortex. Why is this complex strategy beneficial? As a simple model of this process, we compare the dynamics of online learning, in which each example is used once and then discarded, with batch learning, in which all examples are stored (for instance, in the hippocampus) and replayed repeatedly (for instance, during sleep). While these two strategies yield similar performance when training experience is abundant, we find that replay can be decisively better when training experience is scarce. Our results suggest a normative explanation for a two-stage memory system: replay can enable better generalization from limited training experience.
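A toy version of the online-versus-replay comparison can be run in a few lines. This is a minimal sketch under stated assumptions, not the speaker's setup: a noiseless linear teacher, isotropic Gaussian inputs, and a linear student started at zero. Online learning is modeled as a single pass of normalized LMS steps (one per example, then discarded); batch learning with replay is modeled as full-batch gradient descent that revisits every stored example on each step. The step sizes, step counts, and function names are illustrative assumptions.

```python
import numpy as np

def online_learning(X, y, lr=1.0):
    """Online learning: one normalized LMS step per example, each example seen once."""
    w = np.zeros(X.shape[1])
    for x_i, y_i in zip(X, y):
        w += lr * (y_i - x_i @ w) * x_i / (x_i @ x_i)
    return w

def replay_learning(X, y, steps=1000):
    """Batch learning: all stored examples are replayed on every full-batch gradient step."""
    w = np.zeros(X.shape[1])
    lr = 1.0 / np.linalg.norm(X, 2) ** 2   # stable step size for the loss 0.5*||Xw - y||^2
    for _ in range(steps):
        w -= lr * X.T @ (X @ w - y)
    return w

def relative_error(w, w_teacher):
    """||w - w*||^2 / ||w*||^2; proportional to expected test error for isotropic inputs."""
    return float(np.sum((w - w_teacher) ** 2) / np.sum(w_teacher ** 2))

rng = np.random.default_rng(1)
N = 200                                 # number of adjustable weights
w_teacher = rng.standard_normal(N)      # noiseless linear teacher (assumption)
results = {}
for P in (50, 2000):                    # scarce vs abundant training experience
    X = rng.standard_normal((P, N))     # isotropic Gaussian inputs
    y = X @ w_teacher
    results[P] = (relative_error(online_learning(X, y), w_teacher),
                  relative_error(replay_learning(X, y), w_teacher))
    print(f"P={P:5d}  online={results[P][0]:.4f}  replay={results[P][1]:.4f}")
```

With 50 examples for 200 weights, both learners start at zero and stay in the span of the stored inputs; replay converges to the projection of the teacher onto that span, which is the lowest-error point available there, so in this noiseless setting it can never do worse than the single online pass. With 2000 examples both errors are small. The sketch omits label noise, so it shows only the noiseless limit.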

This talk is part of the Computational Neuroscience series.
