BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:High-dimensional dynamics of generalization error in neural networ
 ks: implications for experience replay - Dr. Andrew Saxe\, University of O
 xford
DTSTART:20190201T123000Z
DTEND:20190201T133000Z
UID:TALK119359@talks.cam.ac.uk
CONTACT:Alberto Bernacchia
DESCRIPTION:Learning even a simple task can engage huge numbers of neurons
  across the cortical hierarchy. How do neuronal networks manage to general
 ize from a small number of examples\, despite having large numbers of tuna
 ble synapses? And how does depth—the serial propagation of signals throu
 gh layered structure—impact a learning system? I will describe results e
 merging from the analysis of deep linear neural networks. Deep linear netw
 orks are a simple model class that retains many features of the full
  nonlinear setting\, including a nonconvex error surface and nonlinear
  learning t
 rajectories. In this talk I will focus on their generalization error\, usi
 ng random matrix theory to analyze the cognitively-relevant "high-dimensio
 nal" regime\, where the number of training examples is on the order of or 
 even less than the number of adjustable synapses. Consistent with the stri
 king performance of very large deep network models in practice\, I show th
 at good generalization is possible in overcomplete networks due to implici
 t regularization in the dynamics of gradient descent. Overtraining is wors
 t at intermediate network sizes\, when the effective number of free parame
 ters equals the number of samples\, and can be reduced by making a network
  smaller or larger. I identify two novel phenomena underlying this behavio
 r in linear networks: first\, there is a frozen subspace of the weights in
  which no learning occurs under gradient descent\; and second\, the statis
 tical properties of the high-dimensional regime yield better-conditioned i
 nput correlations which protect against overtraining. Turning to the impac
 t of depth\, the theory reveals a trade-off between training speed and gen
 eralization performance in deep neural networks\, and I confirm this speed
 -accuracy trade-off through simulations. Finally\, I will describe an appl
 ication of these results to experience replay during sleep. The consolidat
 ion of learning during sleep is thought to arise from the replay of stored
  experiences between hippocampus and neocortex. Why is this complex strate
 gy beneficial? As a simple model of this process\, we compare the dynamics
  arising from online learning\, in which each example is used once and dis
 carded\; and batch learning\, in which all examples are stored (for instan
 ce\, in hippocampus) and replayed repeatedly (for instance\, during sleep)
 . While these two strategies yield similar performance when training exper
 ience is abundant\, we find that replay can be decisively better when trai
 ning experience is scarce. Our results suggest a normative explanation for
  a two-stage memory system: replay can enable better generalization from l
 imited training experience.
LOCATION:Cambridge University Engineering Department\, CBL\, BE4-38 (http:
 //learning.eng.cam.ac.uk/Public/Directions)
END:VEVENT
END:VCALENDAR