BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//talks.cam.ac.uk//v3//EN
BEGIN:VTIMEZONE
TZID:Europe/London
BEGIN:DAYLIGHT
TZOFFSETFROM:+0000
TZOFFSETTO:+0100
TZNAME:BST
DTSTART:19700329T010000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0100
TZOFFSETTO:+0000
TZNAME:GMT
DTSTART:19701025T020000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
CATEGORIES:ML@CL Seminar Series
SUMMARY:A mean-field theory of lazy training in two-layer
neural nets: entropic regularization and controlle
d McKean-Vlasov dynamics - Maxim Raginsky - Unive
rsity of Illinois at Urbana-Champaign
DTSTART;TZID=Europe/London:20210713T150000
DTEND;TZID=Europe/London:20210713T160000
UID:TALK161365AThttp://talks.cam.ac.uk
URL:http://talks.cam.ac.uk/talk/index/161365
DESCRIPTION:*Paper:*\n\nTalk is based on "this":https://arxiv.
org/abs/2002.01987 paper.\n\n*Abstract:*\n\nWe con
sider the problem of universal approximation of fu
nctions by two-layer neural nets with random weigh
ts that are "nearly Gaussian" in the sense of Kull
back-Leibler divergence. This problem is motivated
by recent works on lazy training\, where the weig
ht updates generated by stochastic gradient descen
t do not move appreciably from the i.i.d. Gaussian
initialization. We first consider the mean-field
limit\, where the finite population of neurons in
the hidden layer is replaced by a continuous ensemb
le\, and show that our problem can be phrased as g
lobal minimization of a free-energy functional on
the space of probability measures over the weights
. This functional trades off the L2 approximation
risk against the KL divergence with respect to a c
entered Gaussian prior. We characterize the unique
global minimizer and then construct a controlled
nonlinear dynamics in the space of probability mea
sures over weights that solves a McKean--Vlasov op
timal control problem. This control problem is clo
sely related to the Schrödinger bridge (or entropi
c optimal transport) problem\, and its value is pr
oportional to the minimum of the free energy. Fina
lly\, we show that SGD in the lazy training regime
(which can be ensured by jointly tuning the varia
nce of the Gaussian prior and the entropic regular
ization parameter) serves as a greedy approximatio
n to the optimal McKean--Vlasov distributional dyn
amics and provide quantitative guarantees on the L
2 approximation error.\n\n*Website:* https://maxim
.ece.illinois.edu/\n\nPart of ML@CL Seminar Series
in topics relevant to machine learning and statis
tics.\n\nJoin Zoom Meeting\nhttps://us02web.zoom.u
s/j/4784296471?pwd=SE8vc1BvWldlZnc1YUNwK3Q1dHpodz0
9\nMeeting ID: 478 429 6471\nPasscode: 345261
LOCATION:https://us02web.zoom.us/j/4784296471?pwd=SE8vc1BvW
ldlZnc1YUNwK3Q1dHpodz09
CONTACT:Francisco Vargas
END:VEVENT
END:VCALENDAR