BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//talks.cam.ac.uk//v3//EN
BEGIN:VTIMEZONE
TZID:Europe/London
BEGIN:DAYLIGHT
TZOFFSETFROM:+0000
TZOFFSETTO:+0100
TZNAME:BST
DTSTART:19700329T010000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0100
TZOFFSETTO:+0000
TZNAME:GMT
DTSTART:19701025T020000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
CATEGORIES:Statistics
SUMMARY:On momentum methods and acceleration in stochastic
optimization - Praneeth Netrapalli (Microsoft Res
earch India)
DTSTART;TZID=Europe/London:20180521T160000
DTEND;TZID=Europe/London:20180521T170000
UID:TALK102841@talks.cam.ac.uk
URL:http://talks.cam.ac.uk/talk/index/102841
DESCRIPTION:It is well known that momentum gradient methods (e
.g.\, Polyak's heavy ball\, Nesterov's acceleratio
n) yield significant improvements over vanilla gra
dient descent in deterministic optimization (i.e.\
, where we have access to exact gradient of the fu
nction to be minimized). However\, there is widesp
read sentiment that these momentum methods are not
effective for the purposes of stochastic optimiza
tion due to their instability and error accumulati
on. Numerous works have attempted to quantify thes
e instabilities in the face of either statistical
or non-statistical errors (Paige\, 1971\; Proakis\
, 1974\; Polyak\, 1987\; Greenbaum\, 1989\; Roy an
d Shynk\, 1990\; Sharma et al.\, 1998\; d'Aspremon
 t\, 2008\; Devolder et al.\, 2013\, 2014\; Yuan et
al.\, 2016) but a precise understanding is lackin
g. This work considers these issues for the specia
l case of stochastic approximation for the linear
least squares regression problem\, and shows that:
\n\n1. classical momentum methods (heavy ball and
Nesterov's acceleration) indeed do not offer any i
mprovement over stochastic gradient descent\, and\
 n\n2. there is an accelerated stochastic gradient
method that provably achieves the minimax optimal
statistical risk faster than stochastic gradient
descent (and classical momentum methods).\n\nCriti
cal to the analysis is a sharp characterization of
accelerated stochastic gradient descent as a stoc
hastic process. While the results are rigorously e
stablished for the special case of linear least sq
uares regression\, experiments suggest that the co
nclusions hold for the training of deep neural net
works.\n\nJoint work with Prateek Jain\, Sham M. K
akade\, Rahul Kidambi and Aaron Sidford
LOCATION:MR14
CONTACT:Quentin Berthet
END:VEVENT
END:VCALENDAR