BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//talks.cam.ac.uk//v3//EN
BEGIN:VTIMEZONE
TZID:Europe/London
BEGIN:DAYLIGHT
TZOFFSETFROM:+0000
TZOFFSETTO:+0100
TZNAME:BST
DTSTART:19700329T010000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0100
TZOFFSETTO:+0000
TZNAME:GMT
DTSTART:19701025T020000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
CATEGORIES:Statistics
SUMMARY:[Special Statslab Seminar] Scalable stochastic optimization and
 large-scale data - Michael W. Mahoney (ICSI and Department of
 Statistics\, UC Berkeley)
DTSTART;TZID=Europe/London:20190508T140000
DTEND;TZID=Europe/London:20190508T150000
UID:TALK124819AThttp://talks.cam.ac.uk
URL:http://talks.cam.ac.uk/talk/index/124819
DESCRIPTION:Stochastic optimization is widely used in many areas\, most
 recently in large-scale machine learning and data science\, but its
 use in these areas is quite different from its use in more traditional
 areas of operations research\, scientific computing\, and statistics.
 In particular\, second-order optimization methods have historically
 been ubiquitous\, but they are rarely used in machine learning and
 data science\, compared to their first-order counterparts. Motivated
 by well-known problems of first-order methods\, however\, recent work
 has begun to experiment with second-order methods for machine learning
 problems. By exploiting recent results from Randomized Numerical
 Linear Algebra\, we establish improved bounds for algorithms that
 incorporate sub-sampling as a way to improve computational
 efficiency\, while maintaining the original convergence properties of
 these algorithms. These results provide quantitative convergence
 results for variants of Newton's method\, where the Hessian and/or the
 gradient is uniformly or non-uniformly sub-sampled\, under much weaker
 assumptions than prior work\; and these results include extensions to
 trust-region and cubic-regularization algorithms for non-convex
 optimization problems. When applied to complex machine learning tasks
 such as training deep neural networks\, empirical results demonstrate
 that these methods perform quite well\, both in ways that one would
 expect (e.g.\, leading to improved conditioning in the presence of
 so-called exploding/vanishing gradients) and in ways that are more
 surprising but more interesting (e.g.\, using so-called adversarial
 examples to architect the objective-function surface to be more
 amenable to optimization algorithms).\n
LOCATION:MR11\, Centre for Mathematical Sciences\, Wilberforce Road\,
 Cambridge
CONTACT:HoD Secretary\, DPMMS
END:VEVENT
END:VCALENDAR