BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//talks.cam.ac.uk//v3//EN
BEGIN:VTIMEZONE
TZID:Europe/London
BEGIN:DAYLIGHT
TZOFFSETFROM:+0000
TZOFFSETTO:+0100
TZNAME:BST
DTSTART:19700329T010000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0100
TZOFFSETTO:+0000
TZNAME:GMT
DTSTART:19701025T020000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
CATEGORIES:Statistics
SUMMARY:Approximate Cross Validation for Large Data and Hi
gh Dimensions - Tamara Broderick\, Massachusetts I
nstitute of Technology
DTSTART;TZID=Europe/London:20200117T140000
DTEND;TZID=Europe/London:20200117T150000
UID:TALK136060AThttp://talks.cam.ac.uk
URL:http://talks.cam.ac.uk/talk/index/136060
DESCRIPTION:The error or variability of statistical and machin
e learning algorithms is often assessed by repeate
dly re-fitting a model with different weighted ver
sions of the observed data. The ubiquitous tools o
f cross-validation (CV) and the bootstrap are exam
ples of this technique. These methods are powerful
in large part due to their model agnosticism but
can be slow to run on modern\, large data sets due
to the need to repeatedly re-fit the model. We us
e a linear approximation to the dependence of the
fitting procedure on the weights\, producing resul
ts that can be faster than repeated re-fitting by
orders of magnitude. This linear approximation is
sometimes known as the "infinitesimal jackknife"
(IJ) in the statistics literature\, where it has m
ostly been used as a theoretical tool to prove asy
mptotic results. We provide explicit finite-sample
error bounds for the infinitesimal jackknife in t
erms of a small number of simple\, verifiable assu
mptions. Without further modification\, though\, w
e note that the IJ deteriorates in accuracy in hig
h dimensions and incurs a running time roughly cub
ic in dimension. We additionally show\, then\, how
dimensionality reduction can be used to successfu
lly run the IJ in high dimensions in the case of l
eave-one-out cross validation (LOOCV). Specificall
y\, we consider L1 regularization for generalized
linear models. We prove that\, under mild conditio
ns\, the resulting LOOCV approximation exhibits co
mputation time and accuracy that depend on the rec
overed support size rather than the full dimension
D. Simulated and real-data experiments support ou
r theory.
LOCATION:MR12
CONTACT:Dr Sergio Bacallado
END:VEVENT
END:VCALENDAR