BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//talks.cam.ac.uk//v3//EN
BEGIN:VTIMEZONE
TZID:Europe/London
BEGIN:DAYLIGHT
TZOFFSETFROM:+0000
TZOFFSETTO:+0100
TZNAME:BST
DTSTART:19700329T010000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0100
TZOFFSETTO:+0000
TZNAME:GMT
DTSTART:19701025T020000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
CATEGORIES:Machine Learning Reading Group @ CUED
SUMMARY:Herding or a '3rd way to learn' - Simon Lacoste-Ju
lien and Ferenc Huszar
DTSTART;TZID=Europe/London:20101021T140000
DTEND;TZID=Europe/London:20101021T153000
UID:TALK26203AThttp://talks.cam.ac.uk
URL:http://talks.cam.ac.uk/talk/index/26203
DESCRIPTION:In this RCC\, we will cover a new learning approac
h quite recently proposed (2009) by Max Welling an
d peculiarly called "Herding". Even though this is
 brand new research\, it contains interesting lin
ks with standard machine learning concepts that w
e will review. Also\, we will split the RCC in tw
o parts which will show the interesting evolution ty
pical in research from the original idea (with the
first paper) to a more modern viewpoint (with its
latest paper).\n\n"Herding" is a novel approach t
o learning which seems a bit peculiar at first sig
ht but has interesting properties and actually goo
d empirical performance. It can be seen as a '3rd
way to learn': the first one being traditional fre
quentist point estimates of parameters and the s
econd using a Bayesian posterior over parameters
. Herd
ing lies somewhat in between: rather than maintain
ing samples over parameters as in Bayesian learnin
g\, it navigates the parameter space in a determin
istic (but almost chaotic) way which doesn't conv
erge to any particular point estimate. During this ex
ploration\, it produces pseudosamples which can be
 used in a similar fashion as one would use sampl
es from an MCMC method *after* learning. The two main a
dvantages of herding are that:\n# it is computati
onally cheaper than traditional learning of Marko
v Random Fields\, as it combines learning and inf
erence in one deterministic step\; \n# the result
ing pseudo-samp
les exhibit strong anticorrelations\, thereby prov
iding a more uniform coverage of the space -- this
means that the number of (pseudo)samples needed t
o approximate relevant integrals is substantially
 smaller than in the case of random sampling (nam
ely\, O(1/T) convergence vs. O(1/sqrt(T)) for i.i
.d. sampling).\n\nIn the first half of the RCC\, Sim
on will cover the original herding paper:\n* _Herd
ing Dynamical Weights to Learn_\, Max Welling\, IC
ML 2009 - "pdf":http://www.cs.mcgill.ca/~icml2009/
papers/447.pdf
\nwhich introduced herding as a
n algorithm for learning/sampling in the context o
f Markov random fields (and exhibits property 1. m
entioned above). The first part therefore concent
rates on links with previous approaches to learnin
g under Markov random fields.
\n[Note that if
 you are interested enough in the topic to read th
is paper very carefully\, there is a correction t
o the proof of recurrence "here":http://www.ics.u
ci.edu/~welling/publications/papers/Correction%20
to%20Recurrence%20Proof%20for%20Herding.pdf .]\n\nIn t
he second half of the RCC\, Ferenc will concentrat
e on the most recent paper:\n* _Super-Samples from
 Kernel Herding_\, Yutian Chen\, Max Welling\, Ale
x Smola\, UAI 2010 - "pdf":http://event.cwi.nl/uai
2010/papers/UAI2010_0238.pdf
\nin which kernel herding is introduced\, presenti
ng a more modern viewpoint on herding in the conte
xt of approximating dis
tributions (and where property 2. mentioned above
is exploited). Herding here can be understood as a
greedy optimisation for approximating a probabili
ty distribution with the empirical distribution of
pseudosamples in such a way that certain nonlinea
r moments of the original distribution are preserv
ed. Connections to message passing\, Monte Carlo\,
and other relevant methods will be discussed.
LOCATION:Engineering Department\, CBL Room 438
CONTACT:Shakir Mohamed
END:VEVENT
END:VCALENDAR