BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//talks.cam.ac.uk//v3//EN
BEGIN:VTIMEZONE
TZID:Europe/London
BEGIN:DAYLIGHT
TZOFFSETFROM:+0000
TZOFFSETTO:+0100
TZNAME:BST
DTSTART:19700329T010000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0100
TZOFFSETTO:+0000
TZNAME:GMT
DTSTART:19701025T020000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
CATEGORIES:AI+Pizza
SUMMARY:AI + Pizza September 2018 - Adria Garriga Alonso\,
University of Cambridge\, Alex Gaunt\, Microsoft
Research Cambridge
DTSTART;TZID=Europe/London:20180928T173000
DTEND;TZID=Europe/London:20180928T190000
UID:TALK111097AThttp://talks.cam.ac.uk
URL:http://talks.cam.ac.uk/talk/index/111097
DESCRIPTION:*Speaker One* - Adria Garriga Alonso\n \n*Title* -
Deep Convolutional Networks as shallow Gaussian P
rocesses \n\n*Abstract* - We show that the output
of a (residual) convolutional neural network (CNN)
with an appropriate prior over the weights and bi
ases is a Gaussian process (GP) in the limit of in
finitely many convolutional filters. The result is
an extension of the theorem for dense networks du
e to Alex Matthews et al. (2018)\, also presented
in this AI+Pizza series.\n\nFor a CNN\, the equiva
lent kernel can be computed exactly and\, unlike "
deep kernels"\, has very few parameters: only the
hyperparameters of the original CNN. Further\, we
show that this kernel has two properties that allo
w it to be computed efficiently\; the cost of eval
uating the kernel for a pair of images is similar
to a single forward pass through the original CNN
with only one filter per layer. The kernel equivale
nt to a 32-layer ResNet obtains 0.84% classificati
on error on MNIST\, a new record for GPs with a co
mparable number of parameters.\n\nThis is joint wo
rk with Laurence Aitchison and Carl Rasmussen. \n\
n*Speaker Two* - Alexander Gaunt\n\n*Title* - Fixi
ng Variational Bayes: Deterministic Variational In
ference for Bayesian Neural Networks \n\n*Abstract
* - Bayesian neural networks hold great promise as
a flexible and principled solution to deal with unc
ertainty when learning from finite data. Among app
roaches to realize probabilistic inference in deep
neural networks\, variational Bayes (VB) is princ
ipled\, generally applicable\, and computationally
efficient. With wide recognition of potential adv
antages\, why is it that variational Bayes has see
n very limited practical use for neural networks i
n real applications? We argue that variational inf
erence in neural networks is fragile: to get the a
pproach to work requires careful initialization an
d tuning of prior variances as well as controlling
the variance of stochastic gradient estimates. We
fix VB and turn it into a robust inference tool f
or Bayesian neural networks. We achieve this by tw
o innovations: first\, we introduce a novel determ
inistic method to approximate moments in neural ne
tworks\, reducing gradient variance to zero\; seco
nd\, we introduce a hierarchical prior for paramet
ers and a novel Empirical Bayes procedure for auto
matically selecting prior variances. Combining the
se two innovations\, the resulting method is highl
y efficient and robust. On the application of hete
roscedastic regression we demonstrate strong predi
ctive performance over alternative approaches. \n
LOCATION:Auditorium\, Microsoft Research Ltd\, 21 Station R
oad\, Cambridge\, CB1 2FB
CONTACT:Microsoft Research Cambridge Talks Admins
END:VEVENT
END:VCALENDAR