BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//talks.cam.ac.uk//v3//EN
BEGIN:VTIMEZONE
TZID:Europe/London
BEGIN:DAYLIGHT
TZOFFSETFROM:+0000
TZOFFSETTO:+0100
TZNAME:BST
DTSTART:19700329T010000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0100
TZOFFSETTO:+0000
TZNAME:GMT
DTSTART:19701025T020000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
CATEGORIES:Machine Learning Reading Group @ CUED
SUMMARY:CBL Alumni Talk: Examining Critiques in Bayesian D
eep Learning - Andrew Gordon Wilson
DTSTART;TZID=Europe/London:20210416T160000
DTEND;TZID=Europe/London:20210416T170000
UID:TALK158818AThttp://talks.cam.ac.uk
URL:http://talks.cam.ac.uk/talk/index/158818
DESCRIPTION:Approximate inference procedures in Bayesian deep
learning have become scalable and practical\, ofte
n providing better accuracy and calibration than c
lassical training\, without significant computatio
nal overhead. However\, there have emerged several
challenges to the Bayesian approach in deep learn
ing. It was found in an empirical study that deep
ensembles\, formed from re-training an architectur
e and ensembling the result\, outperformed some ap
proaches to approximate Bayesian inference --- whi
ch led to the question of whether we should pursue
ensembling instead of Bayesian methods in deep le
arning. It was later observed that several approxi
mate inference approaches appear to raise the post
erior to a power 1/T\, with T less than 1\, leadin
g to a “cold posterior”\, which was asserted as be
ing "sharply divergent" with Bayesian principles.
In the same paper\, the popular Gaussian priors we
use in deep learning were questioned as unreasona
ble\, supported by an experiment showing that each
sample function from a prior appears to assign ne
arly all of CIFAR-10 to a particular class.\n\nIn
this talk\, we will examine these critiques\, and
show that (1) deep ensembles provide a better appr
oximation of the Bayesian predictive distribution
than the approximate inference procedures consider
ed in the empirical study\, and in general are a r
easonable approach to approximate inference in dee
p learning under severe computational constraints\
; (2) tempering is in fact not typically required\
, and is also a reasonable procedure in general\;
(3) the example of prior functions assigning nearl
y all data to one class can be easily resolved by
calibrating the signal variance of the Gaussian pr
ior\; (4) Gaussian priors\, while imperfect like a
ny prior\, induce a prior over functions with many
desirable properties when combined with a neural
architecture.\n\nA theme in this talk is that whil
e we should be careful to scrutinize our modelling
procedures\, we should also apply the same critic
al scrutiny to the critiques\, leading to a deeper
and more nuanced understanding\, and more success
ful practical innovations.\n\nSections 3.2\, 3.3\,
4-9 of https://arxiv.org/abs/2002.08791 provide g
ood background reading for the talk.
LOCATION:https://eng-cam.zoom.us/j/82969702755?pwd=L0dIVnl
wSHJHV2NGbUQ1cmxpYjIyUT09
CONTACT:Elre Oldewage
END:VEVENT
END:VCALENDAR