CBL Alumni Talk: Examining Critiques in Bayesian Deep Learning
If you have a question about this talk, please contact Elre Oldewage.

Approximate inference procedures in Bayesian deep learning have become scalable and practical, often providing better accuracy and calibration than classical training without significant computational overhead. However, several challenges to the Bayesian approach in deep learning have emerged. An empirical study found that deep ensembles, formed by re-training an architecture and ensembling the results, outperformed some approaches to approximate Bayesian inference, which raised the question of whether we should pursue ensembling instead of Bayesian methods in deep learning. It was later observed that several approximate inference approaches appear to raise the posterior to a power 1/T, with T less than 1, leading to a “cold posterior”, which was asserted to be “sharply divergent” from Bayesian principles. The same paper questioned the popular Gaussian priors used in deep learning as unreasonable, supported by an experiment showing that each sample function from such a prior appears to assign nearly all of CIFAR-10 to a single class.

In this talk, we will examine these critiques and show that: (1) deep ensembles provide a better approximation of the Bayesian predictive distribution than the approximate inference procedures considered in the empirical study, and are in general a reasonable approach to approximate inference in deep learning under severe computational constraints; (2) tempering is in fact not typically required, and is also a reasonable procedure in general; (3) the example of prior functions assigning nearly all data to one class is easily resolved by calibrating the signal variance of the Gaussian prior; (4) Gaussian priors, while imperfect like any prior, induce a prior over functions with many desirable properties when combined with a neural architecture. A theme of this talk is that while we should carefully scrutinize our modelling procedures, we should apply the same critical scrutiny to the critiques, leading to a deeper and more nuanced understanding and more successful practical innovations.

Sections 3.2, 3.3, and 4-9 of https://arxiv.org/abs/2002.08791 provide good background reading for the talk.
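As background for the two objects the abstract refers to (notation is ours, not the speaker's): a deep ensemble averages the predictive distributions of K independently re-trained networks, which can be read as a crude Monte Carlo approximation to the Bayesian model average over weights w, while a “cold posterior” tempers the posterior with a temperature T:

    p(y | x, D) = \int p(y | x, w) p(w | D) dw \approx (1/K) \sum_{k=1}^{K} p(y | x, w_k)

    p_T(w | D) \propto [ p(D | w) p(w) ]^{1/T},  with T < 1 giving a “cold” posterior.

A minimal sketch of the ensemble average in Python, assuming hypothetical make_model and train helpers (not from the talk or the paper):

    import numpy as np

    def ensemble_predict(make_model, train, x_test, k=5):
        # Average the predictive distributions of k independently trained
        # networks: a crude Monte Carlo estimate of the Bayesian model average.
        probs = []
        for seed in range(k):
            model = make_model(seed)   # fresh random initialisation per member
            train(model)               # ordinary (non-Bayesian) training
            probs.append(model.predict_proba(x_test))
        return np.mean(probs, axis=0)  # ensemble predictive distribution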
This talk is part of the Machine Learning Reading Group @ CUED series.