BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:AI + Pizza October 2018 - Microsoft Research Cambridge/University
  of Cambridge
DTSTART:20181026T163000Z
DTEND:20181026T180000Z
UID:TALK112918@talks.cam.ac.uk
CONTACT:Microsoft Research Cambridge Talks Admins
DESCRIPTION:*Speaker 1* - Marton Havasi\n\n*Title* - Minimal Random
  Code Learning: Getting Bits Back from Compressed Model Parameters\n\n
 *Abstract* - While deep neural networks are a highly successful model
  class\, their large memory footprint puts considerable strain on
  energy consumption\, communication bandwidth\, and storage
  requirements. Consequently\, model size reduction has become a goal
  of utmost importance in deep learning. A typical approach is to
  train a set of deterministic weights while applying techniques such
  as pruning and quantization\, so that the empirical weight
  distribution becomes amenable to Shannon-style coding schemes.
  However\, as shown in this paper\, relaxing weight determinism and
  using a full variational distribution over weights allows for more
  efficient coding schemes and consequently higher compression rates.
  In particular\, following the classical bits-back argument\, we
  encode the network weights using a random sample\, requiring only a
  number of bits corresponding to the Kullback-Leibler divergence
  between the sampled variational distribution and the encoding
  distribution. By imposing a constraint on the Kullback-Leibler
  divergence\, we are able to explicitly control the compression rate
  while optimizing the expected loss on the training set. The employed
  encoding scheme can be shown to be close to the optimal
  information-theoretic lower bound\, with respect to the employed
  variational family. Our method sets a new state of the art in neural
  network compression\, as it strictly dominates previous approaches in
  a Pareto sense: on the benchmarks LeNet-5/MNIST and VGG-16/CIFAR-10\,
  our approach yields the best test performance for a fixed memory
  budget and\, vice versa\, achieves the highest compression rates for
  a fixed test performance.\nJoint work with Robert Peharz and José
  Miguel Hernández-Lobato\n\n\n*Speaker 2* - Patrick Fernandes\n\n*Title*
  - Structured Neural Summarization\n\n*Abstract* - Summarization of
  long sequences into a concise statement is a core problem in natural
  language processing\, requiring non-trivial understanding of the
  input. Based on the promising results of graph neural networks on
  highly structured data\, we develop a framework to extend existing
  sequence encoders with a graph component that can reason about
  long-distance relationships in weakly structured data such as text.
  In an extensive evaluation\, we show that the resulting hybrid
  sequence-graph models outperform both pure sequence models and pure
  graph models on a range of summarization tasks.
LOCATION:Auditorium\, Microsoft Research Ltd\, 21 Station Road\, Cambridge
 \, CB1 2FB
END:VEVENT
END:VCALENDAR
