BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:AI + Pizza October 2018 - Microsoft Research Cambridge/University
  of Cambridge
DTSTART:20181026T163000Z
DTEND:20181026T180000Z
UID:TALK112918@talks.cam.ac.uk
CONTACT:Microsoft Research Cambridge Talks Admins
DESCRIPTION:*Speaker 1* - Marton Havasi\n\n*Title* - Minimal Random
  Code Learning: Getting Bits Back from Compressed Model Parameters\n\n
 *Abstract* - While deep neural networks are a highly successful model
  class\, their large memory footprint puts considerable strain on
  energy consumption\, communication bandwidth\, and storage
  requirements. Consequently\, model size reduction has become a goal
  of utmost importance in deep learning. A typical approach is to
  train a set of deterministic weights while applying techniques such
  as pruning and quantization\, so that the empirical weight
  distribution becomes amenable to Shannon-style coding schemes.
  However\, as shown in this paper\, relaxing weight determinism and
  using a full variational distribution over weights allows for more
  efficient coding schemes and consequently higher compression rates.
  In particular\, following the classical bits-back argument\, we
  encode the network weights using a random sample\, requiring only a
  number of bits corresponding to the Kullback-Leibler divergence
  between the sampled variational distribution and the encoding
  distribution. By imposing a constraint on the Kullback-Leibler
  divergence\, we are able to explicitly control the compression rate
  while optimizing the expected loss on the training set. The employed
  encoding scheme can be shown to be close to the optimal
  information-theoretic lower bound\, with respect to the employed
  variational family. Our method sets a new state of the art in neural
  network compression\, as it strictly dominates previous approaches in
  a Pareto sense: on the benchmarks LeNet-5/MNIST and VGG-16/CIFAR-10\,
  our approach yields the best test performance for a fixed memory
  budget and\, vice versa\, achieves the highest compression rates for
  a fixed test performance.\nJoint work with Robert Peharz and José
  Miguel Hernández-Lobato\n\n\n*Speaker 2* - Patrick Fernandes\n\n*Title*
  - Structured Neural Summarization\n\n*Abstract* - Summarization of
  long sequences into a concise statement is a core problem in natural
  language processing\, requiring non-trivial understanding of the
  input. Based on the promising results of graph neural networks on
  highly structured data\, we develop a framework to extend existing
  sequence encoders with a graph component that can reason about
  long-distance relationships in weakly structured data such as text.
  In an extensive evaluation\, we show that the resulting hybrid
  sequence-graph models outperform both pure sequence models and pure
  graph models on a range of summarization tasks.
LOCATION:Auditorium\, Microsoft Research Ltd\, 21 Station Road\, Cambridge
 \, CB1 2FB
END:VEVENT
END:VCALENDAR
