BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//talks.cam.ac.uk//v3//EN
BEGIN:VTIMEZONE
TZID:Europe/London
BEGIN:DAYLIGHT
TZOFFSETFROM:+0000
TZOFFSETTO:+0100
TZNAME:BST
DTSTART:19700329T010000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0100
TZOFFSETTO:+0000
TZNAME:GMT
DTSTART:19701025T020000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
CATEGORIES:NLIP Seminar Series
SUMMARY:A Mutual Information Maximization Perspective of Language
  Representation Learning - Lingpeng Kong (DeepMind)
DTSTART;TZID=Europe/London:20191101T120000
DTEND;TZID=Europe/London:20191101T130000
UID:TALK128497@talks.cam.ac.uk
URL:http://talks.cam.ac.uk/talk/index/128497
DESCRIPTION:In this talk\, we show that state-of-the-art word
  representation learning methods maximize an objective function that
  is a lower bound on the mutual information between different parts of
  a word sequence (i.e.\, a sentence). Our formulation provides an
  alternative perspective that unifies classical word embedding models
  (e.g.\, Skip-gram) and modern contextual embeddings (e.g.\, BERT\,
  XLNet). In addition to enhancing our theoretical understanding of
  these methods\, our derivation leads to a principled framework that
  can be used to construct new self-supervised tasks. We provide an
  example by drawing inspiration from related methods based on mutual
  information maximization that have been successful in computer
  vision\, and introduce a simple self-supervised objective that
  maximizes the mutual information between a global sentence
  representation and n-grams in the sentence. Our analysis offers a
  holistic view of representation learning methods to transfer
  knowledge and translate progress across multiple domains (e.g.\,
  natural language processing\, computer vision\, audio processing).
LOCATION:FW26\, Computer Laboratory
CONTACT:James Thorne
END:VEVENT
END:VCALENDAR
