BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:NLIP reading group: Bayesian Smoothing for Language Models - Andre
 as Vlachos (University of Cambridge)
DTSTART:20120126T120000Z
DTEND:20120126T130000Z
UID:TALK35899@talks.cam.ac.uk
CONTACT:Jimme Jardine
DESCRIPTION:Yee-Whye Teh will be visiting us on 27/1 to give the NLIP semi
 nar. The abstract of his talk is at the bottom of my message.  We were thi
 nking it might be useful to read the paper he's going to talk about beforehan
 d:\n\nhttp://www.gatsby.ucl.ac.uk/~ywteh/research/compling/WooGasArc2011a.
 pdf\n\n> Smoothing is a central component of language modelling technologi
 es.\n> It attempts to improve probabilities estimated from language data b
 y \n> shifting mass from high probability areas to low or zero probability
  \n> areas\, thus "smoothing" the distribution.  Many smoothing techniques
  \n> have been proposed in the past based on a variety of principles and \
 n> empirical observations.\n>\n> In this talk I will present a Bayesian st
 atistical approach to smoothing.\n> By using a hierarchical Bayesian metho
 dology to effectively share \n> information across the different parts of 
 the language model\, and by \n> incorporating the prior knowledge that lan
 guages obey power-law \n> behaviours using Pitman-Yor processes\, we are a
 ble to construct \n> language models with state-of-the-art results.  Our a
 pproach also \n> gives an interesting new interpretation of interpolated K
 neser-Ney and why it works so well.\n> Finally\, we describe an extension 
 of our model from finite n-grams to \n> "infinite-grams" which we call the
  sequence memoizer.\n>\n> This is joint work with Frank Wood\, Jan Gasthau
 s\, Cedric Archambeau \n> and Lancelot James\, and is based on work most r
 ecently reported in the \n> Communications of the ACM (Feb 2011 issue).\n
LOCATION:GS15\, Computer Laboratory
END:VEVENT
END:VCALENDAR
