Bayesian Smoothing for Language Models
- Speaker: Yee Whye Teh, University College London
- Date & Time: Friday 27 January 2012, 12:00 - 13:00
- Venue: FW26, Computer Laboratory
Abstract
Smoothing is a central component of language modelling technologies. It attempts to improve probabilities estimated from language data by shifting mass from high probability areas to low or zero probability areas, thus “smoothing” the distribution. Many smoothing techniques have been proposed in the past based on a variety of principles and empirical observations.
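To make the idea of shifting mass concrete, here is a minimal sketch of one classical technique, absolute discounting, applied to unigram counts. It is an illustration only and not the Bayesian method presented in the talk; the function name `absolute_discount`, the discount value `d=0.5`, the toy corpus, and the choice to redistribute the freed mass uniformly over the vocabulary are all assumptions made for this example.

```python
from collections import Counter


def absolute_discount(counts, vocab, d=0.5):
    """Smooth unigram counts by subtracting a fixed discount d from every
    seen word and spreading the freed mass uniformly over the vocabulary.

    Assumes every counted word is in vocab, all counts are integers,
    the total count is positive, and 0 <= d <= 1.
    """
    total = sum(counts.values())
    seen_types = sum(1 for w in vocab if counts.get(w, 0) > 0)
    reserved = d * seen_types / total  # probability mass taken from seen words
    uniform = 1.0 / len(vocab)         # share of the reserved mass per word
    return {
        w: max(counts.get(w, 0) - d, 0) / total + reserved * uniform
        for w in vocab
    }


# Toy corpus; "dog" never occurs, so its unsmoothed estimate would be zero.
counts = Counter("the cat sat on the mat the end".split())
vocab = sorted(set(counts) | {"dog"})
probs = absolute_discount(counts, vocab)
```

In this sketch the unseen word "dog" receives a small non-zero probability, the frequent word "the" ends up slightly below its raw relative frequency, and the smoothed values still sum to one.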
In this talk I will present a Bayesian statistical approach to smoothing. By using a hierarchical Bayesian methodology to effectively share information across the different parts of the language model, and by incorporating the prior knowledge that languages obey power-law behaviours using Pitman-Yor processes, we are able to construct language models with state-of-the-art results. Our approach also gives an interesting new interpretation of interpolated Kneser-Ney and why it works so well. Finally, we describe an extension of our model from finite n-grams to “infinite-grams” which we call the sequence memoizer.
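The connection to interpolated Kneser-Ney can be sketched with the predictive rule of a hierarchical Pitman-Yor model in its Chinese restaurant representation. The notation below is an assumption introduced for this note and does not appear in the abstract: $u$ is a context, $\pi(u)$ is the shorter context obtained by dropping its earliest word, $c_{uw}$ and $t_{uw}$ are the customer and table counts for word $w$ in context $u$, a dot denotes summation over words, and $d$ and $\theta$ are the discount and strength parameters (taken here as shared across contexts for simplicity).

```latex
P(w \mid u) \;=\; \frac{c_{uw} - d\, t_{uw}}{\theta + c_{u\cdot}}
\;+\; \frac{\theta + d\, t_{u\cdot}}{\theta + c_{u\cdot}}\; P\bigl(w \mid \pi(u)\bigr)

% Setting \theta = 0 and restricting each observed word to a single table,
% t_{uw} = \min(1, c_{uw}), gives the interpolated Kneser-Ney form:
P(w \mid u) \;=\; \frac{\max(c_{uw} - d,\, 0)}{c_{u\cdot}}
\;+\; \frac{d\, t_{u\cdot}}{c_{u\cdot}}\; P\bigl(w \mid \pi(u)\bigr)
```

Under these simplifying assumptions, $t_{u\cdot}$ is the number of distinct word types seen after $u$, so the second term is the familiar Kneser-Ney interpolation weight.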
This is joint work with Frank Wood, Jan Gasthaus, Cedric Archambeau and Lancelot James, and is based on work most recently reported in the Communications of the ACM (Feb 2011 issue).
Series
This talk is part of the NLIP Seminar Series.