A Hierarchical Bayesian Language Model based on Pitman-Yor Processes

Hanna Wallach, University of Cambridge
Thursday 24 August 2006, 10:00-11:00
Room 911, Rutherford Building, Cavendish Laboratory, Department of Physics.

If you have a question about this talk, please contact Hanna Wallach.

Paper (also Tech. report)

We propose a new hierarchical Bayesian n-gram model of natural languages. Our model makes use of a generalization of the commonly used Dirichlet distributions called Pitman-Yor processes, which produce power-law distributions more closely resembling those in natural languages. We show that an approximation to the hierarchical Pitman-Yor language model recovers the exact formulation of interpolated Kneser-Ney, one of the best smoothing methods for n-gram language models. Experiments verify that our model gives cross-entropy results superior to interpolated Kneser-Ney and comparable to modified Kneser-Ney.
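
As a rough illustration of the power-law behaviour mentioned in the abstract, the following is a minimal sketch of the Chinese restaurant process view of a single Pitman-Yor process. It is not the hierarchical language model from the paper. The function name `sample_pitman_yor_crp` and the parameter names `discount` and `strength` are assumptions made for this example. With `discount = 0` the process reduces to the Dirichlet-process case, where the number of occupied tables (distinct word types, in the language-modelling analogy) grows only logarithmically; with `discount > 0` it grows as a power of the number of customers.

```python
import random


def sample_pitman_yor_crp(num_customers, discount, strength, seed=0):
    """Seat customers one at a time and return the list of table sizes.

    Illustrative sketch only. Assumes 0 <= discount < 1 and
    strength > -discount, the usual Pitman-Yor parameter range.
    """
    if not 0 <= discount < 1:
        raise ValueError("discount must lie in [0, 1)")
    if strength <= -discount:
        raise ValueError("strength must exceed -discount")

    rng = random.Random(seed)
    table_sizes = []

    for n in range(num_customers):
        # With n customers seated at K tables, the unnormalised weights are
        #   new table:      strength + discount * K
        #   existing table: size - discount
        # and together they sum to strength + n.
        new_table_mass = strength + discount * len(table_sizes)
        u = rng.random() * (strength + n)

        if not table_sizes or u < new_table_mass:
            table_sizes.append(1)
            continue

        u -= new_table_mass
        for k, size in enumerate(table_sizes):
            u -= size - discount
            if u < 0:
                table_sizes[k] += 1
                break
        else:
            # Guard against floating-point round-off at the upper boundary.
            table_sizes[-1] += 1

    return table_sizes


if __name__ == "__main__":
    for d in (0.0, 0.8):
        sizes = sample_pitman_yor_crp(2000, discount=d, strength=1.0)
        print(f"discount={d}: {len(sizes)} tables for {sum(sizes)} customers")
```

Running the script with the two discount values gives a feel for the difference: the run with a positive discount should occupy far more tables, most of them holding a single customer, which mirrors the long tail of rare words in natural language.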

This talk is part of the Machine Learning Journal Club series.
