reading-group: Interpolating Between Types and Tokens by Estimating Power-Law Generators

If you have a question about this talk, please contact David MacKay.

http://cog.brown.edu/~gruffydd/papers/typetoken.pdf

Paper abstract: Standard statistical models of language fail to capture one of the most striking properties of natural languages: the power-law distribution in the frequencies of word tokens. We present a framework for developing statistical models that generically produce power-laws, augmenting standard generative models with an adaptor that produces the appropriate pattern of token frequencies. We show that taking a particular stochastic process, the Pitman-Yor process, as an adaptor justifies the appearance of type frequencies in formal analyses of natural language, and improves the performance of a model for unsupervised learning of morphology.
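As a rough illustration of the kind of process the abstract refers to, the following is a minimal sketch of the Chinese restaurant process representation of the Pitman-Yor process. It is not taken from the paper: the function name `pitman_yor_crp` and the parameter names `discount` and `concentration` are assumptions made for this example, and the sketch leaves out the generator that the paper's framework pairs with the adaptor. It only simulates how tokens (customers) cluster into tables, whose sizes tend to be heavy-tailed when the discount is positive.

```python
import random


def pitman_yor_crp(n_tokens, discount=0.5, concentration=1.0, seed=0):
    """Simulate table sizes under a Pitman-Yor Chinese restaurant process.

    Hypothetical helper for illustration only. Assumes
    0 <= discount < 1 and concentration > -discount.
    Returns a list in which entry j is the number of tokens at table j.
    """
    rng = random.Random(seed)
    table_sizes = []
    for n in range(n_tokens):
        k = len(table_sizes)
        if k == 0:
            # The first token always opens the first table.
            table_sizes.append(1)
            continue
        # With n tokens already seated at k tables, a new table is opened
        # with probability (concentration + k * discount) / (n + concentration).
        p_new = (concentration + k * discount) / (n + concentration)
        if rng.random() < p_new:
            table_sizes.append(1)
        else:
            # Otherwise join table j with probability proportional to
            # (size_j - discount): large tables attract more tokens.
            weights = [size - discount for size in table_sizes]
            j = rng.choices(range(k), weights=weights)[0]
            table_sizes[j] += 1
    return table_sizes


if __name__ == "__main__":
    sizes = sorted(pitman_yor_crp(2000), reverse=True)
    print("number of tables:", len(sizes))
    print("largest tables:", sizes[:10])
```

Running the script prints the number of tables and the ten largest table sizes. Raising `discount` towards 1 should produce more, smaller tables, while setting it to 0 reduces the sketch to the ordinary Chinese restaurant process.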

This talk is part of the Machine Learning Journal Club series.
