reading-group: Interpolating Between Types and Tokens by Estimating Power-Law Generators
If you have a question about this talk, please contact David MacKay.
http://cog.brown.edu/~gruffydd/papers/typetoken.pdf
Paper-abstract:
Standard statistical models of language fail to capture one of the most
striking properties of natural languages: the power-law distribution in
the frequencies of word tokens. We present a framework for developing
statistical models that generically produce power-laws, augmenting standard
generative models with an adaptor that produces the appropriate
pattern of token frequencies. We show that taking a particular stochastic
process, the Pitman-Yor process, as an adaptor justifies the appearance
of type frequencies in formal analyses of natural language, and improves
the performance of a model for unsupervised learning of morphology.
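
For readers new to the Pitman-Yor process, the sketch below simulates its usual "Chinese restaurant" seating scheme, in which each token joins an existing table with weight (table size - discount) or opens a new table with weight (concentration + discount * number of tables). It is an illustration only and is not taken from the paper: the function name pitman_yor_crp and the parameter names discount and concentration are assumptions made here, and the generator that would label each table with a word type is left out.

```python
import random


def pitman_yor_crp(n_tokens, discount=0.5, concentration=1.0, seed=0):
    """Seat n_tokens customers with the two-parameter Chinese restaurant process.

    Hypothetical helper for illustration. Assumes 0 <= discount < 1 and
    concentration > -discount. Returns the list of table sizes.
    """
    rng = random.Random(seed)
    table_sizes = []
    for n in range(n_tokens):  # n customers are already seated
        k = len(table_sizes)
        new_weight = concentration + discount * k
        total = n + concentration  # new_weight plus sum of (size - discount)
        r = rng.random() * total
        if k == 0 or r < new_weight:
            table_sizes.append(1)  # open a new table
            continue
        r -= new_weight
        for j, size in enumerate(table_sizes):
            r -= size - discount
            if r < 0:
                table_sizes[j] += 1  # join existing table j
                break
        else:
            table_sizes[-1] += 1  # guard against floating-point rounding
    return table_sizes


if __name__ == "__main__":
    sizes = sorted(pitman_yor_crp(10000, discount=0.8), reverse=True)
    print("tables:", len(sizes), "largest:", sizes[:5])
```

Sorting the table sizes and plotting them against rank on log-log axes is one way to inspect the heavy tail; setting discount=0 reduces the scheme to the one-parameter (Dirichlet process) case.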
This talk is part of the Machine Learning Journal Club series.