Contextual dependencies in unsupervised word segmentation

If you have a question about this talk, please contact Philip Sterne.

We will be discussing the paper “Contextual dependencies in unsupervised word segmentation” by Sharon Goldwater, Thomas L. Griffiths and Mark Johnson.

Available from: http://cocosci.berkeley.edu/tom/papers/wordseg1.pdf

Abstract: Developing better methods for segmenting continuous text into words is important for improving the processing of Asian languages, and may shed light on how humans learn to segment speech. We propose two new Bayesian word segmentation methods that assume unigram and bigram models of word dependencies respectively. The bigram model greatly outperforms the unigram model (and previous probabilistic models), demonstrating the importance of such dependencies for word segmentation. We also show that previous probabilistic models rely crucially on sub-optimal search procedures.

This talk is part of the Machine Learning Journal Club series.

© 2006-2024 Talks.cam, University of Cambridge.