University of Cambridge > > NLIP Seminar Series > Disfluency detection in spoken learner English

Disfluency detection in spoken learner English

Add to your list(s) Download to your calendar using vCal

If you have a question about this talk, please contact Tamara Polajnar.

Due to the non-canonical nature of spoken language (containing filled pauses, non-standard grammatical variations, hesitations and other disfluencies) and compounded by a lack of available training data, spoken language parsing has been a challenge for standard NLP tools. Recently the Redshift parser (Honnibal et al., CoNLL 2013) has been shown to be successful in identifying grammatical relations and certain disfluencies in native speaker spoken language, returning unlabelled dependency accuracy of 90.5% and a disfluency F-measure of 84.1% (Honnibal & Johnson, TACL 2014 ). We investigate how this parser handles spoken data from learners of English at various proficiency levels. Firstly, we find that Redshift’s parsing accuracy on non-native speech data is comparable to Honnibal & Johnson’s results, with 91.1% of dependency relations correctly identified. However, disfluency detection is markedly down, with an F-measure of just 47.8%. We consider why this should be, and relate our findings to the use of NLP technology for automatic language assessment and computer-assisted language learning applications.

This talk is part of the NLIP Seminar Series series.

Tell a friend about this talk:

This talk is included in these lists:

Note that ex-directory lists are not shown.


© 2006-2024, University of Cambridge. Contact Us | Help and Documentation | Privacy and Publicity