
Factors Affecting ASR Model Self-Training


If you have a question about this talk, please contact Dr Marcus Tomalin.

Low-resource ASR self-training seeks to minimize resource requirements such as manual transcriptions or language-modeling text. This is accomplished by training on large quantities of audio automatically labeled by a small initial model. By analyzing our previous experiments with the conversational telephone English Fisher corpus, we demonstrate where self-training succeeds and under what resource conditions it provides the most benefit. Additionally, we will show success on Spanish and Levantine conversational speech, as well as on the tougher English Callhome set, despite initial WERs of more than 60%. Finally, by digging beneath average word error rate and analyzing individual word performance, we show that self-trained models successfully learn new words. More importantly, self-training most benefits words that appear in the unlabeled audio but do not appear in the manual transcriptions.
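The self-training procedure the abstract describes can be sketched as a simple loop: train an initial model on the small manually transcribed set, automatically label the large unlabeled pool, keep only confident hypotheses, and retrain on the combined data. The sketch below is purely illustrative and uses a toy 1-nearest-neighbour "model" in place of an ASR system; all function names, the confidence heuristic, and the threshold are hypothetical stand-ins, not part of the talk.

```python
def train(labeled):
    """Toy stand-in for ASR training: a 1-nearest-neighbour classifier
    over (feature, label) pairs. A real system would train an acoustic
    and language model here."""
    def model(x):
        # Find the closest labeled example and return its label together
        # with a crude confidence that decays with distance.
        feat, lab = min(labeled, key=lambda ex: abs(ex[0] - x))
        confidence = 1.0 / (1.0 + abs(feat - x))
        return lab, confidence
    return model

def self_train(labeled, unlabeled, rounds=3, threshold=0.5):
    """Iteratively 'decode' the unlabeled pool with the current model,
    keep confident automatic labels, and retrain on manual + automatic
    labels combined."""
    model = train(labeled)
    for _ in range(rounds):
        pseudo = []
        for x in unlabeled:
            label, conf = model(x)
            if conf >= threshold:           # confidence filtering
                pseudo.append((x, label))
        model = train(labeled + pseudo)     # retrain on enlarged set
    return model
```

For example, starting from only two labeled points, successive rounds pull in nearby unlabeled points whose automatic labels pass the confidence threshold, so later models can classify inputs the initial model was unsure about. In real ASR self-training the confidence filter would operate on lattice or posterior scores rather than this toy distance heuristic.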

This talk is part of the Machine Intelligence Laboratory Speech Seminars series.



