Factors Affecting ASR Model Self-Training
If you have a question about this talk, please contact Dr Marcus Tomalin.
Low-resource ASR self-training seeks to minimize resource requirements
such as manual transcriptions or language modeling text. This is
accomplished by training on large quantities of audio automatically
labeled by a small initial model. By analyzing our previous experiments
with the conversational telephone English Fisher corpus, we demonstrate
where self-training succeeds and under what resource conditions it
provides the most benefit. Additionally, we will show success on Spanish
and Levantine conversational speech as well as the tougher English
Callhome set, despite initial WERs of more than 60%. Finally, by digging
beneath average word error rate and analyzing individual word
performance, we show that self-trained models successfully learn new
words. More importantly, self-training most benefits words that appear
in the unlabeled audio but not in the manual transcriptions.
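As a rough illustration of the procedure described in the abstract, the sketch below shows the basic self-training loop: a seed model is trained on the small manually transcribed set, used to automatically label the large pool of unlabeled audio, and a new model is then trained on the combined data. The helper names (train_model, transcribe) and the confidence filtering are assumptions for illustration only, not details of the systems discussed in the talk.

# Minimal, illustrative sketch of ASR self-training; train_model and
# transcribe are placeholders that a real system would back with an ASR toolkit.
from typing import Callable, List, Sequence, Tuple

def self_train(
    labeled: Sequence[Tuple[str, str]],             # (audio_path, manual transcript)
    unlabeled_audio: Sequence[str],                  # audio paths without transcripts
    train_model: Callable[[Sequence[Tuple[str, str]]], object],
    transcribe: Callable[[object, str], Tuple[str, float]],  # -> (hypothesis, confidence)
    confidence_threshold: float = 0.7,               # assumed filtering step
) -> object:
    # Train a small seed model on the manually transcribed data.
    seed_model = train_model(labeled)

    # Automatically label the unlabeled audio with the seed model,
    # keeping only reasonably confident hypotheses.
    pseudo_labeled: List[Tuple[str, str]] = []
    for audio in unlabeled_audio:
        hypothesis, confidence = transcribe(seed_model, audio)
        if confidence >= confidence_threshold:
            pseudo_labeled.append((audio, hypothesis))

    # Retrain on the union of manual and automatic transcripts.
    return train_model(list(labeled) + pseudo_labeled)

The per-word analysis mentioned at the end of the abstract can be pictured with a small, self-contained example: align a reference transcript against a hypothesis and tally errors per word type rather than averaging them into a single WER. This uses difflib for the alignment, which is a convenient approximation; standard evaluation tools use a true edit-distance alignment.

# Count, per reference word, how often it is deleted or substituted.
from collections import Counter
from difflib import SequenceMatcher

def per_word_errors(reference: List[str], hypothesis: List[str]) -> Counter:
    errors: Counter = Counter()
    matcher = SequenceMatcher(a=reference, b=hypothesis, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op in ("replace", "delete"):      # reference words that were missed
            errors.update(reference[i1:i2])
    return errors

ref = "i will call home on the telephone tomorrow".split()
hyp = "i will call home on a telephone to borrow".split()
print(per_word_errors(ref, hyp))             # Counter({'the': 1, 'tomorrow': 1})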
This talk is part of the Machine Intelligence Laboratory Speech Seminars series.