
Transformer: the 3rd generation neural network acoustic models for ASR and its application at Facebook


If you have a question about this talk, please contact Kate Knill.

Please note the change of time from the original post to 3-4pm.

Since the introduction of deep learning to automatic speech recognition (ASR), neural network architectures have evolved rapidly from feed-forward networks to recurrent networks. Recently, in the natural language processing area, transformer-based sequence modeling has demonstrated strong results over recurrent network-based modeling, in terms of both modeling accuracy and inference speed. However, it is non-trivial to adopt the transformer architecture in speech recognition because of requirements unique to ASR, such as streaming processing. In this talk, we show how the transformer architecture can be modified to fit different latency requirements for a range of speech applications. Specifically, we augment the attention module in the transformer with a set of memory slots, resulting in an efficient memory transformer, Emformer. We compare Emformer with an LSTM-based acoustic model under both low-latency and medium-latency scenarios, on the widely used LibriSpeech benchmark and a series of industrial-scale tasks whose training data ranges from 9K hours to 2.2M hours. We show that on the medium-latency tasks, Emformer provides a 10-20% error reduction and a 2-3x inference speedup; on the low-latency task, Emformer achieves a similar word error rate reduction at the cost of a slightly increased real-time factor (RTF). By presenting these results, we hope to convince the audience that the transformer could become the third generation of neural acoustic models for both traditional hybrid and end-to-end ASR systems.

This talk is part of the CUED Speech Group Seminars series.

