Transformer: the 3rd generation neural network acoustic models for ASR and its application at Facebook

Yongqiang Wang, Facebook
Monday 30 November 2020, 15:00-16:00
Zoom: https://zoom.us/j/94591123432?pwd=bUJObFZ3UnFYLy9pWENDcS9aYUZqUT09.

If you have a question about this talk, please contact Dr Kate Knill.

Please note change of time from original post to 3-4pm

Since the introduction of deep learning to automatic speech recognition (ASR), neural network architectures have evolved rapidly from feed-forward networks to recurrent networks. Recently, in natural language processing, transformer-based sequence modeling has demonstrated strong results over recurrent network-based modeling, in terms of both modeling accuracy and inference speed. However, it is non-trivial to adopt the transformer architecture in speech recognition because of requirements unique to ASR, such as streaming processing. In this talk, we show how the transformer architecture can be modified to fit different latency requirements for a range of speech applications. Specifically, we augment the attention module in the transformer with a set of memory slots, resulting in an efficient memory transformer, Emformer. We compare Emformer with an LSTM-based acoustic model under both low latency and medium latency scenarios, on the widely used LibriSpeech benchmark and a series of industrial-scale tasks whose training data ranges from 9K hours to 2.2M hours. We show that on the medium latency tasks, Emformer provides 10-20% error reduction and a 2-3x inference speed-up; on the low latency task, Emformer achieves a similar word error rate reduction at the cost of a slightly increased real time factor (RTF). By presenting these results, we hope to convince the audience that the transformer could become the third generation of neural acoustic model for both traditional hybrid and end-to-end ASR systems.
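
For readers unfamiliar with the two metrics quoted in the abstract, the following is a minimal Python sketch of how a relative error reduction and a real time factor are commonly computed. The function names and the example numbers are hypothetical and chosen only for illustration; they are not results from the talk, and the abstract does not state exactly how its figures were measured.

```python
def relative_error_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which the word error rate drops relative to a baseline.

    Both arguments are word error rates in the same unit (e.g. percent).
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Time spent decoding divided by the duration of the audio decoded.

    Values below 1.0 mean the system runs faster than real time.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# Hypothetical numbers, for illustration only (not taken from the talk).
reduction = relative_error_reduction(baseline_wer=10.0, new_wer=8.5)
rtf = real_time_factor(processing_seconds=30.0, audio_seconds=60.0)
print(f"relative error reduction: {reduction:.0%}")
print(f"real time factor: {rtf:.2f}")
```

Under this definition, a reduction in the 10-20% range means the new model's error rate is 80-90% of the baseline's, and a higher RTF means more compute time per second of audio.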

This talk is part of the CUED Speech Group Seminars series.
