Transformer: the 3rd generation neural network acoustic models for ASR and its application at Facebook
If you have a question about this talk, please contact Dr Kate Knill. Please note the change of time from the original post: the talk now runs 3-4pm.

Since the introduction of deep learning to automatic speech recognition (ASR), neural network architectures have evolved rapidly from feed-forward networks to recurrent networks. Recently, in natural language processing, transformer-based sequence modeling has shown strong results over recurrent network-based approaches in both modeling accuracy and inference speed. However, adopting the transformer architecture for speech recognition is non-trivial because of requirements specific to ASR, such as streaming processing.

In this talk, we show how the transformer architecture can be modified to meet different latency requirements across a range of speech applications. Specifically, we augment the attention module in the transformer with a set of memory slots, resulting in an efficient memory transformer, Emformer. We compare Emformer with an LSTM-based acoustic model under both low-latency and medium-latency scenarios, on the widely used LibriSpeech benchmark and on a series of industrial-scale tasks whose training data ranges from 9K hours to 2.2M hours. On the medium-latency tasks, Emformer provides a 10-20% error reduction and a 2-3x inference speed-up; on the low-latency task, Emformer achieves a similar word error rate reduction at the cost of slightly increased real time factors (RTF). With these results, we hope to convince the audience that the transformer could become the third generation of neural acoustic models for both traditional hybrid and end-to-end ASR systems.

This talk is part of the CUED Speech Group Seminars series.
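The abstract describes augmenting transformer attention with memory slots so that audio can be processed in streaming segments. The toy, pure-Python sketch below shows the general idea only; it is not the Emformer implementation. It relies on several simplifying assumptions. It uses a single attention head with identity query/key/value projections. Each frame in a segment attends over the retained memory slots plus the current segment. The segment mean serves as a hypothetical "summary" query whose attention output becomes a new memory slot. All function names (`process_segment`, `stream`, and so on) are illustrative. The published model adds further components, such as learned projections and multiple heads.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    # Scaled dot-product attention for one query (identity projections assumed).
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def mean_vector(frames: List[Vector]) -> Vector:
    n = len(frames)
    return [sum(f[i] for f in frames) / n for i in range(len(frames[0]))]


def process_segment(
    segment: List[Vector], memory: List[Vector]
) -> Tuple[List[Vector], List[Vector]]:
    # Keys/values: memory slots from earlier segments plus the current segment.
    context = memory + segment
    outputs = [attend(frame, context, context) for frame in segment]
    # Hypothetical summary step: the segment mean attends over the same context,
    # and the result is appended as a new memory slot for later segments.
    new_slot = attend(mean_vector(segment), context, context)
    return outputs, memory + [new_slot]


def stream(
    frames: List[Vector], segment_size: int, max_memory: int
) -> Tuple[List[Vector], List[Vector]]:
    memory: List[Vector] = []
    outputs: List[Vector] = []
    for start in range(0, len(frames), segment_size):
        segment = frames[start:start + segment_size]
        seg_out, memory = process_segment(segment, memory)
        # Keep only the most recent max_memory slots so per-segment cost stays bounded.
        memory = memory[-max_memory:] if max_memory > 0 else []
        outputs.extend(seg_out)
    return outputs, memory
```

In this sketch, each segment attends over at most `segment_size + max_memory` vectors, so the cost per segment does not grow with utterance length. Outputs for earlier segments also never depend on later audio, which is the property streaming ASR needs. Information from older segments reaches the current one only through the compact memory slots.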