SummaryMixing: A Linear-Time Attention Alternative
If you have a question about this talk, please contact Simon Webster McKnight.

Modern speech processing systems rely on self-attention. Unfortunately, self-attention takes quadratic time in the length of the speech utterance, causing inference and training on long sequences to be slower and to consume more memory. Although cheaper alternatives to self-attention for speech recognition have been developed, they degrade performance. We propose a novel linear-time alternative to self-attention that, for the first time, reaches better accuracy. Our model, SummaryMixing, computes a mean over the whole utterance and feeds this summary back to each time step.

Experiments are performed in three vital scenarios: an encoder-decoder offline model; an online streaming Transducer model; and a self-supervised model. In all three scenarios, SummaryMixing gives equal or better accuracy than self-attention, at lower cost.

This talk is part of the CUED Speech Group Seminars series.
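The abstract's core idea, summarised in one sentence, is that each time step receives a per-frame transform plus a single mean ("summary") vector computed over the whole utterance, which keeps the cost linear in sequence length. A minimal NumPy sketch of that idea is below; the weight names (`W_local`, `W_summary`, `W_out`) and the concatenation-based combiner are illustrative assumptions, not the exact parameterisation used in the talk.

```python
import numpy as np

def summary_mixing(x, W_local, W_summary, W_out):
    """Linear-time mixing sketch: combine a per-time-step transform
    with the mean of the whole utterance.
    x: (T, d) utterance features; all weight names are illustrative."""
    local = x @ W_local                      # per-frame branch, O(T)
    summary = (x @ W_summary).mean(axis=0)   # one global mean vector, O(T)
    # Broadcast the summary back to every time step and combine.
    mixed = np.concatenate(
        [local, np.broadcast_to(summary, local.shape)], axis=-1
    )
    return mixed @ W_out

rng = np.random.default_rng(0)
T, d = 50, 8
x = rng.standard_normal((T, d))
W_local = rng.standard_normal((d, d))
W_summary = rng.standard_normal((d, d))
W_out = rng.standard_normal((2 * d, d))
y = summary_mixing(x, W_local, W_summary, W_out)
print(y.shape)  # (50, 8)
```

Unlike self-attention, which forms a T-by-T interaction matrix, every operation here touches each time step once, so compute and memory grow linearly with utterance length.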