SummaryMixing: A Linear-Time Attention Alternative

If you have a question about this talk, please contact Simon Webster McKnight.

Modern speech processing systems rely on self-attention. Unfortunately, self-attention takes quadratic time in the length of the speech utterance, making inference and training on long sequences slower and more memory-hungry. Cheaper alternatives to self-attention for speech recognition have been developed, but they degrade performance. We propose a novel linear-time alternative to self-attention that, for the first time, reaches better accuracy. Our model, SummaryMixing, computes a mean over the whole utterance and feeds this summary back to each time step.

Experiments are performed in three vital scenarios: an encoder-decoder offline model, an online streaming Transducer model, and a self-supervised model. In all three scenarios, SummaryMixing gives equal or better accuracy than self-attention, at lower cost.
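To make the mechanism concrete, here is a minimal PyTorch sketch of a layer following the description in the abstract. It is not the authors' implementation: the layer names, the ReLU activations, and the hidden size are illustrative assumptions. Each frame is transformed locally, a second per-frame transformation is averaged over time into one summary vector, and the summary is concatenated back onto every time step, giving cost linear in sequence length.

```python
import torch
import torch.nn as nn

class SummaryMixing(nn.Module):
    """Sketch of a SummaryMixing layer per the abstract (illustrative, not the authors' code)."""

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.local = nn.Linear(d_model, d_hidden)        # per-frame local transform (assumed linear + ReLU)
        self.summary = nn.Linear(d_model, d_hidden)      # per-frame transform averaged into the summary
        self.combine = nn.Linear(2 * d_hidden, d_model)  # combines local features with the summary

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, d_model)
        local = torch.relu(self.local(x))                                 # (B, T, H)
        summary = torch.relu(self.summary(x)).mean(dim=1, keepdim=True)   # mean over the whole utterance: (B, 1, H)
        summary = summary.expand(-1, x.size(1), -1)                       # feed the summary back to each time step
        return self.combine(torch.cat([local, summary], dim=-1))          # (B, T, d_model)

# Usage: a batch of 2 utterances, 100 frames, 80-dim features
layer = SummaryMixing(d_model=80, d_hidden=128)
out = layer(torch.randn(2, 100, 80))
print(out.shape)  # torch.Size([2, 100, 80])
```

Note that the only interaction across time steps is the mean, so the layer costs O(T) in the utterance length T, versus the O(T^2) pairwise scores of self-attention.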

This talk is part of the CUED Speech Group Seminars series.
