Long-Range Transformers
If you have a question about this talk, please contact Elre Oldewage.
Transformer architectures now deliver state-of-the-art performance across many tasks, including natural language processing, computer vision, protein modelling and beyond. Unfortunately, the self-attention mechanism scales quadratically, O(L^2) in time and memory, as the sequence length L grows. In this talk, we will discuss a zoo of recently proposed methods that reduce the time or memory complexity of Transformers down to O(L) and even O(1).
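To make the O(L^2) vs O(L) contrast concrete, below is a minimal numpy sketch comparing standard softmax attention, which materialises the full L x L attention matrix, with a kernelised "linear attention" that never forms it. The function names and the simple positive feature map `phi` are illustrative assumptions: the Performer of Choromanski et al. uses random features (FAVOR+) rather than this elementwise map, and here the feature dimension m simply equals the head dimension d.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Standard attention: materialises an L x L score matrix, so O(L^2) time and memory."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])                  # (L, L)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                                        # (L, d)

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1e-6):
    """Kernelised attention: out ~ phi(Q) [phi(K)^T V] / (phi(Q) phi(K)^T 1).
    The bracketed terms have shapes (m, d) and (m,), so the cost is O(L * m * d),
    i.e. linear in L. `phi` is a toy positive feature map, not the Performer's FAVOR+ features."""
    Qp, Kp = phi(Q), phi(K)                                   # (L, m) with m = d for this elementwise map
    kv = Kp.T @ V                                             # (m, d), accumulated in one pass over the sequence
    z = Kp.sum(axis=0)                                        # (m,), normaliser statistics
    return (Qp @ kv) / (Qp @ z)[:, None]                      # (L, d); no L x L matrix is ever formed

# Toy usage: same input/output shapes, very different asymptotics in L.
L, d = 512, 64
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, L, d))
print(softmax_attention(Q, K, V).shape, linear_attention(Q, K, V).shape)  # (512, 64) (512, 64)
```

The further step to O(1)-style memory (as in the SLiM paper listed below) comes from processing the sequence in chunks and recomputing intermediate statistics during the backward pass, rather than from changing the attention approximation itself.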
Literature:
Efficient Transformers: A Survey. Yi Tay, Mostafa Dehghani, Dara Bahri, Donald Metzler. arXiv:2009.06732.
Rethinking Attention with Performers. Krzysztof Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, Tamas Sarlos, Peter Hawkins, Jared Davis, Afroz Mohiuddin, Lukasz Kaiser, David Belanger, Lucy Colwell, Adrian Weller. ICLR 2021.
Sub-Linear Memory: How to Make Performers SLiM. Valerii Likhosherstov, Krzysztof Choromanski, Jared Davis, Xingyou Song, Adrian Weller. arXiv:2012.11346.
This talk is part of the Machine Learning Reading Group @ CUED series.