
Adaptive Tokenization and Memory in Foundation Models


If you have a question about this talk, please contact Suchir Salhan.

Abstract: State-of-the-art foundation models (FMs) process information as a sequence of internal representations; however, the length of this sequence is fixed and entirely determined by tokenization. This essentially decouples representation granularity from information content, which exacerbates the deployment costs of FMs and narrows their “horizons” in long sequences. What if, instead, we could dynamically adapt tokenization and memory in FMs to save computation while maintaining or even enhancing performance?

First, I will show how we can dynamically compress the key-value cache of Transformers by deciding, for each new item, whether to append it to memory or merge it with existing entries. This offers a compromise between Transformers, whose linearly growing key-value cache exhausts memory and increases latency, and State Space Models, whose finite capacity may lead to forgetting. Second, I will demonstrate how FMs can be “freed” from the tokenizers they are bound to by swapping them on the fly with arbitrary ones. Going a step further, we can dispense with tokenizers entirely by learning end-to-end how to jointly segment and model language.
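For intuition, here is a minimal sketch of an append-or-merge cache update in PyTorch. It is purely illustrative and not the method presented in the talk: the function name is invented, and the cosine-similarity threshold stands in for what would, in the actual work, be a learned decision.

    import torch

    def update_kv_cache(keys, values, new_k, new_v, max_len=512, sim_threshold=0.9):
        """Append-or-merge update for a key-value cache (illustrative heuristic).

        keys, values: (T, d) tensors holding the cached keys and values.
        new_k, new_v: (d,) tensors for the incoming token.
        """
        if keys.shape[0] > 0:
            # Compare the incoming key with the most recent cached key.
            sim = torch.cosine_similarity(keys[-1], new_k, dim=0)
            if sim > sim_threshold or keys.shape[0] >= max_len:
                # Merge: average into the last slot so the cache does not grow.
                merged_k = ((keys[-1] + new_k) / 2).unsqueeze(0)
                merged_v = ((values[-1] + new_v) / 2).unsqueeze(0)
                return torch.cat([keys[:-1], merged_k]), torch.cat([values[:-1], merged_v])
        # Append: grow the cache by one slot.
        return torch.cat([keys, new_k.unsqueeze(0)]), torch.cat([values, new_v.unsqueeze(0)])

Under this sketch, similar consecutive items collapse into a single slot and the cache is hard-capped at max_len, so memory grows sublinearly with sequence length; a learned variant would replace the fixed threshold with a trained gating decision.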

Crucially, this new family of FM architectures equipped with adaptive memory and tokenization does not need to be trained from scratch; instead, pre-existing open-weight FMs can be retrofitted for this purpose with a negligible amount of data.

Bio: Edoardo M. Ponti is a Lecturer (≈ Assistant Professor) in Natural Language Processing at the University of Edinburgh, an Affiliated Lecturer at the University of Cambridge, and a visiting professor at NVIDIA. Previously, he was a visiting postdoctoral scholar at Stanford University and a postdoctoral fellow at Mila and McGill University in Montreal. In 2021, he obtained a PhD in computational linguistics from the University of Cambridge, St John’s College. His main research foci are efficient memory and tokenization, modular deep learning, and computational typology. His research earned him a Google Research Faculty Award and two Best Paper Awards, at EMNLP 2021 and RepL4NLP 2019. He is a board member and co-founder of SIGTYP, the ACL special interest group for computational typology, and a scholar of the European Lab for Learning and Intelligent Systems (ELLIS). He is a (terrible) violinist, football player, and an aspiring practitioner of heroic viticulture.

This talk is part of the NLIP Seminar Series.


