State space models (as alternatives to Transformers)

The transformer architecture underlies recent breakthroughs in AI. Famously, LLMs such as the GPT series use this architecture, and transformers have now become the standard across domains. Despite their impressive track record, however, transformers also have undesirable attributes. While performance on most tasks increases with the length of the context window, the memory requirement and computational cost of transformers scale adversely with context window length. Beyond the issue of computational cost, there are also reasons to think transformers may have suboptimal inductive biases for certain sequential tasks. These considerations have motivated the search for alternative architectures. The most promising among these are so-called State Space Models (SSMs), which are recurrent architectures (a property that also brings them closer to biological plausibility). We will start with a review of transformers and motivate their connection with SSMs by reviewing the equivalence between the convolutional and state-space formulations of linear time-invariant systems (sketched in the example below). We will then cover the papers listed here on two specific SSM architectures and, time allowing, a theoretical study of learning dynamics in linear SSMs:

1. https://arxiv.org/abs/2312.00752 (Mamba: the currently popular SSM architecture)

2. https://arxiv.org/pdf/2410.01201 (minGRU: a minimal version of the GRU, recently proposed as a bare-bones post-transformer architecture)

3. https://arxiv.org/pdf/2407.19115 (a theoretical study of gradient-based learning in linear SSMs)
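
As a warm-up for the LTI review, here is a minimal numpy sketch (the matrices A, B, C below are arbitrary placeholders, not parameters from any of the papers above) showing that the recurrent state-space form h_t = A h_{t-1} + B x_t, y_t = C h_t produces the same outputs as the convolution y_t = sum_k (C A^k B) x_{t-k}:

    import numpy as np

    rng = np.random.default_rng(0)
    d, T = 4, 10                       # state dimension, sequence length
    A = 0.3 * rng.normal(size=(d, d))  # state-transition matrix (scaled down for stability)
    B = rng.normal(size=(d, 1))        # input matrix
    C = rng.normal(size=(1, d))        # readout matrix
    x = rng.normal(size=T)             # scalar input sequence

    # Recurrent (state-space) form: h_t = A h_{t-1} + B x_t, y_t = C h_t
    h = np.zeros((d, 1))
    y_rec = []
    for t in range(T):
        h = A @ h + B * x[t]
        y_rec.append((C @ h).item())

    # Convolutional form: y_t = sum_{k=0}^{t} (C A^k B) x_{t-k}
    kernel = [(C @ np.linalg.matrix_power(A, k) @ B).item() for k in range(T)]
    y_conv = [sum(kernel[k] * x[t - k] for k in range(t + 1)) for t in range(T)]

    assert np.allclose(y_rec, y_conv)  # the two formulations agree

The recurrent form gives constant memory per step at inference time, while the convolutional form exposes the parallelism used during training; this duality is the starting point for the SSM architectures covered in the talk.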
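
For paper 2, a bare-bones sketch of the minGRU recurrence as described in arXiv:2410.01201 (sequential form only; the paper's parallel-scan training reformulation is omitted, and the weight matrices here stand in for its learned linear projections):

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def min_gru(x, Wz, Wh):
        # minGRU: unlike the standard GRU, the update gate z_t and the
        # candidate state depend only on the input x_t, not on h_{t-1};
        # this is what enables parallel-scan training in the paper.
        h = np.zeros(Wz.shape[1])
        hs = []
        for t in range(x.shape[0]):
            z = sigmoid(x[t] @ Wz)         # update gate (input-only)
            h_tilde = x[t] @ Wh            # candidate hidden state (input-only)
            h = (1 - z) * h + z * h_tilde  # convex combination with previous state
            hs.append(h)
        return np.stack(hs)

    rng = np.random.default_rng(0)
    x = rng.normal(size=(10, 8))                                  # (time, input_dim)
    h = min_gru(x, rng.normal(size=(8, 16)), rng.normal(size=(8, 16)))
    print(h.shape)                                                # (10, 16)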

This talk is part of the Computational Neuroscience series.
