Parameter-Efficient Fine-tuning for Audio and Speech Processing

If you have a question about this talk, please contact Simon Webster McKnight.

Leveraging large pre-trained models for downstream tasks has become a cornerstone of several domains, such as natural language processing and audio/speech processing. The typical paradigm adapts the whole model to each downstream task (i.e., full fine-tuning). However, given the relentless and unprecedented growth in the scale of these foundation models, full fine-tuning often becomes prohibitive, especially when there are numerous downstream tasks. For this reason, parameter-efficient fine-tuning (PEFT) strategies have emerged, in which only a small fraction of parameters is learned while the backbone model is kept frozen. In audio and speech processing, PEFT has also gained traction and become a valid alternative to full fine-tuning. In this talk, we first provide an overview of the most common PEFT methods for efficiently adapting the Audio Spectrogram Transformer to several tasks and under different scenarios. We then investigate how the Mixture of Experts paradigm can be harnessed to scale the number of adapters, leading to enhanced performance. We conclude by proposing new adapter designs that outperform full fine-tuning while training only 0.3% of the parameters that full fine-tuning updates.
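
As a rough illustration of why freezing the backbone keeps the trainable fraction so small, the sketch below counts parameters for a Transformer encoder and for two common PEFT modules, a bottleneck adapter and LoRA. All dimensions are assumptions: 12 layers, width 768, and MLP width 3072 are modelled on a ViT-Base-sized encoder, and the bottleneck size 16 and LoRA rank 8 are illustrative choices. The function names are hypothetical, embeddings and task heads are ignored, and the sketch does not reproduce the adapter designs or the 0.3% figure from the talk.

```python
def linear_params(d_in, d_out, bias=True):
    """Parameter count of a dense layer mapping d_in -> d_out."""
    return d_in * d_out + (d_out if bias else 0)


def transformer_block_params(d_model, d_mlp):
    """Parameter count of one standard Transformer encoder block."""
    attn = 4 * linear_params(d_model, d_model)  # Q, K, V and output projections
    mlp = linear_params(d_model, d_mlp) + linear_params(d_mlp, d_model)
    norms = 2 * 2 * d_model  # two LayerNorms, each with a scale and a shift
    return attn + mlp + norms


def bottleneck_adapter_params(d_model, r):
    """Down-projection to r dimensions followed by an up-projection."""
    return linear_params(d_model, r) + linear_params(r, d_model)


def lora_params(d_model, r, n_matrices=2):
    """Two low-rank factors (d_model x r and r x d_model) per adapted matrix."""
    return n_matrices * 2 * d_model * r


def trainable_fraction(n_layers, d_model, d_mlp, per_layer_extra):
    """Ratio of added trainable parameters to frozen backbone parameters."""
    backbone = n_layers * transformer_block_params(d_model, d_mlp)
    return (n_layers * per_layer_extra) / backbone


# Assumed ViT-Base-like dimensions; not taken from the talk.
N_LAYERS, D_MODEL, D_MLP = 12, 768, 3072

adapter_frac = trainable_fraction(
    N_LAYERS, D_MODEL, D_MLP, bottleneck_adapter_params(D_MODEL, 16)
)
lora_frac = trainable_fraction(
    N_LAYERS, D_MODEL, D_MLP, lora_params(D_MODEL, 8)
)

print(f"bottleneck adapter (r=16): {adapter_frac:.2%} of backbone parameters")
print(f"LoRA on two matrices (r=8): {lora_frac:.2%} of backbone parameters")
```

Under these assumptions both modules add well under 1% of the backbone's parameter count, which is the regime the abstract describes. The exact percentage depends on where the modules are inserted and how large they are.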

This talk is part of the CUED Speech Group Seminars series.


© 2006-2024, University of Cambridge.