SoundStream: An End-to-End Neural Audio Codec

Neil Zeghidour (Google)
Monday 04 April 2022, 12:00-13:00
Zoom: https://eng-cam.zoom.us/j/81927138251?pwd=TVd3MXliV003dUdYVlFwU2NDWGpmdz09.

If you have a question about this talk, please contact Dr Jie Pu.

This talk will be held on Zoom.

Abstract: Audio codecs (e.g. MP3, Opus) are compression algorithms used whenever one needs to transmit audio, whether when streaming a song or during a conference call. In this talk, I will present SoundStream, a novel neural audio codec that can efficiently compress speech, music and general audio at bitrates normally targeted by speech-tailored codecs. SoundStream relies on a model architecture composed of a fully convolutional encoder/decoder network and a residual vector quantizer, which are trained jointly end-to-end. Training leverages recent advances in text-to-speech and speech enhancement, which combine adversarial and reconstruction losses to allow the generation of high-quality audio content from quantized embeddings. By training with structured dropout applied to quantizer layers, a single model can operate across variable bitrates from 3 kbps to 18 kbps, with a negligible quality loss when compared with models trained at fixed bitrates. In addition, the model is amenable to a low-latency implementation, which supports streamable inference and runs in real time on a smartphone CPU. In subjective evaluations using audio at a 24 kHz sampling rate, SoundStream at 3 kbps outperforms Opus at 12 kbps and approaches EVS at 9.6 kbps. Moreover, we are able to perform joint compression and enhancement either at the encoder or at the decoder side with no additional latency, which we demonstrate through background noise suppression for speech.
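As a rough illustration of the residual vector quantizer and the variable-bitrate idea mentioned in the abstract, here is a minimal sketch in plain Python. It is not the SoundStream implementation: the function names, the toy codebooks, and the uniform sampling in `sample_n_active` are assumptions made for this example. The frame rate (75 frames per second) and codebook size (1024 entries) in the usage note are likewise illustrative values that are not stated in the abstract.

```python
import math
import random


def make_toy_codebooks(n_stages, size, dim, seed=0):
    """Build random (untrained) codebooks, purely for illustration.

    Each stage uses a smaller scale than the previous one, and each codebook
    contains the zero vector so that adding a stage can never increase the
    reconstruction error in this toy setting.
    """
    rng = random.Random(seed)
    codebooks = []
    for stage in range(n_stages):
        scale = 0.5 ** stage
        entries = [[0.0] * dim]
        for _ in range(size - 1):
            entries.append([rng.uniform(-scale, scale) for _ in range(dim)])
        codebooks.append(entries)
    return codebooks


def nearest(codebook, vec):
    """Index of the codebook entry closest to vec (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)),
    )


def rvq_encode(vec, codebooks, n_active=None):
    """Residual VQ: each stage quantizes what the previous stages left over.

    n_active limits encoding to the first n_active stages, which is how a
    single set of codebooks can serve several bitrates.
    """
    stages = codebooks if n_active is None else codebooks[:n_active]
    residual = list(vec)
    indices = []
    for codebook in stages:
        idx = nearest(codebook, residual)
        indices.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return indices


def rvq_decode(indices, codebooks):
    """Sum the selected entry of each stage that was used."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


def sample_n_active(n_total, rng):
    """Hypothetical quantizer dropout: pick how many stages to keep for one
    training example (assumed here to be uniform over 1..n_total)."""
    return rng.randint(1, n_total)


def bitrate_bps(frames_per_second, n_quantizers, codebook_size):
    """Bits per second when every frame sends one index per active stage."""
    return frames_per_second * n_quantizers * math.log2(codebook_size)
```

Under the assumed values, `bitrate_bps(75, 4, 1024)` gives 3000 bits per second and `bitrate_bps(75, 24, 1024)` gives 18000, which is one way the 3 kbps to 18 kbps range could arise from changing only the number of active quantizer stages. Calling `rvq_encode(x, codebooks, n_active=k)` with a smaller `k` yields a prefix of the indices obtained with a larger `k`, so lowering the bitrate amounts to dropping the later stages.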

Bio: Neil Zeghidour is a Senior Research Scientist at Google Brain in Paris, and teaches automatic speech processing at École Normale Supérieure. He previously graduated with a PhD in Machine Learning from École Normale Supérieure in Paris, jointly with Facebook AI Research. His main research interest is integrating signal processing and deep learning into fully learnable architectures for audio understanding and generation.

This talk is part of the CUED Speech Group Seminars series.
