
Vector Quantization in Deep Neural Networks for Speech and Image Processing


If you have a question about this talk, please contact Simon Webster McKnight.

Vector quantization (VQ) is a classic signal processing technique that models the probability density function of a distribution using a set of representative vectors called a codebook (or dictionary). Deep neural networks (DNNs) are a branch of machine learning that has gained popularity in recent decades. Since VQ provides an abstract, high-level discrete representation of a distribution, it is widely used in DNN-based applications such as speech recognition, image generation, and speech and video coding. Hence, even a small improvement in VQ can significantly boost the performance of applications dealing with many data types, such as speech, images, video, and text. This talk focuses on improving VQ methods within deep learning frameworks in three respects:

1) Training: VQ is non-differentiable, so gradients cannot be backpropagated through it. We proposed a new solution to this issue that works better than state-of-the-art workarounds such as the straight-through estimator and exponential moving average updates (a minimal sketch of that standard baseline follows this list).

2) Interpretability: by combining VQ with the concept of space-filling curves, we proposed a new quantization technique called space-filling vector quantization (SFVQ), which helps interpret the latent spaces of DNNs.

3) Privacy: we used SFVQ to cluster speaker embeddings, enhancing the speaker's privacy in DNN-based speech processing tools.
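For context, here is a minimal PyTorch sketch of the standard baseline the abstract refers to: nearest-neighbour codebook lookup trained with the straight-through estimator (STE). The `quantize` helper, the tensor shapes, and the codebook size are illustrative assumptions for this page, not the speaker's proposed method.

```python
# Sketch of vector quantization with the straight-through estimator (STE),
# the standard baseline mentioned in the abstract. Shapes and the codebook
# size below are arbitrary, chosen only for illustration.
import torch

def quantize(z, codebook):
    """Map each input vector to its nearest codeword (L2 distance)."""
    # z: (batch, dim); codebook: (num_codes, dim)
    dists = torch.cdist(z, codebook)   # pairwise distances: (batch, num_codes)
    indices = dists.argmin(dim=1)      # index of the nearest codeword per input
    z_q = codebook[indices]            # quantized vectors: (batch, dim)
    # STE: the forward pass outputs z_q, but the backward pass copies the
    # gradient straight to z, bypassing the non-differentiable argmin.
    z_q = z + (z_q - z).detach()
    return z_q, indices

# Usage: quantize a batch of 8 four-dimensional latents with 16 codewords.
codebook = torch.randn(16, 4)
z = torch.randn(8, 4, requires_grad=True)
z_q, idx = quantize(z, codebook)
z_q.sum().backward()                   # gradients reach z via the STE
print(idx.shape, z.grad.shape)
```

Note that with a pure STE the codebook itself receives no gradient; in practice (e.g. VQ-VAE) it is trained with a separate codebook loss or updated via an exponential moving average of the encoder outputs assigned to each codeword, which is the EMA approach the abstract mentions.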

This talk is part of the CUED Speech Group Seminars series.
