
Vector Quantization in Deep Neural Networks for Speech and Image Processing


If you have a question about this talk, please contact Simon Webster McKnight.

Vector quantization (VQ) is a classic signal processing technique that models the probability density function of a distribution using a set of representative vectors called a codebook (or dictionary). Deep neural networks (DNNs) are a branch of machine learning that has gained popularity in recent decades. Since VQ provides an abstract, high-level discrete representation of a distribution, it is widely used in DNN-based applications such as speech recognition, image generation, and speech and video coding. Hence, even a small improvement in VQ can significantly boost the performance of applications dealing with many data types, such as speech, images, video, and text. This talk focuses on improving VQ methods within deep learning frameworks in three respects:

1) Training: VQ is non-differentiable, so gradients cannot be backpropagated through it. We proposed a new solution to this issue that works better than state-of-the-art workarounds such as the straight-through estimator and exponential moving average updates (a minimal sketch of that standard baseline follows this list).

2) Interpretability: by combining VQ with the concept of space-filling curves, we proposed a new quantization technique called space-filling vector quantization (SFVQ), which helps interpret the latent spaces of DNNs.

3) Privacy: we used SFVQ to cluster speaker embeddings, enhancing the speaker's privacy in DNN-based speech processing tools.
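For context, here is a minimal PyTorch sketch of the standard baseline the abstract refers to: nearest-neighbour codebook lookup trained with the straight-through estimator (STE). The `quantize` helper, the tensor shapes, and the codebook size are illustrative assumptions for this page, not the speaker's proposed method.

```python
# Sketch of vector quantization with the straight-through estimator (STE),
# the standard baseline mentioned in the abstract. Shapes and the codebook
# size below are arbitrary, chosen only for illustration.
import torch

def quantize(z, codebook):
    """Map each input vector to its nearest codeword (L2 distance)."""
    # z: (batch, dim); codebook: (num_codes, dim)
    dists = torch.cdist(z, codebook)   # pairwise distances: (batch, num_codes)
    indices = dists.argmin(dim=1)      # index of the nearest codeword per input
    z_q = codebook[indices]            # quantized vectors: (batch, dim)
    # STE: the forward pass outputs z_q, but the backward pass copies the
    # gradient straight to z, bypassing the non-differentiable argmin.
    z_q = z + (z_q - z).detach()
    return z_q, indices

# Usage: quantize a batch of 8 four-dimensional latents with 16 codewords.
codebook = torch.randn(16, 4)
z = torch.randn(8, 4, requires_grad=True)
z_q, idx = quantize(z, codebook)
z_q.sum().backward()                   # gradients reach z via the STE
print(idx.shape, z.grad.shape)
```

Note that with a pure STE the codebook itself receives no gradient; in practice (e.g. VQ-VAE) it is trained with a separate codebook loss or updated via an exponential moving average of the encoder outputs assigned to each codeword, which is the EMA approach the abstract mentions.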

This talk is part of the CUED Speech Group Seminars series.
