Interpretable representation learning for speech and audio signals

If you have a question about this talk, please contact Dr Kate Knill.

Seminar on Zoom

Learning interpretable representations from raw data presents significant challenges for time-series data such as speech. In this talk, we will discuss a relevance weighting scheme that allows the speech representations to be interpreted during the forward propagation of the model itself.
  • The relevance weighting is achieved in a two-stage deep representation learning framework, where the weighting approach performs feature selection at each stage.
  • A relevance sub-network, applied at the first stage, which operates on raw speech signals, acts as an acoustic filterbank layer with relevance weighting. A similar relevance sub-network applied at the second convolutional layer performs modulation filterbank learning with relevance weighting.
  • All the layers are trained jointly for a speech recognition task on noisy and reverberant speech. The proposed representation learning framework is also extended to the task of sound classification.
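
As a rough illustration of what relevance weighting over filterbank channels could look like, here is a minimal sketch in plain Python. It is not the speaker's implementation: the function names, the single tanh hidden layer, the softmax normalization across channels, and the random parameters are all assumptions made for the example. Each channel's time trajectory is mapped to a scalar score by a small sub-network, the scores are normalized into weights, and every channel is scaled by its weight.

```python
import math
import random


def relevance_weights(features, w1, b1, w2, b2):
    """Return one relevance weight per channel (weights sum to 1).

    features: list of channels, each a list of frame values.
    w1: list of hidden-unit weight vectors, each as long as a channel.
    b1: list of hidden-unit biases.
    w2: list of output weights, one per hidden unit.
    b2: scalar output bias.
    All parameter shapes are hypothetical choices for this sketch.
    """
    scores = []
    for channel in features:
        # Hidden layer: one tanh unit per weight vector in w1.
        hidden = [
            math.tanh(sum(x * w for x, w in zip(channel, column)) + bias)
            for column, bias in zip(w1, b1)
        ]
        # Output layer: a single scalar score for this channel.
        scores.append(sum(h * w for h, w in zip(hidden, w2)) + b2)
    # Softmax across channels (assumed normalization), shifted for stability.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def apply_relevance(features, weights):
    """Scale every frame of each channel by that channel's weight."""
    return [[w * x for x in channel] for channel, w in zip(features, weights)]


# Toy data: 8 filterbank channels, 20 frames, 4 hidden units.
random.seed(0)
channels, frames, hidden_units = 8, 20, 4
features = [[random.gauss(0, 1) for _ in range(frames)] for _ in range(channels)]
w1 = [[random.gauss(0, 0.1) for _ in range(frames)] for _ in range(hidden_units)]
b1 = [0.0] * hidden_units
w2 = [random.gauss(0, 0.1) for _ in range(hidden_units)]
b2 = 0.0

weights = relevance_weights(features, w1, b1, w2, b2)
weighted = apply_relevance(features, weights)
```

In a trained model the sub-network parameters would be learned jointly with the rest of the network, and the resulting per-channel weights are the quantity one would inspect for interpretation. In a two-stage setup, the same pattern would be applied once to the acoustic filterbank outputs and once to the modulation filterbank outputs.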

We will also present a detailed analysis of the relevance weights and intermediate representations learned by the model. The analysis reveals that the relevance weights capture information about the underlying speech/audio content, and the approach also improves system performance.

Bio: Purvi Agrawal recently defended her Ph.D. thesis, titled “Neural Representation learning for Speech and Audio Signals”, at the Learning and Extraction of Acoustic Patterns (LEAP) lab with Dr. Sriram Ganapathy, Dept. of Electrical Engineering, Indian Institute of Science (IISc), Bangalore. Prior to joining IISc, she obtained her Master's in Speech Communications from DA-IICT, Gandhinagar, in 2015. She also worked at Sony R&D Labs, Tokyo, in 2017. She will be joining the speech research team at Microsoft India as an Applied Researcher-II in Feb. 2021. Her research interests include interpretable deep learning, raw waveform modeling, low-resource data modeling, and unsupervised/self-supervised learning.

This talk is part of the CUED Speech Group Seminars series.



© 2006-2024, University of Cambridge.