Generative Speech Separation based on Pitch Information

Dr Xiang Li, Cambridge University Engineering Department
Monday 11 October 2021, 12:00-13:00
Zoom: https://us06web.zoom.us/j/87426783837?pwd=akx4ZWZMYVZML2ZoOWRlYzdRaTd6dz09.

If you have a question about this talk, please contact Dr Jie Pu.

This talk will be held on Zoom.

Abstract: Monaural speech separation aims to separate concurrent speakers from a single-microphone mixture recording. Inspired by auditory scene analysis mechanisms, this talk presents a generative speech separation framework based on pitch information. The main advantage of this framework is that both the permutation problem and the unknown-speaker-number problem found in general models can be solved by using pitch contours to indicate the target speaker to be separated. In addition, a generative approach is applied instead of the traditional time-frequency mask-based approach, to improve the perceptual quality of the separated speech. The proposed framework is divided into two phases: pitch extraction and speech separation. The first phase aims to accurately extract pitch contour candidates for each speaker from the mixture, using a two-stage approach. In the second phase, any of these pitch contours can be selected as the condition, and a conditional generative adversarial network (CGAN) separates the speaker corresponding to the given pitch condition. The framework is evaluated in terms of both pitch extraction and speech separation.
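The two-phase structure described in the abstract can be sketched as a small control-flow outline. This is only an illustration of how conditioning on pitch contours ties each output to one contour and lets the number of outputs follow the number of contours found. It is not the speaker's implementation: `separate_all`, `toy_extractor` and `toy_separator` are hypothetical names, and the two toy functions are placeholders that do no real pitch tracking or separation.

```python
from typing import Callable, List

Signal = List[float]        # mono waveform samples
PitchContour = List[float]  # one F0 value in Hz per frame (assumed representation)


def separate_all(
    mixture: Signal,
    extract_pitch_contours: Callable[[Signal], List[PitchContour]],
    separate_given_pitch: Callable[[Signal, PitchContour], Signal],
) -> List[Signal]:
    """Run phase 1 (pitch extraction), then phase 2 once per extracted contour."""
    # Phase 1: the number of contours found determines how many outputs follow,
    # so the speaker count does not need to be fixed in advance.
    contours = extract_pitch_contours(mixture)
    # Phase 2: output i is produced under condition i, so the output order is
    # fixed by the contours rather than by an arbitrary assignment.
    return [separate_given_pitch(mixture, contour) for contour in contours]


def toy_extractor(mixture: Signal) -> List[PitchContour]:
    # Hypothetical stand-in: pretends two speakers were found, at 120 Hz and 220 Hz.
    n_frames = 4
    return [[120.0] * n_frames, [220.0] * n_frames]


def toy_separator(mixture: Signal, contour: PitchContour) -> Signal:
    # Hypothetical stand-in for the conditional generator: returns silence of the
    # same length as the mixture instead of a separated signal.
    return [0.0] * len(mixture)


mixture = [0.0] * 8
outputs = separate_all(mixture, toy_extractor, toy_separator)
```

In a real system the two callables would be replaced by the pitch extractor and the conditional generator; the outline only shows how the pieces connect.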

Bio: Xiang Li is a Research Associate in the Speech Group of the Machine Intelligence Laboratory, Cambridge University Engineering Department, working with Prof. Mark Gales. She recently received her PhD from Peking University, supervised by Prof. Xihong Wu. This talk is about her PhD thesis. Her research interests include speech enhancement and separation, perception, and natural language processing.

This talk is part of the CUED Speech Group Seminars series.
