Generative Speech Separation based on Pitch Information
If you have a question about this talk, please contact Dr Jie Pu.

This talk will be on Zoom.

Abstract: Monaural speech separation aims to separate concurrent speakers from a single-microphone mixture recording. Inspired by auditory scene analysis mechanisms, a generative speech separation framework based on pitch information will be presented in this talk. The main advantage of this framework is that both the permutation problem and the unknown-speaker-number problem that affect general models can be solved by using pitch contours to indicate the target speaker to be separated. In addition, a generative approach is applied instead of the traditional time-frequency-mask-based approach, to improve the perceptual quality of the separated speech.

Specifically, the proposed framework is divided into two phases: pitch extraction and speech separation. The first phase aims to accurately extract pitch contour candidates for each speaker from the mixture, using a two-stage approach. In the second phase, any extracted pitch contour can be selected as the condition, and a conditional generative adversarial network (CGAN) is used to separate the speaker corresponding to the given pitch condition. The proposed framework is evaluated in terms of both pitch extraction and speech separation.

Bio: Xiang Li is a Research Associate in the Speech Group of the Machine Intelligence Laboratory, Engineering Department of Cambridge University, working with Prof. Mark Gales. She recently received her PhD from Peking University, supervised by Prof. Xihong Wu. This talk is about her PhD thesis. Her research interests include speech enhancement/separation, perception and natural language processing.

This talk is part of the CUED Speech Group Seminars series.