Time-domain multi-channel speech separation and extraction

Jisi Zhang, University of Sheffield
Tuesday 08 June 2021, 12:00-13:00
Zoom: https://zoom.us/j/95352633552?pwd=RzJVK2UzOGZyNU5mVHd1Y1VPT2tDUT09.

If you have a question about this talk, please contact Dr Kate Knill.

This talk will be held on Zoom.

Abstract: When multiple speakers talk at the same time, each utterance is partially or completely overlapped by one or more competing utterances. Overlapping speech poses challenges for speech technologies, including automatic speech recognition (ASR), speaker diarization, and speaker verification. These challenges can be addressed with speech separation front-ends, which aim to segregate the individual source speakers from a mixture signal. Despite recent progress in single-channel speech separation driven by advances in deep learning, single-channel systems still perform poorly in distant-microphone scenarios involving noise and reverberation. This talk focuses on the development of multi-channel speech separation techniques for separating mixture signals in the distant-microphone case. The talk is split into three parts. The first part introduces an end-to-end neural architecture that takes time-domain multi-microphone input. The second part exploits knowledge of speaker identity to extend the multi-channel separation system to a multi-speaker extraction task. The final part describes an unsupervised approach that aims to apply the end-to-end separation system in situations where supervised data is hard to collect. The methods are evaluated on simulated data with reverberation and ambient noise, both in terms of signal enhancement metrics and as front-ends for ASR.

Bio: Jisi Zhang is a final-year PhD student supervised by Professor Jon Barker in the Department of Computer Science at the University of Sheffield. He is interested in speech separation, multi-channel processing, and multi-talker speech recognition.

This talk is part of the CUED Speech Group Seminars series.
