Speaker Retrieval in the Wild: Challenges, Effectiveness and Robustness

If you have a question about this talk, please contact Simon Webster McKnight.

Effective speaker retrieval is an important problem with wide-ranging real-world applications, given the vastness of available media archives. In this talk, we investigate the speaker retrieval systems developed by CUED in the context of the EPSRC-funded MVSE (Multimodal Video Search by Example) project. While we focus on the BBC Rewind corpus (1948-1979), our framework addresses the broader issue of speaker retrieval in large and possibly aged archives.

We explore the challenges encountered in developing a speaker retrieval system in the wild, addressing two primary issues: the dataset’s unsuitability for direct training and performance evaluation due to noisy and unreliable metadata, and the unconstrained acoustic conditions found in the archive, ranging from quiet studios to adverse, noisy real-world environments.

We examine various aspects of system development, including the challenges, potential solutions, and their functionality, and report systematic experiments conducted both in clean setups and under various distortions to evaluate performance. Additionally, we touch on the utility of multimodal audio-visual speaker retrieval and analyse the synergy and consistency between the two modalities.

This talk is part of the CUED Speech Group Seminars series.