The applications of discrete speech tokens for robust and context-aware text-to-speech synthesis
If you have a question about this talk, please contact Simon Webster McKnight.

A conventional neural text-to-speech (TTS) pipeline typically has two stages: first, an acoustic model predicts a mel-spectrogram from text; then a vocoder generates the waveform from that mel-spectrogram. However, such systems often produce suboptimal speech and are sensitive to the quality of the training data. We propose, for the first time, to use discrete speech tokens from self-supervised models as the intermediate feature of the TTS pipeline, which significantly improves robustness. Building on this new pipeline, we extend it to context-aware TTS tasks, in which the coherence of the generated speech with its context is taken into account during generation.

This talk is part of the CUED Speech Group Seminars series.
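The abstract does not say how the discrete speech tokens are derived. One widely used recipe for tokens from self-supervised models (for example, HuBERT-style unit extraction) is to cluster frame-level features with k-means and replace each frame by the index of its nearest centroid. The sketch below illustrates that idea under stated assumptions: toy 2-D vectors stand in for real self-supervised features, the seeding is a simple deterministic farthest-point rule, and every function name is hypothetical. It is not presented as the speaker's method.

```python
# Hypothetical sketch: k-means "speech tokens" from frame-level features.
# Assumption: the talk does not say how its tokens are built; k-means over
# self-supervised features is one common recipe, used here for illustration.
from typing import List, Sequence

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(frame: Sequence[float], centroids: Sequence[Vector]) -> int:
    """Index of the centroid closest to one feature frame."""
    return min(range(len(centroids)),
               key=lambda j: squared_distance(frame, centroids[j]))


def farthest_point_init(frames: Sequence[Vector], k: int) -> List[Vector]:
    """Deterministic seeding: start at frame 0, then repeatedly add the frame
    whose nearest already-chosen centroid is farthest away."""
    centroids = [list(frames[0])]
    while len(centroids) < k:
        idx = max(range(len(frames)),
                  key=lambda i: min(squared_distance(frames[i], c) for c in centroids))
        centroids.append(list(frames[idx]))
    return centroids


def fit_kmeans(frames: Sequence[Vector], k: int, iters: int = 20) -> List[Vector]:
    """Lloyd's k-means: alternate nearest-centroid assignment and mean updates."""
    centroids = farthest_point_init(frames, k)
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for f in frames:
            clusters[nearest_centroid(f, centroids)].append(list(f))
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                dim = len(members[0])
                centroids[j] = [sum(m[d] for m in members) / len(members)
                                for d in range(dim)]
    return centroids


def tokenize(frames: Sequence[Vector], centroids: Sequence[Vector]) -> List[int]:
    """Replace each frame by its nearest-centroid index (the discrete token)."""
    return [nearest_centroid(f, centroids) for f in frames]


def deduplicate(tokens: Sequence[int]) -> List[int]:
    """Optionally collapse consecutive repeated tokens."""
    out: List[int] = []
    for t in tokens:
        if not out or out[-1] != t:
            out.append(t)
    return out


if __name__ == "__main__":
    # Toy 2-D "features"; real self-supervised features are high-dimensional.
    frames = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 4.9],
              [0.05, 0.05], [9.9, 0.1], [10.0, 0.0]]
    centroids = fit_kmeans(frames, k=3)
    tokens = tokenize(frames, centroids)
    print(tokens)               # [0, 0, 2, 2, 0, 1, 1]
    print(deduplicate(tokens))  # [0, 2, 0, 1]
```

The token IDs are arbitrary cluster indices; only the grouping of frames matters. In a pipeline like the one the abstract describes, an acoustic model would presumably predict such token sequences from text, and a vocoder would generate the waveform from them in place of a mel-spectrogram.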