“End-to-end multi-speaker neural TTS with LLM-based prosody prediction”

If you have a question about this talk, please contact Simon Webster McKnight.

In recent years, Neural Text-to-Speech (NTTS) has revolutionised the TTS field and produced more natural, more expressive speech. At Amazon, with products such as Alexa and the AWS Polly service, we bring synthetic speech in tens of voices and languages to millions of people. In Amazon TTS Research we tackle a variety of research problems, from generative TTS, prosody transfer, and the neural front-end to machine dubbing and on-device TTS.

In this presentation I will focus on part of my team's research as published at Interspeech and SSW 2023. First, I will give a summary of the Amazon TTS papers presented at these two conferences in 2023. I will then present our work on eCat, a novel end-to-end multi-speaker model capable of: a) generating long-context speech with expressive and contextually appropriate prosody, and b) performing fine-grained prosody transfer between any pair of seen speakers. eCat improves TTS performance over our previous internal baselines, and it is statistically significantly preferred over VITS, a state-of-the-art TTS model. I will continue with a comparative study of fifteen pretrained language models on two TTS tasks: prosody prediction and pause prediction. Our findings reveal a logarithmic relationship between model size and quality, as well as significant performance differences between neutral and expressive prosody.
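The abstract does not give implementation details, but one common way to frame pause prediction with a pretrained language model is as token-level classification on top of a frozen encoder. The sketch below is a minimal, hypothetical illustration of that framing, assuming a BERT-style encoder from Hugging Face transformers and an untrained linear head; it is not the architecture or any of the fifteen models evaluated in the talk.

    import torch
    from transformers import AutoTokenizer, AutoModel

    # Hypothetical encoder choice: any pretrained LM exposing per-token
    # hidden states would fit this framing; the talk's models are unnamed here.
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    encoder = AutoModel.from_pretrained("bert-base-uncased")

    # Untrained, illustrative head: scores each token as pause / no-pause.
    pause_head = torch.nn.Linear(encoder.config.hidden_size, 2)

    text = "First I will give a summary then I will present our work"
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state  # (1, seq_len, hidden_size)

    # Probability of a pause after each token; in practice the head would be
    # trained on pause-annotated speech transcripts.
    pause_probs = pause_head(hidden).softmax(dim=-1)[..., 1]

Under this framing, the reported logarithmic size-quality relationship would mean quality grows roughly linearly in the logarithm of the encoder's parameter count, so gains diminish as the pretrained model gets larger.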

This talk is part of the CUED Speech Group Seminars series.
