
Prosody transfer evaluation and temporal prosody control in speech synthesis


If you have a question about this talk, please contact Dr Kate Knill.

This seminar will take place on Zoom.

Ctrl-P: Temporal Control of Prosodic Variation for Speech Synthesis

Abstract: We propose a model that generates speech explicitly conditioned on the three primary acoustic correlates of prosody: F0, energy and duration. The model is flexible about how the values of these features are specified: they can be externally provided, or predicted from text, or predicted then subsequently modified. Compared to a model that employs a variational auto-encoder to learn unsupervised latent features, our model provides more interpretable, temporally-precise, and disentangled control.
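The three ways of specifying prosody features described in the abstract (externally provided, predicted from text, or predicted and then modified) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Prosody` container, the constant-valued `predict_prosody` stand-in, the per-phone granularity, and the `raise_f0` helper are all hypothetical names and assumptions introduced for illustration only.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Prosody:
    """One value per phone for each feature (granularity is an assumption)."""
    f0: List[float]        # fundamental frequency, Hz
    energy: List[float]    # relative energy, arbitrary units
    duration: List[float]  # seconds


def predict_prosody(phones: List[str]) -> Prosody:
    """Hypothetical stand-in for a learned text-to-prosody predictor."""
    n = len(phones)
    return Prosody(f0=[120.0] * n, energy=[1.0] * n, duration=[0.08] * n)


def resolve_prosody(
    phones: List[str],
    provided: Optional[Prosody] = None,
    modify: Optional[Callable[[Prosody], Prosody]] = None,
) -> Prosody:
    """Choose the conditioning features a synthesiser would receive.

    - provided given: use the externally supplied values.
    - otherwise: predict the values from the text.
    - modify given: edit the chosen values before synthesis.
    """
    prosody = provided if provided is not None else predict_prosody(phones)
    if modify is not None:
        prosody = modify(prosody)
    lengths = {len(prosody.f0), len(prosody.energy), len(prosody.duration)}
    if lengths != {len(phones)}:
        raise ValueError("each feature needs exactly one value per phone")
    return prosody


def raise_f0(index: int, factor: float) -> Callable[[Prosody], Prosody]:
    """Build an edit that scales F0 at a single phone position."""
    def edit(p: Prosody) -> Prosody:
        f0 = list(p.f0)
        f0[index] *= factor
        return replace(p, f0=f0)
    return edit


phones = ["h", "e", "l", "o"]
predicted = resolve_prosody(phones)                         # predicted from text
edited = resolve_prosody(phones, modify=raise_f0(2, 1.5))   # predicted, then modified
```

Because the edit touches only one feature at one position, the energy and duration values are left unchanged, which is the kind of temporally local, per-feature control the abstract contrasts with unsupervised latent features.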

ADEPT: A Dataset for Evaluating Prosody Transfer

Abstract: We introduce an English corpus of prosodically-varied reference natural speech samples for evaluating prosody transfer. The samples include global and local variations across utterances. The corpus only includes prosodic variations that listeners are able to distinguish with reasonable accuracy, and we report these figures as a benchmark against which text-to-speech prosody transfer can be compared. We also propose a subjective prosody transfer evaluation methodology.

Speaker bios:

Tian Huey Teh is a machine learning engineer at Papercup, based in London. She completed the MSc Computational Statistics and Machine Learning programme at University College London in 2018. Since graduating she has been working on TTS research and development, focusing on prosody modelling and scaling systems across languages.

Alexandra Torresquintero is a Data Engineer on the machine learning team at Papercup. She completed her MSc in Speech and Language Processing at the University of Edinburgh in 2019. Whilst at Papercup, she has worked on formalising the processing behind the TTS training data, including linguistic frontend optimisations, research into grapheme-to-phoneme (g2p) modelling, and building a database to store our data.

This talk is part of the CUED Speech Group Seminars series.



© 2006-2024, University of Cambridge.