Prosody transfer evaluation and temporal prosody control in speech synthesis


Papercup
Tuesday 06 July 2021, 12:00-13:00
Zoom: https://zoom.us/j/95352633552?pwd=RzJVK2UzOGZyNU5mVHd1Y1VPT2tDUT09

If you have a question about this talk, please contact Dr Kate Knill.

This seminar will take place on Zoom.

Ctrl-P: Temporal Control of Prosodic Variation for Speech Synthesis

Abstract: We propose a model that generates speech explicitly conditioned on the three primary acoustic correlates of prosody: F0, energy and duration. The model is flexible about how the values of these features are specified: they can be externally provided, predicted from text, or predicted and then modified. Compared to a model that employs a variational auto-encoder to learn unsupervised latent features, our model provides more interpretable, temporally precise and disentangled control.
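The abstract does not describe Ctrl-P's input format, so the following is only an illustrative sketch of the "predicted and then modified" workflow. It assumes hypothetical phone-level targets for the three correlates (F0, energy, duration); the names PhoneProsody and modify_span are invented for this example and are not part of the model or any released code. The point is that editing explicit, time-aligned values lets a user change one span of an utterance while leaving the rest untouched.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class PhoneProsody:
    """Hypothetical phone-level prosody targets (not the Ctrl-P interface)."""
    phone: str
    f0_hz: float        # mean fundamental frequency over the phone
    energy: float       # mean frame energy, arbitrary units
    duration_ms: float  # phone duration in milliseconds


def modify_span(seq: List[PhoneProsody], start: int, end: int,
                f0_scale: float = 1.0, energy_scale: float = 1.0,
                duration_scale: float = 1.0) -> List[PhoneProsody]:
    """Scale the prosody of phones in [start, end); other phones are unchanged."""
    out = []
    for i, p in enumerate(seq):
        if start <= i < end:
            p = replace(p,
                        f0_hz=p.f0_hz * f0_scale,
                        energy=p.energy * energy_scale,
                        duration_ms=p.duration_ms * duration_scale)
        out.append(p)
    return out


# Made-up "predicted" values for the word "hello", for illustration only.
predicted = [
    PhoneProsody("h", 120.0, 0.5, 60.0),
    PhoneProsody("ə", 125.0, 0.6, 50.0),
    PhoneProsody("l", 130.0, 0.7, 70.0),
    PhoneProsody("əʊ", 140.0, 0.8, 120.0),
]

# Emphasise the final syllable: raise F0 by 20%, energy by 10%, lengthen by 25%.
edited = modify_span(predicted, 2, 4, f0_scale=1.2,
                     energy_scale=1.1, duration_scale=1.25)

for p in edited:
    print(p.phone, round(p.f0_hz, 1), round(p.energy, 2), round(p.duration_ms, 1))
```

A latent-variable model would instead require searching for a latent vector that happens to produce this change, which is the interpretability contrast the abstract draws.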

ADEPT: A Dataset for Evaluating Prosody Transfer

Abstract: We introduce an English corpus of prosodically varied reference natural speech samples for evaluating prosody transfer. The samples include global and local variations across utterances. The corpus only includes prosodic variations that listeners are able to distinguish with reasonable accuracy, and we report these accuracy figures as a benchmark against which text-to-speech prosody transfer can be compared. We also propose a subjective prosody transfer evaluation methodology.

Speaker bios:

Tian Huey Teh is a machine learning engineer at Papercup, based in London. She completed the MSc in Computational Statistics and Machine Learning at University College London in 2018. Since graduating, she has been working on TTS research and development, focusing on prosody modelling and on scaling systems across languages.

Alexandra Torresquintero is a data engineer on the machine learning team at Papercup. She completed her MSc in Speech and Language Processing at the University of Edinburgh in 2019. At Papercup, she has worked on formalising the processing of the TTS training data, including optimisations to the linguistic front end, research into grapheme-to-phoneme (G2P) modelling, and building a database to store the team's data.

This talk is part of the CUED Speech Group Seminars series.
