Prosody transfer evaluation and temporal prosody control in speech synthesis
If you have a question about this talk, please contact Dr Kate Knill.

This seminar will take place on Zoom.

Ctrl-P: Temporal Control of Prosodic Variation for Speech Synthesis

Abstract: We propose a model that generates speech explicitly conditioned on the three primary acoustic correlates of prosody: F0, energy and duration. The model is flexible about how the values of these features are specified: they can be externally provided, or predicted from text, or predicted then subsequently modified. Compared to a model that employs a variational auto-encoder to learn unsupervised latent features, our model provides more interpretable, temporally-precise, and disentangled control.

ADEPT: A Dataset for Evaluating Prosody Transfer

Abstract: We introduce an English corpus of prosodically-varied reference natural speech samples for evaluating prosody transfer. The samples include global and local variations across utterances. The corpus only includes prosodic variations that listeners are able to distinguish with reasonable accuracy, and we report these figures as a benchmark against which text-to-speech prosody transfer can be compared. We also propose a subjective prosody transfer evaluation methodology.

Speaker bios:

Tian Huey Teh is a machine learning engineer at Papercup, based in London. She completed the MSc Computational Statistics and Machine Learning programme at University College London in 2018. Since graduating she has been working on TTS research and development, focusing on prosody modelling and scaling systems across languages.

Alexandra Torresquintero is a data engineer on the machine learning team at Papercup. She completed her MSc in Speech and Language Processing at the University of Edinburgh in 2019. Whilst at Papercup, she has worked on formalising the processing behind the TTS training data, including linguistic frontend optimisations, research into g2p modelling, and building a database to store the team's data.
This talk is part of the CUED Speech Group Seminars series.