
Statistical Parametric Speech Synthesis Based on Speaker and Language Factorization


If you have a question about this talk, please contact Kai Yu.

An increasingly common scenario in building hidden Markov model-based speech synthesis and recognition systems is training on inhomogeneous data, for example data drawn from multiple sources or data of several different types. This seminar introduces a new technique for training hidden Markov models on such inhomogeneous speech data, in this case data that varies in both speaker and language. The proposed technique, speaker and language factorization, attempts to factorize the speaker-specific and language-specific characteristics in the data and to model each with its own transform. Language-specific factors in the data are represented by transforms based on cluster mean interpolation with cluster-dependent decision trees. Acoustic variations caused by speaker characteristics are handled by transforms based on constrained maximum likelihood linear regression. This technique allows multi-speaker/multi-language adaptive training to be performed. Since each factor is represented by an individual transform, it is possible to apply only one of them. Experimental results on statistical parametric speech synthesis show that the proposed technique enables the speaker and language to be factorized, allowing a speaker transform estimated in one language to be used successfully to synthesize speech in a different language while keeping the voice characteristics.
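As a rough illustration of the two transform types named in the abstract, the sketch below shows a language transform as a weighted sum of cluster mean vectors and a speaker transform as an affine map on a feature vector, which is the general form of constrained maximum likelihood linear regression. It is a toy under stated assumptions, not the speaker's implementation: the function names, the two-dimensional features, and all numeric values are hypothetical, and the cluster-dependent decision trees and the estimation of the transforms are not modelled.

```python
def interpolate_cluster_means(cluster_means, language_weights):
    """Language transform (sketch): weighted sum of per-cluster mean vectors."""
    dim = len(cluster_means[0])
    return [
        sum(w * mu[d] for w, mu in zip(language_weights, cluster_means))
        for d in range(dim)
    ]


def apply_cmllr(A, b, observation):
    """Speaker transform (sketch): affine feature-space map A @ o + b."""
    return [
        sum(a_ij * o_j for a_ij, o_j in zip(row, observation)) + b_i
        for row, b_i in zip(A, b)
    ]


# Hypothetical toy values: two clusters, two-dimensional features.
cluster_means = [[1.0, 0.0], [0.0, 2.0]]
weights_lang_a = [1.0, 0.5]    # interpolation weights for an assumed language A
weights_lang_b = [1.0, -0.5]   # interpolation weights for an assumed language B

# One speaker transform, kept separate from the language weights.
speaker_A = [[2.0, 0.0], [0.0, 1.0]]
speaker_b = [0.5, -0.5]

mean_lang_a = interpolate_cluster_means(cluster_means, weights_lang_a)
mean_lang_b = interpolate_cluster_means(cluster_means, weights_lang_b)
transformed = apply_cmllr(speaker_A, speaker_b, [1.0, 1.0])
```

Because the language weights and the speaker matrix are held in separate variables, the same `speaker_A` and `speaker_b` can be paired with either `weights_lang_a` or `weights_lang_b`, which mirrors the cross-language reuse of a speaker transform described above.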

This talk is part of the speech synthesis seminar series.


© 2006-2023, University of Cambridge.