
Modulation spectrum-based approach to high-quality statistical parametric speech synthesis


If you have a question about this talk, please contact Rogier van Dalen.

Sandwiches will be provided

This talk presents a Modulation Spectrum (MS)-based approach to high-quality statistical parametric speech synthesis, covering both text-to-speech synthesis and voice conversion. Many approaches, such as Hidden Markov Model (HMM)-based speech synthesis and Gaussian Mixture Model (GMM)-based voice conversion, have been studied with the aim of producing a wide variety of voices. One of the critical problems in statistical parametric speech synthesis is the severe quality degradation of the synthetic speech, which occurs because the statistical processing over-smooths the detailed characteristics of the speech parameters. This talk introduces the MS to alleviate this degradation. The MS captures the over-smoothing effect more sensitively than conventional measures such as mel-cepstral distortion and global variance. I integrate the MS into the training or synthesis phase of HMM-based speech synthesis and GMM-based voice conversion. Perceptual test results demonstrate significant improvements in synthetic speech quality.
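As a rough illustration of why a modulation spectrum can reveal over-smoothing that a global variance figure summarizes only coarsely, the following Python sketch assumes that the MS of one parameter dimension is the log power spectrum of its trajectory across frames (computed with NumPy's `np.fft.rfft`). It uses a moving-average filter as a stand-in for the over-smoothing caused by statistical processing. The synthetic trajectory, the filter length, and the helper names `modulation_spectrum` and `global_variance` are hypothetical choices for this example, not the method presented in the talk.

```python
import numpy as np


def modulation_spectrum(traj):
    """Log power spectrum of each dimension's trajectory over frames.

    traj: array of shape (T, D) holding T frames of D-dimensional parameters.
    Returns an array of shape (T // 2 + 1, D); row k is modulation-frequency bin k.
    Assumption: MS taken as log |DFT over time|^2, one spectrum per dimension.
    """
    spec = np.fft.rfft(traj, axis=0)
    return np.log(np.abs(spec) ** 2 + 1e-10)  # small floor avoids log(0)


def global_variance(traj):
    """Global variance (GV): per-dimension variance over all frames."""
    return traj.var(axis=0)


# Hypothetical 1-D "natural" trajectory over T = 256 frames:
# a slow contour (period 64 frames) plus a fast fluctuation (period 8 frames).
T = 256
t = np.arange(T)
natural = (np.sin(2 * np.pi * t / 64) + 0.5 * np.sin(2 * np.pi * t / 8))[:, None]

# Stand-in for statistical over-smoothing: a 9-frame moving average.
kernel = np.ones(9) / 9
smoothed = np.convolve(natural[:, 0], kernel, mode="same")[:, None]

ms_nat, ms_smooth = modulation_spectrum(natural), modulation_spectrum(smoothed)
gv_nat, gv_smooth = global_variance(natural), global_variance(smoothed)

# Bin k corresponds to a period of T / k frames: bin 4 = slow, bin 32 = fast.
for k in (4, 32):
    print(f"bin {k:2d}: MS natural {ms_nat[k, 0]:.2f}, smoothed {ms_smooth[k, 0]:.2f}")
print(f"GV natural {gv_nat[0]:.3f}, smoothed {gv_smooth[0]:.3f}")
```

The GV reduces each dimension to a single number, so here it only shows that the smoothed trajectory has lower variance. The MS additionally shows where the loss occurs: power at the fast modulation frequency (bin 32) drops sharply, while the slow contour (bin 4) is nearly preserved.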

Shinnosuke Takamichi is a Ph.D. student working with Tomoki Toda at the Nara Institute of Science and Technology (Japan). He is also a visiting researcher at Carnegie Mellon University (US).


This talk is part of the CUED Speech Group Seminars series.
