
Paraphrastic Language Models / Structured SVMs for ASR


If you have a question about this talk, please contact Catherine Breslin.


In natural languages, multiple word sequences can represent the same underlying meaning, so modelling only the observed surface word sequence can result in poor context coverage, for example when using N-gram language models (LMs). To handle this issue, this talk presents a novel form of language model, the paraphrastic LM. A phrase-level paraphrase model, statistically learned from standard text data, is used to generate paraphrase variants, and LM probabilities are then estimated by maximizing the marginal probability of the observed word sequences over these variants. Significant error rate reductions of 0.5%-0.6% absolute were obtained over the baseline N-gram LMs on two state-of-the-art recognition tasks, English conversational telephone speech and Mandarin Chinese broadcast speech, using a paraphrastic multi-level LM that models both word and phrase sequences. When further combined with word- and phrase-level neural network LMs, significant error rate reductions of 0.9% absolute (9% relative) and 0.5% absolute (5% relative) were obtained over the baseline N-gram and neural network LMs respectively.
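The core idea above — estimating LM statistics by marginalizing over paraphrase variants — can be illustrated with a toy sketch. Everything here (the phrase paraphrase table, the greedy phrase matching, the probabilities) is an illustrative stand-in, not the talk's actual model, which is statistically learned from text data:

```python
# A minimal sketch of paraphrastic LM count estimation, assuming a toy
# hand-written phrase-level paraphrase model (illustrative only).
from collections import defaultdict
from itertools import product

# Hypothetical paraphrase model: phrase -> list of (variant, probability).
PARAPHRASES = {
    ("thank", "you"): [(("thank", "you"), 0.7), (("thanks",), 0.3)],
    ("a", "lot"): [(("a", "lot"), 0.8), (("very", "much"), 0.2)],
}

def variants(sentence):
    """Enumerate paraphrase variants of a sentence with their probabilities.

    Greedy left-to-right phrase matching for simplicity; a real system
    would work over a lattice of phrase segmentations.
    """
    spans, i = [], 0
    while i < len(sentence):
        for n in (2, 1):  # try two-word phrases first, then single words
            phrase = tuple(sentence[i:i + n])
            if phrase in PARAPHRASES or n == 1:
                spans.append(PARAPHRASES.get(phrase, [(phrase, 1.0)]))
                i += n
                break
    for combo in product(*spans):
        words = [w for variant, _ in combo for w in variant]
        prob = 1.0
        for _, p in combo:
            prob *= p
        yield words, prob

def expected_bigram_counts(corpus):
    """Accumulate fractional bigram counts over paraphrase variants,
    i.e. estimate LM statistics by marginalizing over paraphrases."""
    counts = defaultdict(float)
    for sentence in corpus:
        for words, prob in variants(sentence):
            for a, b in zip(words, words[1:]):
                counts[(a, b)] += prob
    return counts

counts = expected_bigram_counts([["thank", "you", "a", "lot"]])
```

Each observed sentence thus contributes fractional counts to bigrams it never literally contains (e.g. ones involving "thanks"), which is what improves context coverage over training on the surface word sequence alone.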


Combining generative and discriminative models offers a flexible sequence classification framework. This talk describes a structured support vector machine (S-SVM) approach within this framework suitable for medium-to-large vocabulary speech recognition. An important aspect of S-SVMs is the form of the joint feature space; here, generative models, hidden Markov models, are used to obtain the features. To apply this form of combined generative and discriminative model to speech recognition tasks, a number of issues need to be addressed. First, the extracted features are a function of the segmentation of the utterance, so a Viterbi-like scheme for obtaining the “optimal” segmentation is described. Second, we will show that S-SVMs can be viewed as large-margin trained log-linear models. Finally, to speed up the training process, a 1-slack algorithm, caching of competing hypotheses, and parallelization strategies will also be presented.
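The large-margin view of a log-linear model can be sketched with a toy subgradient step on the structured hinge loss. The feature vectors, loss values, and the trivial argmax decoder below are illustrative assumptions; in the talk's setting the features come from HMM segmentations and decoding is a Viterbi-style search:

```python
# A toy sketch of large-margin (structured-hinge) training of a log-linear
# model over joint features, in the spirit of structured SVMs. All data and
# the enumeration-based decoder are illustrative stand-ins.
import numpy as np

def decode(alpha, feats_per_hyp):
    """Return the index of the highest-scoring hypothesis,
    argmax_w alpha . phi(O, w); stands in for Viterbi search."""
    return int(np.argmax([alpha @ f for f in feats_per_hyp]))

def sgd_margin_step(alpha, feats_per_hyp, hyp_loss, ref, lr=0.1):
    """One subgradient step on the structured hinge loss:
    max_w [alpha . phi(w) + loss(w, ref)] - alpha . phi(ref)."""
    # Loss-augmented decoding: find the most violating competitor.
    aug = [alpha @ f + hyp_loss[i] for i, f in enumerate(feats_per_hyp)]
    comp = int(np.argmax(aug))
    if comp != ref:
        # Move towards the reference features, away from the competitor's.
        alpha = alpha + lr * (feats_per_hyp[ref] - feats_per_hyp[comp])
    return alpha

# Two competing hypotheses with made-up 2-d features; hypothesis 0 is the
# reference (loss 0), hypothesis 1 incurs loss 1.
feats = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
alpha = np.zeros(2)
for _ in range(10):
    alpha = sgd_margin_step(alpha, feats, hyp_loss=[0.0, 1.0], ref=0)
```

After a few updates the weights separate the reference from the competitor by a margin, so plain (non-augmented) decoding recovers the reference. The 1-slack and hypothesis-caching refinements mentioned above change how violating competitors are collected, not this basic update direction.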

This talk is part of the CUED Speech Group Seminars series.



© 2006-2024, University of Cambridge.