
Syllable based keyword search: transducing syllable lattices to word lattices


If you have a question about this talk, please contact Rogier van Dalen.


This paper presents a weighted finite-state transducer (WFST) based syllable decoding and transduction framework for keyword search (KWS). Context-dependent acoustic phone models are trained from word-level forced alignments. Syllable decoding then generates lattices using a syllable lexicon and language model (LM). To handle out-of-vocabulary (OOV) keywords, pronunciations are produced by a grapheme-to-syllable (G2S) system. Syllables not seen in the training set are approximated by the perceptually closest syllable in the recognized syllable set. A syllable-to-word lexical transducer containing both in-vocabulary (IV) and OOV keywords is then constructed and composed with a keyword-boosted LM transducer. The composed transducer is used to transduce syllable lattices into word lattices for the final KWS. An n-gram word sequence LM with the keywords boosted provides the best performance. We show that our method can effectively perform KWS on both IV and OOV keywords, and that it yields up to 0.03 Actual Term-Weighted Value (ATWV) improvement over searching for keywords directly in syllable lattices. Word error rates (WER) and KWS results are reported for five different languages, comparing whole-word, phonetic confusion and syllable techniques. Combining the techniques provides further improvement.
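As a rough illustration of two steps in the abstract (approximating an unseen syllable by its nearest known syllable, and mapping syllables to words with boosted keywords), here is a minimal Python sketch. It is not the method from the talk: it works on a single syllable sequence rather than a lattice, uses plain dynamic programming instead of WFST composition, uses unigram word costs instead of an n-gram LM, and substitutes phone edit distance for perceptual closeness. All names, the toy lexicon, and the cost values are hypothetical.

```python
# Toy sketch only: 1-best sequence, unigram costs, no WFSTs, no lattices.
# Syllables are tuples of phone symbols; all data below is made up.

def edit_distance(a, b):
    """Levenshtein distance between two phone sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            sub = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + sub))
        prev = curr
    return prev[-1]

def closest_known_syllable(syllable, known):
    """Map a syllable to the nearest known one.

    Assumption: phone edit distance stands in for perceptual distance.
    """
    if syllable in known:
        return syllable
    # sorted() makes tie-breaking deterministic
    return min(sorted(known), key=lambda k: edit_distance(syllable, k))

def transduce(syllables, lexicon, word_cost, keywords, boost=1.0):
    """Return the lowest-cost word sequence covering the syllable sequence.

    lexicon maps a tuple of syllables to a word; word_cost maps a word to a
    cost (think negative log probability). Keyword costs are lowered by
    `boost`. Returns None if the sequence cannot be covered.
    """
    n = len(syllables)
    inf = float("inf")
    best_cost = [inf] * (n + 1)
    back = [None] * (n + 1)
    best_cost[0] = 0.0
    for end in range(1, n + 1):
        for start in range(end):
            word = lexicon.get(tuple(syllables[start:end]))
            if word is None or best_cost[start] == inf:
                continue
            cost = word_cost[word] - (boost if word in keywords else 0.0)
            if best_cost[start] + cost < best_cost[end]:
                best_cost[end] = best_cost[start] + cost
                back[end] = (start, word)
    if best_cost[n] == inf:
        return None
    words, pos = [], n
    while pos > 0:
        pos, word = back[pos]
        words.append(word)
    return words[::-1]

# Hypothetical toy data
WIN = ("w", "ih", "n")
DOW = ("d", "ow")
LEXICON = {(WIN,): "win", (DOW,): "dough", (WIN, DOW): "window"}
COSTS = {"win": 2.0, "dough": 2.0, "window": 4.5}
```

In this toy setup, `transduce([WIN, DOW], LEXICON, COSTS, keywords=set())` prefers the two short words, while passing `keywords={"window"}` with `boost=1.0` lowers the keyword's cost enough for it to win. That is the intuition behind boosting keywords in the LM, although the real system applies it through transducer composition over whole lattices.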


Jim Hieronymus is a senior scientist and principal investigator at the International Computer Science Institute in Berkeley, CA, USA. He collaborates with the Cambridge Speech Recognition Group in the Engineering Department. He has worked on putting a spoken dialogue system on the International Space Station for NASA, on the EU TRINDI project on integrating prosody into a dialogue system, and at Bell Labs on spoken dialogue systems, speech recognition and spoken language identification. Before that, Jim was a professor at the Centre for Speech Technology Research and the Linguistics Department at Edinburgh University.

Sandwiches will be provided at 13:00, 30 minutes before the talk.

This talk is part of the CUED Speech Group Seminars series.



© 2006-2024, University of Cambridge.