Applications of Lexicographic Semirings in Speech and Language Processing

If you have a question about this talk, please contact Bill Byrne.

Sandwiches will be provided.

In this talk, I’ll present two applications of lexicographic semirings for encoding sequence models, which yield useful algorithms based on weighted finite-state determinization. A lexicographic semiring is defined over an ordered set of dimensions, each of which is itself a semiring. First, I’ll briefly introduce weighted finite-state automata and transducers, semirings, and lexicographic semirings, and then present two special cases.

The first lexicographic semiring we examine is a pair of tropical semirings, which provides an exact automaton encoding of smoothed n-gram models using simple epsilon transitions rather than failure transitions. This allows exact models, represented as large weighted finite-state transducers, to be optimized off-line, in contrast to implicit (on-line) failure-transition representations.

The second lexicographic semiring pairs a tropical semiring with a new string semiring, which we call a “categorial semiring”. Inspired by categorial grammar, the categorial semiring includes an operation of string division. This semiring allows us to apply weighted finite-state determinization to a weighted transducer so that every input sequence has exactly one (minimum-cost) output sequence. For example, a part-of-speech (POS) tagged word lattice can be determinized so that every word string in the original lattice has just one path in the tagged lattice, corresponding to the Viterbi-best POS-tag sequence for that word string.

Tools based on both of these methods will be available as part of the new NGram library from OpenGrm.org. (Joint work with Richard Sproat, Izhak Shafran and Mahsa Yarmohammadi.)
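To make the pair-of-tropical-semirings idea concrete, here is a minimal sketch, assuming a simplified encoding in which each epsilon backoff arc carries 1 in the first dimension and every other arc carries 0, so that the first dimension counts backoffs taken. The class `LexTT`, the helpers `path_weight` and `sum_paths`, and all costs are hypothetical illustrations, not the OpenGrm API. Plus is the lexicographic min and times is componentwise addition, so a path that uses an explicit bigram is preferred over one that backs off, even when the backoff path is cheaper in the plain tropical semiring.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class LexTT:
    """A weight in a <Tropical, Tropical> lexicographic semiring (illustrative sketch)."""
    first: float   # sketch assumption: number of backoff arcs taken
    second: float  # ordinary n-gram cost (negative log probability)

    def plus(self, other: "LexTT") -> "LexTT":
        # Lexicographic min: the first dimension decides; the second breaks ties.
        if (self.first, self.second) <= (other.first, other.second):
            return self
        return other

    def times(self, other: "LexTT") -> "LexTT":
        # Tropical times in each dimension, i.e. componentwise addition.
        return LexTT(self.first + other.first, self.second + other.second)


ZERO = LexTT(math.inf, math.inf)  # identity for plus
ONE = LexTT(0.0, 0.0)             # identity for times


def path_weight(arcs):
    """Multiply (times) the arc weights along one path."""
    weight = ONE
    for arc in arcs:
        weight = weight.times(arc)
    return weight


def sum_paths(paths):
    """Sum (plus) the weights of several alternative paths."""
    total = ZERO
    for arcs in paths:
        total = total.plus(path_weight(arcs))
    return total


# Hypothetical bigram costs for reading "b" after history "a".
direct = [LexTT(0.0, 2.0)]                        # bigram arc, -log P(b|a) = 2.0
via_backoff = [LexTT(1.0, 0.5), LexTT(0.0, 1.0)]  # epsilon backoff arc, then unigram arc

print(sum_paths([direct, via_backoff]))       # the direct bigram path wins
print(min(2.0, path_weight(via_backoff).second))  # plain tropical min would pick the backoff path
```

The backoff path costs 1.5 in the second dimension versus 2.0 for the bigram, so a single tropical semiring over epsilon arcs would select the wrong path; the extra first dimension restores the failure-transition semantics in this toy case.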
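The effect described for the POS-tagged lattice can be illustrated with a brute-force sketch. It does not implement the categorial semiring or weighted determinization; it simply enumerates every path of a small, hypothetical acyclic lattice and keeps, for each distinct word string, its minimum-cost tag sequence, which is the property the determinized lattice is described as having. `LATTICE`, the tags, the costs, and the functions `enumerate_paths` and `best_tagging` are all illustrative assumptions.

```python
# Hypothetical tagged word lattice: state -> [(word, tag, cost, next_state), ...].
LATTICE = {
    0: [("time", "NN", 1.0, 1), ("time", "VB", 2.5, 1)],
    1: [("flies", "VBZ", 1.2, 2), ("flies", "NNS", 0.9, 2), ("fly", "VB", 1.5, 2)],
    2: [],
}
START, FINALS = 0, {2}


def enumerate_paths(lattice, state, finals):
    """Yield (words, tags, cost) for every complete path from `state`.

    Assumes an acyclic lattice; the number of paths can grow exponentially,
    so this is only practical for toy inputs.
    """
    if state in finals:
        yield (), (), 0.0
    for word, tag, cost, nxt in lattice[state]:
        for words, tags, rest in enumerate_paths(lattice, nxt, finals):
            yield (word,) + words, (tag,) + tags, cost + rest


def best_tagging(lattice, start, finals):
    """Keep only the minimum-cost tag sequence for each distinct word string."""
    best = {}
    for words, tags, cost in enumerate_paths(lattice, start, finals):
        if words not in best or cost < best[words][0]:
            best[words] = (cost, tags)
    return best


for words, (cost, tags) in sorted(best_tagging(LATTICE, START, FINALS).items()):
    print(" ".join(words), "->", " ".join(tags), f"(cost {cost:.1f})")
```

Each word string ends up with exactly one tag sequence. The point of the lexicographic construction in the talk is to reach that result through determinization rather than by enumerating paths as this sketch does.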

Brian Roark is an Associate Professor in the Center for Spoken Language Understanding (CSLU) and the Dept. of Biomedical Engineering at Oregon Health & Science University (OHSU). He received his PhD from Brown University in 2001 and spent three years in the Speech Algorithms Department at AT&T Labs – Research before joining CSLU. His research interests include natural language processing, language modeling for various applications, assistive technology, and spoken language understanding.

This talk is part of the Speech Seminars series.

© 2006-2017 Talks.cam, University of Cambridge.