Applications of Lexicographic Semirings in Speech and Language Processing

Brian Roark, Oregon Health & Science University (OHSU)
Monday 03 October 2011, 13:00-14:30
Cambridge University Engineering Department, Room LR10.

If you have a question about this talk, please contact Bill Byrne.

Sandwiches will be provided.

In this talk, I’ll present two applications of lexicographic semirings for encoding sequence models, both of which yield useful algorithms based on weighted finite-state determinization. Lexicographic semirings involve an ordered set of dimensions, each of which is itself a semiring. First, I’ll briefly introduce weighted finite-state automata and transducers, semirings, and lexicographic semirings, and then present the two special cases. The first lexicographic semiring we examine is a pair of tropical semirings, which provides an exact automaton encoding of smoothed n-gram models using simple epsilon transitions rather than failure transitions. This allows exact models, represented as large weighted finite-state transducers, to be optimized off-line, in contrast to implicit (on-line) failure-transition representations. The second lexicographic semiring pairs a tropical semiring with a new string semiring, which we call a “categorial semiring”. The categorial semiring is inspired by categorial grammar and includes an operation of string division. This semiring allows us to apply weighted finite-state determinization to a weighted transducer so that every input sequence has exactly one (minimum-cost) output sequence. For example, a part-of-speech-tagged word lattice can be determinized so that every word string in the original lattice has just one path in the tagged lattice, corresponding to the Viterbi-best POS-tag sequence for that word string. Tools based on both methods will be available as part of the new ngram library from OpenGrm.org. (Joint work with Richard Sproat, Izhak Shafran and Mahsa Yarmohammadi)
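The first idea in the abstract can be illustrated with a small sketch. It is not taken from the talk or from OpenGrm. In the tropical semiring, "plus" is min and "times" is addition of costs. A lexicographic pair ⟨T, T⟩ compares the first component before the second, and multiplies componentwise. The sketch assumes, for illustration only, that each epsilon backoff arc carries a cost of 1 in the first dimension and that ordinary arcs carry 0. The class name `LexWeight` and all cost values are hypothetical, and the construction presented in the talk may weight the first dimension differently. Under this assumption, a path using an observed bigram always beats a backoff path, which is exactly what failure-transition semantics requires. A plain tropical comparison can pick the cheaper backoff path instead.

```python
import math
from dataclasses import dataclass
from functools import reduce
from typing import Iterable


@dataclass(frozen=True)
class LexWeight:
    """Hypothetical weight in the lexicographic semiring <T, T>.

    Both components are costs (e.g. negative log probabilities): smaller is better.
    """
    first: float
    second: float

    def plus(self, other: "LexWeight") -> "LexWeight":
        # Lexicographic "plus": keep the weight that is smaller in the first
        # component, using the second component only to break ties.
        return min(self, other, key=lambda w: (w.first, w.second))

    def times(self, other: "LexWeight") -> "LexWeight":
        # "Times" is componentwise tropical times, i.e. adding the costs.
        return LexWeight(self.first + other.first, self.second + other.second)


ZERO = LexWeight(math.inf, math.inf)  # identity for plus (no path)
ONE = LexWeight(0.0, 0.0)             # identity for times (empty path)


def path_weight(arcs: Iterable[LexWeight]) -> LexWeight:
    """Weight of a path: the semiring product of its arc weights."""
    return reduce(LexWeight.times, arcs, ONE)


def best(paths: Iterable[LexWeight]) -> LexWeight:
    """Weight of a set of paths: the semiring sum (here, the lexicographic min)."""
    return reduce(LexWeight.plus, paths, ZERO)


# Hypothetical costs for scoring "cat" after history "the" in a bigram model.
BIGRAM_THE_CAT = 4.0   # -log P(cat | the); this bigram was observed
BACKOFF_THE = 0.5      # -log alpha(the); backoff weight of history "the"
UNIGRAM_CAT = 3.0      # -log P(cat)

# Direct arc: no backoff, so the first component is 0.
direct = path_weight([LexWeight(0.0, BIGRAM_THE_CAT)])
# Epsilon backoff arc (assumed first-dimension cost 1.0), then the unigram arc.
via_backoff = path_weight([LexWeight(1.0, BACKOFF_THE), LexWeight(0.0, UNIGRAM_CAT)])

# A plain tropical semiring compares only the probability costs (4.0 vs 3.5),
# so it prefers the backoff path even though the bigram exists.
tropical_best = min(direct.second, via_backoff.second)
# The lexicographic semiring first prefers the path with fewer backoffs.
lex_best = best([direct, via_backoff])

print(tropical_best)  # 3.5
print(lex_best)       # LexWeight(first=0.0, second=4.0)
```

Because the first dimension dominates, the epsilon arc behaves like a failure arc when paths are compared. The whole model stays an ordinary weighted automaton, so standard off-line operations such as determinization can be applied to it.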

Brian Roark is an Associate Professor in the Center for Spoken Language Understanding (CSLU) and the Dept. of Biomedical Engineering at Oregon Health & Science University (OHSU). He received his PhD from Brown University in 2001 and spent three years in the Speech Algorithms Department at AT&T Labs – Research before joining CSLU. His research interests include natural language processing, language modeling for various applications, assistive technology, and spoken language understanding.

This talk is part of the Speech Seminars series.
