Use of Linguistic Information and Reordering Strategies for Ngram-based Statistical Machine Translation

If you have a question about this talk, please contact Dr Marcus Tomalin.

This seminar will give an overview of work on statistical machine translation (SMT) at UPC in recent years. First, the Ngram-based SMT system will be described, detailing the definition of bilingual units and the basic feature functions for a monotone language pair. Second, the introduction of linguistic information at various stages will be discussed, including word alignment (investigating the correlation between Alignment Error Rate and translation scores), bilingual unit segmentation and direct translation modelling. Results on English-to-Spanish verb form classification will be reviewed, as well as the impact of morphology reduction on bilingual N-gram formulation. For language pairs with less monotone word order, the reordering strategies that were implemented will be presented. In particular, reordered search involving tuple unfolding will be compared with extended monotone search using linguistically-driven reordering rules on Arabic-, Chinese- and Spanish-to-English tasks. Finally, the seminar will conclude by outlining general directions for future research towards improving the performance of current state-of-the-art SMT systems.

This talk is part of the Machine Intelligence Laboratory Speech Seminars series.
