Dynamic Topic Adaptation for Statistical Machine Translation

If you have a question about this talk, please contact Rogier van Dalen.

Sandwich lunch from 13:00

In recent years there has been increasing interest in domain adaptation techniques for statistical machine translation (SMT) to deal with the growing amount of data from different sources. Topic modelling techniques applied to SMT are closely related to domain adaptation but are more flexible in modelling structure within and across corpus boundaries, which are often arbitrary. The main focus of this talk is dynamic translation model adaptation to texts of unknown origin, a typical scenario for an online MT engine translating web documents. We introduce a new bilingual topic model for SMT that takes the entire document context into account and directly estimates topic-dependent phrase translation probabilities. We demonstrate the model’s ability to improve over several domain adaptation baselines and provide evidence for the advantages of bilingual topic modelling for SMT over the more common monolingual topic modelling. We then introduce a second topic model for SMT, which exploits the distributional nature of phrase pair meaning by modelling topic distributions over phrase pairs using their distributional profiles. Using this model, we explore combinations of local and global contextual information and demonstrate the usefulness of different levels of contextual information. Finally, we investigate the relationship between domain adaptation and topic adaptation by combining both methods with automatic prediction of domain labels at the document level.

This talk is part of the CUED Speech Group Seminars series.
