Towards Perfect Supervised and Unsupervised Machine Translation

If you have a question about this talk, please contact Marinela Parovic.

Data-driven machine translation applies machine-learning-based natural language processing techniques to multilingual data. With the recent advent of powerful neural network models in particular, it has become possible to incorporate many types of information directly into the model and to robustly capture long-distance dependencies in the sequence of words being generated.

I will discuss four areas of work that address important weaknesses of data-driven machine translation. First, I will present an alternative to phrase-based statistical machine translation: a model that jointly captures translation and reordering operations, and that was widely adopted by researchers and end users. Second, I will discuss the important problem of data sparsity in translation caused by rich morphology, and the extensive work we have carried out to overcome it. Third, I will discuss progress towards breaking the strong domain dependency between the data used to train supervised neural machine translation systems and the data those systems are later asked to translate. Finally, I will briefly present a new research program that will allow us to build strong unsupervised machine translation systems, enabling high-quality translation between language pairs for which no source of parallel training data is known to exist.

This talk is part of the Language Technology Lab Seminars series.

© 2006-2023, University of Cambridge.