
Achieving Universality in Machine Translation: M4 - Massively Multilingual, Massive MT Models for the Next 1000 Languages


If you have a question about this talk, please contact Marinela Parovic.

What does universality mean for machine translation? Massively multilingual models, jointly trained on hundreds of languages, have shown great success in processing many languages simultaneously within a single large model. These large multilingual models, which we call M4, are appealing for two reasons: efficiency and positive cross-lingual transfer. (1) Training and deploying a single multilingual model requires far fewer resources than maintaining one model per language; (2) by transferring knowledge from high-resource languages, multilingual models improve performance on low-resource languages. In this talk, we will describe our efforts to scale machine translation models to more than 1000 languages. We will detail several research (and even some development) challenges the project has tackled: multi-task learning with hundreds of tasks, learning under heavy data imbalance, understanding the learned representations, evaluation at the tail, and cross-lingual downstream transfer, along with many more insights.
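The abstract does not say how the project handles heavy data imbalance, but a common approach in massively multilingual training is temperature-based sampling, which flattens the raw corpus-size distribution so low-resource languages are seen more often. Below is a minimal, illustrative sketch; the function name and corpus sizes are hypothetical, not from the talk:

```python
def temperature_sample_weights(line_counts, temperature=5.0):
    """Turn per-language-pair corpus sizes into sampling probabilities.

    temperature=1.0 reproduces the raw data distribution; higher
    temperatures push the distribution toward uniform, up-weighting
    low-resource languages at the expense of high-resource ones.
    """
    total = sum(line_counts.values())
    # p_l is proportional to (size_l / total) ** (1 / T)
    scaled = {lang: (n / total) ** (1.0 / temperature)
              for lang, n in line_counts.items()}
    z = sum(scaled.values())
    return {lang: w / z for lang, w in scaled.items()}

# Hypothetical example: one high-resource and one low-resource pair.
counts = {"en-fr": 40_000_000, "en-gu": 200_000}
probs = temperature_sample_weights(counts, temperature=5.0)
```

With T=5, the low-resource pair's sampling probability rises well above its raw share of the data (~0.5%), which is the effect the technique is after; choosing T trades off low-resource gains against high-resource regression.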

This talk is part of the Language Technology Lab Seminars series.

