When is Multilinguality a Curse? Language Modeling for 350 Languages

If you have a question about this talk, please contact Suchir Salhan.

NOTE THE UNUSUAL TIME FOR THIS SEMINAR

Language models work well for only a small number of languages. For the remaining languages, the best available model is likely multilingual, yet even these models draw the vast majority of their training data from English and a few “priority” languages. We show that, in many cases, multilinguality degrades performance across languages because of limited model capacity. We then train a suite of over 1,000 monolingual models for 350 languages and find that these models can outperform multilingual models more than ten times their size. However, multilinguality can also be a blessing: we train a small number of controlled bilingual models to study how crosslingual transfer occurs. Our goal is to understand transfer learning well enough to leverage multilinguality and improve language model performance for all languages.

This talk is part of the NLIP Seminar Series.

© 2006-2025 Talks.cam, University of Cambridge.