Pitfalls in Evaluation of Multilingual Text Representations

If you have a question about this talk, please contact Marinela Parovic.

Multilingual representation spaces, spanned by multilingual word embeddings or massively multilingual transformers, conceptually enable the modeling of meaning across a wide range of languages and the transfer of task-specific NLP models from resource-rich to resource-lean languages. It is not yet clear, however, to what extent this conceptual promise holds in practice. Recent models, both cross-lingual word embedding models and multilingual transformers, have been praised for inducing multilingual representation spaces without any explicit supervision (i.e., without word-level alignments or parallel corpora). In this talk, I will point to some prominent shortcomings and pitfalls of existing evaluations of multilingual representation spaces, which mask important limitations of state-of-the-art multilingual representation models. Remedying some of these evaluation shortcomings portrays the meaning representation and language transfer capabilities of current state-of-the-art multilingual representation spaces in a less favorable light.

This talk is part of the Language Technology Lab Seminars series.


© 2006-2024, University of Cambridge.