Language (In)Equality in Parsing and Machine Translation: Data Size is Only One Term in the Equation

  • Arianna Bisazza (University of Groningen)
  • Thursday 19 May 2022, 13:00-14:00
  • Virtual (Zoom)

If you have a question about this talk, please contact Michael Schlichtkrull.


Despite the tremendous improvements neural models have achieved in less than a decade, NLP is still far from reaching language equality (i.e. comparable performance in all languages). The uneven amount of data available in different languages is often recognized as the main culprit. In this talk I will discuss recent work that acknowledges this situation and attempts to address it, not by collecting or synthesizing more data, but by exploiting linguistic information that already exists for a large number of languages, from high-resourced to very-low-resourced. In particular, I will show how typological features can be used to learn language embeddings that boost the quality of a multilingual dependency parser. In the second part of the talk, I will discuss another obstacle to language equality, namely the fact that some languages are intrinsically more difficult to model than others, even when controlling for training data size. Specifically, I will present recent results on the effect of word order freedom and case marking on the quality of state-of-the-art neural machine translation.


Arianna Bisazza is Assistant Professor in Computational Linguistics at the University of Groningen, The Netherlands. Her research aims to identify the intrinsic limitations of current language modeling paradigms, and to improve the quality of machine translation for challenging language pairs. She previously worked as a postdoc at the University of Amsterdam and as a research assistant at Fondazione Bruno Kessler, Trento.

Topic: NLIP Seminar
Time: May 19, 2022 01:00 PM London

Join Zoom Meeting

Meeting ID: 941 1288 8558
Passcode: 420834

This talk is part of the NLIP Seminar Series.



© 2006-2024, University of Cambridge.