Language (In)Equality in Parsing and Machine Translation: Data Size is Only One Term in the Equation
- Speaker: Arianna Bisazza (University of Groningen)
- Date & Time: Thursday 19 May 2022, 13:00 - 14:00
- Venue: Virtual (Zoom)
Abstract:
Despite the tremendous improvements achieved by neural models in less than a decade, NLP is still far from reaching language equality (i.e., comparable performance across all languages). The uneven amount of data available in different languages is often recognized as the main culprit. In this talk I will discuss recent work that acknowledges this situation and attempts to address it, not by collecting or synthesizing more data, but by exploiting linguistic information that already exists for a large number of languages, from high-resource to very-low-resource. In particular, I will show how typological features can be used to learn language embeddings that boost the quality of a multilingual dependency parser. In the second part of the talk, I will discuss another obstacle to language equality, namely the fact that some languages are intrinsically more difficult to model than others, even when controlling for training data size. Specifically, I will present recent results on the effect of word order freedom and case marking on the quality of state-of-the-art neural machine translation.
Bio:
Arianna Bisazza is Assistant Professor in Computational Linguistics at the University of Groningen, The Netherlands. Her research aims to identify the intrinsic limitations of current language modeling paradigms, and to improve the quality of machine translation for challenging language pairs. She previously worked as a postdoc at the University of Amsterdam and as a research assistant at Fondazione Bruno Kessler, Trento.
Topic: NLIP Seminar
Time: May 19, 2022 01:00 PM London
Join Zoom Meeting https://cl-cam-ac-uk.zoom.us/j/94112888558?pwd=aGN2Skg2UFlnUkxWMmFuRjV6SCs0dz09
Meeting ID: 941 1288 8558
Passcode: 420834
This talk is part of the NLIP Seminar Series.