Mitigating Gender Bias in Morphologically Rich Languages
If you have a question about this talk, please contact Andrew Caines.

Gender bias exists in corpora of all of the world’s languages: the bias is a function of what people talk about, not of the grammar of a language. Data-driven NLP systems trained on such corpora therefore inherit the bias, and evidence of it has been found across many NLP technologies: word vectors, language models, coreference systems, and even machine translation.

Most research on mitigating gender bias in natural language corpora, however, has focused solely on English. To remove gender bias from English corpora, for instance, NLP practitioners often augment them by swapping gendered words: e.g., if “he is a smart doctor” appears, the sentence “she is a smart doctor” is added to the corpus as well before a model is trained.

The broader research question of this talk is: how can we mitigate gender bias in corpora from any of the world’s languages, not just English? The simple English swapping heuristic does not generalize to most languages. It does not even work for German, which marks gender on both nouns and adjectives and requires gender agreement throughout a sentence. Mapping “er ist ein kluger Arzt” to “sie ist eine kluge Ärztin” requires more than swapping “er” with “sie” and “Arzt” with “Ärztin”: the article (“ein” → “eine”) and the adjective (“kluger” → “kluge”) must change as well.

In this talk, we present a machine-learning solution to this problem: a novel neural random field that generates such sentence-to-sentence transformations while enforcing gender agreement. We explain how to perform inference and morphological reinflection to generate these transformations without any labeled training examples.
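The contrast between English and German can be made concrete with a small sketch. It assumes whitespace tokenization and tiny hand-written lexicons; `EN_SWAP`, `DE_NAIVE`, `swap_with_agreement`, and the agreement tables are hypothetical names for illustration only. It shows the English swap heuristic working on the example sentence, the same token-level swap producing ungrammatical German, and a rule-based fix that restores agreement for just one nominative noun-phrase pattern. This is not the neural random field described in the talk.

```python
# Toy illustration of gender-swap data augmentation (hypothetical lexicons).
from typing import Dict, List, Optional

# English: a word-level swap lexicon is often enough.
EN_SWAP = {"he": "she", "she": "he", "man": "woman", "woman": "man"}

# German, naive: swap only the pronoun and the noun, exactly as in English.
DE_NAIVE = {"er": "sie", "sie": "er", "Arzt": "Ärztin", "Ärztin": "Arzt"}


def swap_tokens(sentence: str, lexicon: Dict[str, str]) -> str:
    """Replace every token found in `lexicon`; leave other tokens unchanged."""
    return " ".join(lexicon.get(tok, tok) for tok in sentence.split())


def augment(corpus: List[str], lexicon: Dict[str, str]) -> List[str]:
    """Return the corpus plus one swapped copy of each sentence that changes."""
    augmented = list(corpus)
    for sentence in corpus:
        swapped = swap_tokens(sentence, lexicon)
        if swapped != sentence:
            augmented.append(swapped)
    return augmented


# German, agreement-aware: toy rules for ONE pattern only (indefinite article
# + adjective + noun in the nominative case). Lowercase "sie" can also mean
# "they"; this toy ignores that ambiguity.
NOUN_SWAP = {"Arzt": ("Ärztin", "fem"), "Ärztin": ("Arzt", "masc")}
PRONOUN_SWAP = {"er": "sie", "sie": "er"}
ARTICLE_NOM = {"masc": "ein", "fem": "eine"}
ADJ_ENDING_NOM = {"masc": "er", "fem": "e"}
ADJ_LEMMAS = {"klug"}


def adjective_lemma(token: str) -> Optional[str]:
    """Return the lemma if `token` is a known adjective with a nominative ending."""
    for ending in ADJ_ENDING_NOM.values():
        if token.endswith(ending) and token[: -len(ending)] in ADJ_LEMMAS:
            return token[: -len(ending)]
    return None


def swap_with_agreement(sentence: str) -> str:
    """Swap pronoun and noun, then reinflect article and adjective to agree."""
    tokens = sentence.split()
    # Target gender = gender of the swapped noun (assumes at most one such noun).
    target = None
    for tok in tokens:
        if tok in NOUN_SWAP:
            target = NOUN_SWAP[tok][1]
    out = []
    for tok in tokens:
        lemma = adjective_lemma(tok)
        if tok in PRONOUN_SWAP:
            out.append(PRONOUN_SWAP[tok])
        elif tok in NOUN_SWAP:
            out.append(NOUN_SWAP[tok][0])
        elif target and tok in ARTICLE_NOM.values():
            out.append(ARTICLE_NOM[target])  # reinflect the article
        elif target and lemma:
            out.append(lemma + ADJ_ENDING_NOM[target])  # reinflect the adjective
        else:
            out.append(tok)
    return " ".join(out)


if __name__ == "__main__":
    print(swap_tokens("he is a smart doctor", EN_SWAP))  # she is a smart doctor
    print(swap_tokens("er ist ein kluger Arzt", DE_NAIVE))  # sie ist ein kluger Ärztin (ungrammatical)
    print(swap_with_agreement("er ist ein kluger Arzt"))  # sie ist eine kluge Ärztin
```

Every additional noun, adjective, case, and determiner type would need yet another hand-written rule here, and each language would need its own rule set. That scaling problem is what a learned, language-independent model is meant to avoid.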
Empirically, using a novel metric of gender bias, we show that the model reduces gender bias in corpora without sacrificing grammaticality. We also discuss concrete applications to coreference resolution and machine translation.

This talk is part of the NLIP Seminar Series at the University of Cambridge.