Pruning and grafting syntactic trees for cross-lingual transfer tasks

If you have a question about this talk, please contact Andrew Caines.

Universal Dependencies is a framework for annotating syntactic trees consistently across languages in order to facilitate multilingual NLP and cross-lingual transfer. However, the trees of equivalent sentences may take non-overlapping shapes because of inherent typological variation. In particular, this anisomorphism is driven by variation in 1) morphological assets and 2) clause-level constructions (such as polar questions, predicative possession and relative clauses). In this work, we demonstrate that reducing the level of anisomorphism yields consistent gains in cross-lingual transfer tasks. First, we show that measuring anisomorphism improves source selection for dependency parsing transfer. Second, we propose a method, inspired by typological documentation, that preprocesses source trees so that their shapes match those of the target trees. This improves the BLEU scores of syntax-based neural machine translation from Arabic to Dutch and from Indonesian to Portuguese; we release these new datasets together with the code. Our results indicate that the compatibility of syntactic tree shapes is crucial for source selection and for boosting cross-lingual transfer.
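The abstract does not say how anisomorphism is measured or how source trees are rewritten. What follows is only a minimal sketch under assumptions of our own. Trees are CoNLL-U-style head arrays, and a word alignment between parallel sentences is given. Anisomorphism is approximated by the share of aligned arcs that are not preserved in the target tree. "Pruning" removes tokens with chosen relation labels and grafts their dependents onto the nearest surviving ancestor. The names `arc_mismatch`, `select_source` and `prune` are hypothetical and are not taken from the authors' released code.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Assumption: a tree is a list of CoNLL-U-style head indices, where heads[i - 1]
# is the head of token i and 0 marks the root.
Heads = List[int]
# Assumption: a word alignment maps 1-based source tokens to 1-based target tokens.
Alignment = Dict[int, int]


def arc_mismatch(src: Heads, tgt: Heads, align: Alignment) -> float:
    """Hypothetical anisomorphism proxy: fraction of comparable source arcs
    whose image under `align` is not an arc of the target tree
    (0.0 = all aligned arcs preserved, 1.0 = none preserved)."""
    checked = mismatched = 0
    for dep, head in enumerate(src, start=1):
        if dep not in align:
            continue  # unaligned dependent: arc cannot be compared
        if head != 0 and head not in align:
            continue  # unaligned head: arc cannot be compared
        expected = 0 if head == 0 else align[head]
        checked += 1
        if tgt[align[dep] - 1] != expected:
            mismatched += 1
    return mismatched / checked if checked else 0.0


def select_source(
    candidates: Dict[str, Sequence[Tuple[Heads, Heads, Alignment]]]
) -> str:
    """Pick the candidate source whose parallel trees are, on average, least
    anisomorphic with the target. Each candidate needs at least one pair."""
    def mean_score(pairs: Sequence[Tuple[Heads, Heads, Alignment]]) -> float:
        return sum(arc_mismatch(s, t, a) for s, t, a in pairs) / len(pairs)

    return min(candidates, key=lambda name: mean_score(candidates[name]))


def prune(heads: Heads, labels: List[str], drop: Set[str]) -> Tuple[Heads, List[int]]:
    """Remove tokens whose relation label is in `drop` and graft each surviving
    token onto its nearest surviving ancestor. Returns the renumbered heads and
    the original indices of the kept tokens. If a dropped token is the root,
    its dependents become roots; a real method would need to handle that."""
    kept = [i for i, lab in enumerate(labels, start=1) if lab not in drop]
    new_index = {old: new for new, old in enumerate(kept, start=1)}

    def surviving_head(i: int) -> int:
        h = heads[i - 1]
        while h != 0 and h not in new_index:
            h = heads[h - 1]  # climb past removed tokens
        return 0 if h == 0 else new_index[h]

    return [surviving_head(i) for i in kept], kept
```

The arc-preservation score is chosen here only because it is short and needs nothing beyond an alignment. Tree edit distance or typological feature overlap would be equally plausible stand-ins. For instance, `prune([2, 0, 4, 2], ["nsubj", "root", "det", "obj"], {"det"})` drops the determiner of "I have a book" and reattaches nothing else, since the determiner has no dependents.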

This talk is part of the NLIP Seminar Series.


© 2006-2024, University of Cambridge.