University of Cambridge, NLIP Seminar Series

Multilingual NLP via Cross-Lingual Word Embeddings


If you have a question about this talk, please contact Andrew Caines.

In recent years, word embeddings have proved highly useful as features in downstream NLP tasks. Because these word vectors can be trained on unlabeled monolingual corpora, they are an inexpensive resource. As monolingual word vectors see wider use, there is a growing need for word vectors that work as effectively across multiple languages as they do within one. Learning bilingual and multilingual word embeddings is therefore an important current research topic. These vectors offer an elegant, language-pair-independent way to represent content across different languages in shared cross-lingual embedding spaces, and they also enable the integration of knowledge from external resources (e.g., WordNet, dictionaries) into the embedding spaces. In this talk, I will briefly discuss current techniques for learning cross-lingual word embeddings, presenting a model typology based on multilingual training data requirements. I will then introduce several illustrative applications of the induced embedding spaces, including bilingual dictionary induction, ad-hoc cross-lingual information retrieval, and cross-lingual transfer for dialogue state tracking.
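As a rough illustration of bilingual dictionary induction in a shared cross-lingual space, the sketch below pairs each source word with its nearest target word by cosine similarity. It is not the speaker's method: the vectors are hand-made toy values rather than trained embeddings, the function names `cosine` and `induce_dictionary` are hypothetical, and the sketch assumes both vocabularies have already been mapped into the same space.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def induce_dictionary(source_vectors, target_vectors):
    """Map each source word to its most similar target word.

    Assumes both dictionaries hold vectors from one shared
    cross-lingual embedding space.
    """
    dictionary = {}
    for src_word, src_vec in source_vectors.items():
        dictionary[src_word] = max(
            target_vectors,
            key=lambda tgt_word: cosine(src_vec, target_vectors[tgt_word]),
        )
    return dictionary


# Toy vectors invented for illustration; not from any trained model.
english = {
    "dog": [0.9, 0.1, 0.0],
    "house": [0.1, 0.9, 0.1],
    "water": [0.0, 0.2, 0.9],
}
german = {
    "Hund": [0.8, 0.2, 0.1],
    "Haus": [0.2, 0.8, 0.0],
    "Wasser": [0.1, 0.1, 0.95],
}

translations = induce_dictionary(english, german)
print(translations)
```

Plain nearest-neighbour retrieval is only the simplest option; it is used here because it makes the role of the shared space easy to see.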

This talk is part of the NLIP Seminar Series.


