Monolingual and multilingual, explicit and latent vector representations of meaning

If you have a question about this talk, please contact Mohammad Taher Pilehvar.

In this talk I will present different kinds of vector representations of word senses and concepts. I will start with latent representations: sense embeddings obtained by applying word2vec to the Wikipedia corpus, sense-tagged with a multilingual disambiguation algorithm based on BabelNet, the largest multilingual semantic network and encyclopedic dictionary, covering 14 million concepts and entities and 271 languages.

I will then move on to two explicit vector representations of meaning (NASARI), based on lexical co-occurrence and multilingual semantic generalization, respectively, and a third latent version obtained from the word embeddings of the lexical vector.

Experimental results on several tasks, including word similarity, sense clustering, identification of sense predominance, and word sense disambiguation, show high performance and indicate that, whenever a comparison is possible, sense representations consistently outperform word representations.

This is joint work with José Camacho-Collados, Ignacio Iacobacci and Mohammad Taher Pilehvar.

This talk is part of the Language Technology Lab Seminars series.

© 2006-2024, University of Cambridge.