Multilingual Models for Distributed Semantics
If you have a question about this talk, please contact Tamara Polajnar.
In this talk I will present a technique for learning semantic representations that extends the distributional hypothesis to multilingual data and joint-space embeddings. These models leverage parallel data and learn to align the embeddings of semantically equivalent sentences closely, while keeping the embeddings of dissimilar sentences sufficiently far apart, using a form of noise-contrastive update.
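For concreteness, one standard way to realise such a noise-contrastive objective is a margin-based hinge loss; the following is a sketch under assumed notation (the functions f and g, the margin m, and the noise samples n_i are illustrative, not necessarily the exact formulation used in the talk). With f and g embedding sentences of the two languages into the joint space, define the distance between an aligned sentence pair (a, b) as

$$E_{\mathrm{dist}}(a, b) = \lVert f(a) - g(b) \rVert^{2},$$

and minimise, over the parallel corpus and k sampled noise sentences n_i, the hinge objective

$$J(\theta) = \sum_{(a, b)} \sum_{i=1}^{k} \max\bigl(0,\; m + E_{\mathrm{dist}}(a, b) - E_{\mathrm{dist}}(a, n_i)\bigr).$$

Minimising J pulls the embeddings of truly parallel sentences together while requiring each aligned pair to be at least a margin m closer than randomly sampled noise sentences.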
A nice feature of these models is that they do not rely on word alignments or any syntactic information, making them easy to apply to a large number of diverse languages. I will also briefly describe an extension of this approach that learns semantic representations at the document level.
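One plausible instantiation of such a document-level extension (again an assumption for illustration, not a confirmed detail of the talk) composes a document embedding additively from its sentence embeddings,

$$f_{\mathrm{doc}}(D) = \sum_{s \in D} f(s),$$

and trains it with the same noise-contrastive objective applied to aligned document pairs.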
The talk will conclude with an analysis of these models and an empirical evaluation. Using several cross-lingual document classification tasks, I will show that this approach can be used to learn semantically plausible multilingual distributed representations.
This talk is part of the NLIP Seminar Series.