Log in

Cambridge users (raven) details

Other users details

No account? details

Information on

Subscribing to talks details

Finding a talk details

Adding a talk details

Disseminating talks details

Help and Documentation details

Understanding Word Embeddings

Add to your list(s) Download to your calendar using vCal

Omer Levy, Bar-Ilan University
Tuesday 13 October 2015, 14:00-15:00
FW11, Computer Laboratory.

If you have a question about this talk, please contact Tamara Polajnar.

Neural word embeddings, such as word2vec (Mikolov et al., 2013), have become increasingly popular in both academic and industrial NLP . These methods attempt to capture the semantic meanings of words by processing huge unlabeled corpora with methods inspired by neural networks and the recent onset of Deep Learning. The result is a vectorial representation of every word in a low-dimensional continuous space. These word vectors exhibit interesting arithmetic properties (e.g. king – man + woman = queen) (Mikolov et al., 2013), and seemingly outperform traditional vector-space models of meaning inspired by Harris’s Distributional Hypothesis (Baroni et al., 2014). Our work attempts to demystify word embeddings, and understand what makes them so much better than traditional methods at capturing semantic properties.

Our main result shows that state-of-the-art word embeddings are actually “more of the same”. In particular, we show that skip-grams with negative sampling, the latest algorithm in word2vec, is implicitly factorizing a word-context PMI matrix, which has been thoroughly used and studied in the NLP community for the past 20 years. We also identify that the root of word2vec’s perceived superiority can be attributed to a collection of hyperparameter settings. While these hyperparameters were thought to be unique to neural-network-inspired embedding methods, we show that they can, in fact, be ported to traditional distributional methods, significantly improving their performance. Among our qualitative results is a method for interpreting these seemingly-opaque word-vectors, and the answer to why king – man + woman = queen.

Based on joint work with Yoav Goldberg and Ido Dagan.

This talk is part of the NLIP Seminar Series series.

This talk is included in these lists:

Note that ex-directory lists are not shown.

Log in

Information on

Understanding Word Embeddings

This talk is included in these lists:

Other lists

Other talks