Contextualized embeddings for lexical semantics

If you have a question about this talk, please contact Marinela Parovic.

Word embeddings are dense vector representations computed automatically from large amounts of text. From a lexical semantics perspective, an embedding can be viewed as a compact aggregate of many observed uses of a word by many different speakers. Contextualized word embeddings are especially interesting for lexical semantics because they offer a potential window into garden-variety polysemy: polysemy that is entirely idiosyncratic rather than regular. However, there is not yet a standardized way to use contextualized embeddings for lexical semantics. I report on two studies we have been conducting. In the first, we tested the use of word token clusters on the task of type-level similarity. In the second, we are mapping word token embeddings to human-readable features. I also comment on a tendency of word embeddings, from count-based embeddings to the most recent contextualized ones, to pick up on what could be called traces of stories: text topics, judgments and sentiment, and cultural trends. I argue that this is an interesting signal rather than a bug.

This talk is part of the Language Technology Lab Seminars series.

© 2006-2023, University of Cambridge.