From Sparse Modeling to Sparse Communication

If you have a question about this talk, please contact Marinela Parovic.

Sparse modeling is an important, decades-old area of machine learning that aims to select and discover the relevant features that should be included in a model. In this talk I will describe how this toolbox can be extended and adapted to facilitate sparse communication in neural networks. The building block is a family of sparse transformations called alpha-entmax, a drop-in replacement for softmax. Entmax transformations are differentiable and, unlike softmax, can return sparse probability distributions, which is useful for selecting relevant input features.
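
To make the contrast with softmax concrete, here is a minimal NumPy sketch of alpha-entmax for a single score vector. It is an illustration, not the speaker's implementation: the function name entmax_bisect, the bisection solver, and the iteration count are assumptions chosen for brevity, and the sketch only handles alpha > 1 (alpha = 2 corresponds to sparsemax, and softmax is the limit as alpha approaches 1).

```python
import numpy as np

def softmax(z):
    """Standard softmax: every entry of the output is strictly positive."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

def entmax_bisect(z, alpha=1.5, n_iter=50):
    """Approximate alpha-entmax of a 1-D score vector by bisection (alpha > 1).

    Uses the form p_i = max((alpha - 1) * z_i - tau, 0) ** (1 / (alpha - 1)),
    searching for the threshold tau at which the entries sum to 1.
    """
    if alpha <= 1:
        raise ValueError("this sketch only handles alpha > 1")
    z = np.asarray(z, dtype=float) * (alpha - 1)
    d = z.size
    # At lo the total mass is >= 1; at hi every entry is <= 1/d, so mass <= 1.
    lo = z.max() - 1.0
    hi = z.max() - (1.0 / d) ** (alpha - 1)
    for _ in range(n_iter):
        tau = 0.5 * (lo + hi)
        mass = (np.maximum(z - tau, 0.0) ** (1.0 / (alpha - 1))).sum()
        if mass >= 1.0:
            lo = tau
        else:
            hi = tau
    p = np.maximum(z - lo, 0.0) ** (1.0 / (alpha - 1))
    return p / p.sum()  # renormalise to absorb the remaining bisection error

scores = [2.0, 1.0, 0.1]
print("softmax    :", softmax(scores))             # all entries nonzero
print("1.5-entmax :", entmax_bisect(scores, 1.5))  # last entry is exactly zero
print("sparsemax  :", entmax_bisect(scores, 2.0))  # all mass on the top score
```

Entries whose scaled score falls below the threshold tau receive exactly zero probability, which is what allows the transformation to act as a selector of relevant inputs.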

In the first part, I will illustrate the use of alpha-entmax in attention mechanisms. These sparse transformations and their structured and continuous variants have been applied with success to machine translation, natural language inference, visual question answering, and other tasks. I will show how learning the alpha parameter can lead to “adaptively sparse transformers,” where each attention head learns to choose between focused or spread-out behavior. I will proceed to describe a framework for model prediction explainability as a sparse communication problem between an explainer and a layperson, which takes advantage of the selection capabilities of sparse attention. If time permits, I will show how this framework can be extended to continuous domains to obtain sparse densities, illustrating with an application in visual question answering where “continuous attention” selects elliptical regions in the image.

In the second part, I will show how sparse transformations can also be used as a replacement for the cross-entropy loss, via the family of entmax losses. This leads to sparse sequence-to-sequence models, where beam search can be exact, and to language models that are natively sparse, eliminating the need for top-k and nucleus sampling. I will show applications in morphological tasks, machine translation, and text generation.
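
As a rough illustration of an entmax loss, the sketch below implements the alpha = 2 member of the family (the sparsemax loss) for a single example, using the closed-form sorting solution for sparsemax. The function names and the choice of alpha = 2 are assumptions made to keep the example short; the talk covers the general alpha case.

```python
import numpy as np

def sparsemax(z):
    """Closed-form sparsemax (alpha = 2) of a 1-D score vector via sorting."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, z.size + 1)
    # The support is the longest prefix of sorted scores satisfying this test.
    k_max = int((1.0 + k * z_sorted > cumsum).sum())
    tau = (cumsum[k_max - 1] - 1.0) / k_max
    return np.maximum(z - tau, 0.0)

def sparsemax_loss(z, gold):
    """Sparsemax loss and its gradient with respect to the scores z.

    loss = (p - e_gold) . z + 0.5 * (1 - ||p||^2), with p = sparsemax(z);
    the gradient is p - e_gold.
    """
    z = np.asarray(z, dtype=float)
    p = sparsemax(z)
    onehot = np.zeros_like(z)
    onehot[gold] = 1.0
    loss = (p - onehot) @ z + 0.5 * (1.0 - p @ p)
    return loss, p - onehot

# The gold class wins by a wide margin: loss and gradient are both zero.
print(sparsemax_loss([5.0, 0.0, 0.0], gold=0))
# The gold class is not the top-scoring one: positive loss, nonzero gradient.
print(sparsemax_loss([1.0, 0.8, 0.1], gold=1))
```

Unlike cross-entropy, which is never exactly zero for finite scores, this loss reaches zero once the gold label is separated from the others by a sufficient margin, and the predicted distribution can assign exactly zero probability to implausible outputs.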

This work was funded by the DeepSPIN ERC project (https://deep-spin.github.io).

This talk is part of the Language Technology Lab Seminars series.
