What do sentence transformers know, and how can we find out?
- 👤 Speaker: Sebastian Padó, Stuttgart University
- 📅 Date & Time: Thursday 29 February 2024, 11:00 - 12:00
- 📍 Venue: GR06/07 | Faculty of English, 9 West Road, CB3 9DP
Abstract
Transformer architectures specialised for producing full-sentence representations, notably SBERT, often achieve better performance on downstream tasks than sentence embeddings extracted from vanilla BERT. However, compared to vanilla transformers, we still have a limited understanding of which linguistic properties of the inputs are represented well (or less well) within these models.
In my presentation, I will report on two angles from which we have analyzed SBERT: (a) a black-box testing approach, where we build minimal pairs of synthetic sentences to observe and analyze differences in the model’s predictions [1]; and (b) a white-box testing approach, where we extend the Integrated Gradients attribution method to the Siamese case. This permits us to decompose model predictions on arbitrary inputs in terms of the contributions of individual token pairs [2,3].
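To make the idea behind (b) concrete, here is a minimal sketch of pairwise attribution for a Siamese model, not the implementation from [2,3]. It assumes a toy one-layer `tanh` encoder, a dot-product similarity score, and an all-zero reference input; the names `encode`, `integrated_jacobian`, and `pair_attributions` are hypothetical. Each input feature stands in for a token. The Jacobian of the encoder is averaged along a straight path from the reference to each input, and the two averaged Jacobians are combined into a matrix with one entry per feature pair.

```python
import numpy as np

def encode(x, W):
    """Toy encoder: a single tanh layer standing in for a sentence encoder."""
    return np.tanh(W @ x)

def integrated_jacobian(x, ref, W, steps=200):
    """Average the encoder's Jacobian along the straight path ref -> x (midpoint rule)."""
    alphas = (np.arange(steps) + 0.5) / steps
    jac = np.zeros_like(W)
    for alpha in alphas:
        point = ref + alpha * (x - ref)
        # Jacobian of tanh(W p) with respect to p is diag(1 - tanh(W p)^2) @ W
        jac += (1.0 - np.tanh(W @ point) ** 2)[:, None] * W
    return jac / steps

def pair_attributions(a, b, ref_a, ref_b, W, steps=200):
    """Return A where A[i, j] is the contribution of feature i of a and feature j of b."""
    ja = integrated_jacobian(a, ref_a, W, steps)  # shape (d_out, d_in)
    jb = integrated_jacobian(b, ref_b, W, steps)  # shape (d_out, d_in)
    return (a - ref_a)[:, None] * (ja.T @ jb) * (b - ref_b)[None, :]

# Assumed toy setup: random weights and inputs, all-zero reference.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.5, size=(4, 3))
a = rng.normal(size=3)
b = rng.normal(size=3)
ref = np.zeros(3)

A = pair_attributions(a, b, ref, ref, W)
score = float(encode(a, W) @ encode(b, W))

print("pairwise attribution matrix:\n", A)
print("sum of attributions:", A.sum(), "model score:", score)
```

Because the toy encoder maps the zero reference to the zero vector, the entries of `A` sum (up to numerical integration error) to the similarity score itself, so the score is fully decomposed into feature-pair contributions. In a real transformer, one would presumably sum such contributions over embedding dimensions to obtain a token-by-token matrix.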
[1] Dmitry Nikolaev and Sebastian Padó. Representation biases in sentence transformers. Proceedings of EACL. Dubrovnik, Croatia, 2023. https://aclanthology.org/2023.eacl-main.268
[2] Lucas Möller, Dmitry Nikolaev and Sebastian Padó. An Attribution Method for Siamese Encoders. Proceedings of EMNLP. Singapore, 2023. https://aclanthology.org/2023.emnlp-main.980
[3] Lucas Möller, Dmitry Nikolaev and Sebastian Padó. Approximate Attributions for Off-the-Shelf Siamese Transformers. Proceedings of EACL. St Julian’s, Malta, 2024. https://arxiv.org/abs/2402.02883
Series
This talk is part of the Language Technology Lab Seminars series.