Measuring Causal Effects of Data Statistics on Language Model Predictions

Yanai Elazar (Bar-Ilan University)
Wednesday 01 June 2022, 17:00-18:00
Computer Lab, FW26.

If you have a question about this talk, please contact Michael Schlichtkrull.

Abstract:

Training data is one of the main drivers behind state-of-the-art NLP models. But what exactly in the training data causes a model to make a certain prediction? We seek to answer this research question by formalizing it in a causal framework that provides a useful language for investigating how training data influences predictions. Importantly, our causal framework bypasses the need to retrain expensive models and allows us to estimate causal effects based on observational data alone. Addressing the problem of extracting factual knowledge from pretrained language models (PLMs), we focus on one simple data statistic, co-occurrence counts, and show that it influences the predictions of PLMs. This establishes a causal link between simple statistics from the training data (co-occurrence counts) and PLMs’ behavior, and shows that their language understanding is limited. Our causal framework and our results demonstrate the importance of categorizing and studying datasets used for model training and the benefits of causality in our field for understanding NLP models.

Bio:

Yanai Elazar is a fourth-year PhD student at Bar-Ilan University, working with Prof. Yoav Goldberg on NLP. His main interests involve model interpretation, analysis, biases in datasets and models, and commonsense reasoning. Yanai has been awarded multiple scholarships, including the PBC fellowship for outstanding PhD candidates in Data Science and the Google PhD Fellowship.

This talk is part of the NLIP Seminar Series.
