Challenges in evaluating natural language generation systems

  • Speaker: Mohit Iyyer (University of Massachusetts Amherst)
  • Time: Friday 11 June 2021, 13:00–14:00
  • Venue: Virtual (Zoom)

If you have a question about this talk, please contact Huiyuan Xie.

Note unusual time

Join Zoom Meeting

Meeting ID: 919 0039 6241 Passcode: 127570

Recent advances in neural language modeling have opened up a variety of exciting new text generation applications. However, evaluating systems built for these tasks remains difficult. Most prior work relies on a combination of automatic metrics such as BLEU (which are often uninformative) and crowdsourced human evaluation (which is also often uninformative, especially when conducted without careful task design). In this talk, I focus on two specific applications: (1) unsupervised sentence-level style transfer and (2) long-form question answering. I will go over our recent work on building models for these tasks and then describe the ensuing struggles to properly compare them to baselines. In both cases, we identify (and propose solutions for) issues with existing evaluations, including improper aggregation of multiple metrics, missing control experiments with simple baselines, and high cognitive load placed on human evaluators. I'll conclude by briefly discussing our work on machine-in-the-loop text generation systems, in which both humans and machines participate in the generation process, and where reliable human evaluation becomes much more feasible.
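To see why n-gram metrics like BLEU can be uninformative for generation tasks, here is a minimal sketch of the modified n-gram precision underlying sentence-level BLEU (a toy illustration, not the speaker's evaluation code; real implementations such as sacrebleu add smoothing and corpus-level aggregation):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(reference, hypothesis, max_n=4):
    """Toy sentence-level BLEU: geometric mean of modified n-gram
    precisions times a brevity penalty. No smoothing, single reference."""
    ref, hyp = reference.split(), hypothesis.split()
    log_prec = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = Counter(ngrams(hyp, n))
        ref_counts = Counter(ngrams(ref, n))
        # Clip each hypothesis n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = max(sum(hyp_counts.values()), 1)
        if overlap == 0:
            return 0.0  # one zero precision collapses the geometric mean
        log_prec += math.log(overlap / total)
    # Brevity penalty discourages overly short hypotheses.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    return bp * math.exp(log_prec / max_n)

# An exact copy scores 1.0, but a perfectly acceptable paraphrase that
# shares no 4-gram with the reference scores 0.0 — the metric cannot
# distinguish it from gibberish.
print(bleu("the cat sat on the mat", "the cat sat on the mat"))      # 1.0
print(bleu("the cat sat on the mat", "a cat was sitting on the mat"))  # 0.0
```

The paraphrase example is exactly the failure mode the abstract alludes to: surface n-gram overlap rewards lexical copying, not meaning preservation, which is why style transfer and long-form QA need more careful evaluation design.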

This talk is part of the NLIP Seminar Series.

© 2006–2024, University of Cambridge.