Challenges in evaluating natural language generation systems
If you have a question about this talk, please contact Huiyuan Xie.

Note unusual time

Join Zoom Meeting
https://cl-cam-ac-uk.zoom.us/j/91900396241?pwd=Wk5mcDYrUytkSElkMHB0T3NkNkRFQT09
Meeting ID: 919 0039 6241
Passcode: 127570

Recent advances in neural language modeling have opened up a variety of exciting new text generation applications. However, evaluating systems built for these tasks remains difficult. Most prior work relies on a combination of automatic metrics such as BLEU (which are often uninformative) and crowdsourced human evaluation (which is also usually uninformative, especially when conducted without careful task design).

In this talk, I focus on two specific applications: (1) unsupervised sentence-level style transfer and (2) long-form question answering. I will go over our recent work on building models for these systems and then describe the ensuing struggles to properly compare them to baselines. In both cases, we identify (and propose solutions for) issues with existing evaluations, including improper aggregation of multiple metrics, missing control experiments with simple baselines, and high cognitive load placed on human evaluators.

I’ll conclude by briefly discussing our work on machine-in-the-loop text generation systems, in which both humans and machines participate in the generation process and reliable human evaluation becomes much more feasible.

This talk is part of the NLIP Seminar Series.