Interactive and decomposed approaches for NLP: the case of multi-text summarization

  • Ido Dagan (Bar-Ilan University)
  • Friday 29 April 2022, 12:00-13:00
  • Virtual (Zoom).

If you have a question about this talk, please contact Michael Schlichtkrull.

Current approaches for NLP tasks often conform to two design principles. First, they address “static” tasks, where a single input instance is addressed at a time, independently of other inputs. Second, outputs are computed via an end-to-end model, trained directly over input-output pairs for the task. In this talk, I will propose two directions in which NLP research may be systematically extended beyond the static end-to-end approach, and demonstrate them for the use case of multi-text summarization.

In the first part of the talk, I suggest that in many realistic use cases multi-text (or long-text) summarization should support an interactive setting, in which users direct summary generation to best fit their information exploration needs. To promote principled research in this direction, we propose a systematic evaluation framework for interactive summarization. This framework extends summarization evaluation standards to consider the information accumulating along a user session, and includes an effective procedure for collecting user sessions. We then present a deep reinforcement learning model for interactive summarization, showing (using our evaluation framework) that it significantly improves information exposure over prior baselines while preserving a positive user experience.

In the second part of the talk, I suggest that summarization modeling may be beneficially decomposed into inherent subtasks, each addressed by a targeted model, rather than employing a single end-to-end model. Such decomposition is enabled through a clever generation of targeted training datasets for specific subtasks, all derived from the original “end-to-end” training data. As an additional contribution related to this context, I will describe our Cross-Document Language Model (CDLM), which is pre-trained specifically to model cross-text relationships, supporting diverse cross-document tasks.

Bio:

Ido Dagan is a Professor at the Department of Computer Science at Bar-Ilan University, Israel, the founder of the Natural Language Processing (NLP) Lab at Bar-Ilan, the founding Director of the nationally funded Bar-Ilan University Data Science Institute, and a Fellow of the Association for Computational Linguistics (ACL). His interests are in applied semantic processing, focusing on textual inference, natural open semantic representations, consolidation and summarization of multi-text information, and interactive text summarization and exploration. Dagan and colleagues initiated and promoted textual entailment recognition (RTE, later also known as NLI) as a generic empirical task. He was the President of the ACL in 2010 and served on its Executive Committee during 2008-2011. In that capacity, he led the establishment of the journal Transactions of the Association for Computational Linguistics, which became one of the two premier journals in NLP. Dagan received his B.A. summa cum laude and his Ph.D. (1992) in Computer Science from the Technion. He was a research fellow at the IBM Haifa Scientific Center (1991) and a Member of Technical Staff at AT&T Bell Laboratories (1992-1994). During 1998-2003 he was co-founder and CTO of FocusEngine and VP of Technology of LingoMotors, and he has regularly consulted in industry. His academic research has involved extensive industrial collaboration, including funds from IBM, Google, Thomson-Reuters, Bloomberg, Intel and Facebook, as well as collaboration with local companies under funded projects of the Israel Innovation Authority.

Topic: NLIP Seminar
Time: Apr 29, 2022 12:00 PM London

Join Zoom Meeting https://cl-cam-ac-uk.zoom.us/j/96419914999?pwd=RHN4TE9KMmdhY3loaE55bHRNTVFodz09

Meeting ID: 964 1991 4999
Passcode: 485878

This talk is part of the NLIP Seminar Series.
