One-shot visual language understanding with cross-modal translation and LLMs
If you have a question about this talk, please contact Panagiotis Fytas.

Visual language such as charts and plots is ubiquitous in the human world. Comprehending plots and charts requires strong reasoning skills. Prior state-of-the-art (SOTA) models require at least tens of thousands of training examples, and their reasoning capabilities are still quite limited, especially on complex human-written queries.

We present the first one-shot solution to visual language reasoning. We decompose the challenge of visual language reasoning into two steps: (1) plot-to-text translation, and (2) reasoning over the translated text. The key to this method is a modality conversion module, named DePlot, which translates the image of a plot or chart into a linearized table. The output of DePlot can then be used directly to prompt a pretrained large language model (LLM), exploiting the few-shot reasoning capabilities of LLMs.

To obtain DePlot, we standardize the plot-to-table task by establishing unified task formats and metrics, and train DePlot end-to-end on this task. DePlot can then be used off-the-shelf together with LLMs in a plug-and-play fashion. Compared with a SOTA model finetuned on thousands of data points, DePlot+LLM with just one-shot prompting achieves a 29.4% improvement over finetuned SOTA on human-written queries from the task of chart QA.

This talk is part of the Language Technology Lab Seminars series.
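As a rough illustration of the two-step decomposition described in the abstract, the sketch below shows how a table recovered from a chart might be linearized and placed into a one-shot prompt for an LLM. It is not the speakers' implementation: the linearization format (cells joined by " | ", one row per line), the prompt wording, the function names, and the sample data are all assumptions made for illustration, and the plot-to-table model and the LLM call are left out entirely.

```python
from typing import Sequence


def linearize_table(header: Sequence[str], rows: Sequence[Sequence[object]]) -> str:
    """Flatten a table into plain text.

    Assumed format (not taken from the talk): cells joined by ' | ',
    one table row per line, header first.
    """
    lines = [" | ".join(header)]
    for row in rows:
        lines.append(" | ".join(str(cell) for cell in row))
    return "\n".join(lines)


def build_one_shot_prompt(
    example_table: str,
    example_question: str,
    example_answer: str,
    table: str,
    question: str,
) -> str:
    """Build a prompt with exactly one worked example, then the real query."""
    return (
        "Read the table and answer the question.\n\n"
        f"Table:\n{example_table}\n"
        f"Question: {example_question}\n"
        f"Answer: {example_answer}\n\n"
        f"Table:\n{table}\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Hypothetical data standing in for the output of a plot-to-table model.
example_table = linearize_table(["Year", "Sales"], [[2020, 10], [2021, 15]])
query_table = linearize_table(["City", "Rainfall (mm)"], [["A", 120], ["B", 80]])

prompt = build_one_shot_prompt(
    example_table=example_table,
    example_question="In which year were sales higher?",
    example_answer="2021",
    table=query_table,
    question="Which city had more rainfall?",
)
print(prompt)
```

The resulting string would be sent to whichever LLM is available. Because the chart has already been converted to text, the reasoning step needs no chart-specific training, which is the plug-and-play property the abstract emphasizes.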