End-to-End Fine-grained Multi-modal Understanding
If you have a question about this talk, please contact Panagiotis Fytas.

Previously, multi-modal reasoning systems relied on a pre-trained object detector to extract regions of interest from the image. However, this crucial module was typically used as a black box, trained independently of the downstream task and on a fixed vocabulary of objects and attributes. This made it challenging for such systems to capture the long tail of visual concepts expressed in free-form text.

In this talk, I will first discuss MDETR, an end-to-end modulated detector that detects objects in an image conditioned on a raw text query such as a caption or a question. The model is trained on 1.3M text-image pairs, mined from pre-existing multi-modal datasets that have explicit alignment between phrases in the text and objects in the image. Next, we will explore further developments in architecture design that fuse the visual and textual modalities deeper in the model, achieving state-of-the-art results when coupled with a coarse-to-fine pre-training strategy. Finally, I will discuss a novel fine-grained visual understanding task and evaluation benchmark, which shows that existing benchmarks overestimate vision-language (VL) models' ability to understand and reason over complex visual scenes, leaving substantial room for improvement.

This talk is part of the Language Technology Lab Seminars series.