Vision-language models (VLMs)
Teams link available upon request; it is sent out on our mailing list, eng-mlg-rcc [at] lists.cam.ac.uk. Sign up via lists.cam.ac.uk for reminders.
This talk will chart the evolution of vision-language models (VLMs) and illustrate how architectural innovations and training paradigms have progressively closed the gap between visual perception and natural-language understanding. I will cover models such as CLIP, Flamingo, and LLaVA, discussing their design principles, strengths and weaknesses, and comparative performance across standard benchmarks.
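As background (not part of the talk materials), the sketch below shows the CLIP-style symmetric contrastive (InfoNCE) objective over a batch of paired image/text embeddings, which underlies much of the CLIP discussion. The function name and the fixed temperature are illustrative assumptions; in CLIP the temperature is a learned parameter.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings.

    image_emb, text_emb: (batch, dim) tensors; matched pairs share a row index.
    temperature: fixed here for simplicity; CLIP learns it during training.
    """
    # L2-normalise so dot products are cosine similarities.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # (batch, batch) similarity matrix; the diagonal holds the matched pairs.
    logits = image_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Cross-entropy in both directions: image-to-text and text-to-image.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2
```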
This talk is part of the Machine Learning Reading Group @ CUED series.