
Vision-language models (VLMs)



The Teams link is available upon request; it is sent out on our mailing list, eng-mlg-rcc [at] lists.cam.ac.uk. Sign up to the mailing list via lists.cam.ac.uk for reminders.

This talk will chart the evolution of vision-language models (VLMs) and illustrate how architectural innovations and training paradigms have progressively closed the gap between visual perception and natural-language understanding. I will cover models such as CLIP, Flamingo and LLaVA, discussing their design principles, strengths and weaknesses, and comparative performance on standard benchmarks.

This talk is part of the Machine Learning Reading Group @ CUED series.
