Log in

Cambridge users (raven) details

Other users details

No account? details

Information on

Subscribing to talks details

Finding a talk details

Adding a talk details

Disseminating talks details

Help and Documentation details

Learning generalizable models on large-scale multi-modal data

Add to your list(s) Download to your calendar using vCal

Yutian Chen - DeepMind
Wednesday 18 October 2023, 14:00-15:00
Maxwell Centre.

If you have a question about this talk, please contact James Fergusson.

The abundant spectrum of multi-modal data provides a significant opportunity for augmenting the training of foundational models beyond mere text. In this talk, I will introduce two lines of work that leverage large-scale models, trained on Internet-scale multi-modal datasets, to achieve good generalization performance. The first work trains an audio-visual model on YouTube datasets of videos and enables automatic video translation and dubbing. The model is able to learn the correspondence between audio and visual features, and use this knowledge to translate videos from one language to another. The second work trains a multi-modal, multi-task, multi-embodiment generalist policy on a massive collection of simulated control tasks, vision, language, and robotics. The model is able to learn to perform a variety of tasks, including controlling a robot arm, playing a game, and translating text. Both lines of work exhibit the potential future trajectory of foundational models, highlighting the transformative power of integrating multi-modal inputs and outputs.

This talk is part of the Data Intensive Science Seminar Series series.

This talk is included in these lists:

Note that ex-directory lists are not shown.

Log in

Information on

Learning generalizable models on large-scale multi-modal data

This talk is included in these lists:

Other lists

Other talks