Learning to see with deep learning architectures for localisation and scene understanding

If you have a question about this talk, please contact Mariana Marasoiu.


We can now teach machines to recognise objects. However, to teach a machine to “see”, we need to understand geometry as well as semantics. Given an image of a road scene, for example, an autonomous vehicle needs to determine where it is, what is around it, and what will happen next. This requires not only object recognition but also depth, motion and spatial perception, and instance-level identification. We present work towards solving these problems using deep learning.

The first system, SegNet, is a deep convolutional network architecture designed to map input RGB images to pixel labels for scene understanding. It is composed of an encoder network and a decoder network which ends with a softmax classifier. The entire architecture can be trained end-to-end using stochastic gradient descent. SegNet can produce a dense pixel-wise output in real time with a measure of model uncertainty. We show SegNet applied to both classification and regression tasks.
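To illustrate the decoder's final stage, here is a minimal NumPy sketch (not the authors' code) of how a per-pixel softmax turns a map of class scores into dense labels with a per-pixel confidence; the 3-class score map below is a toy example:

```python
import numpy as np

def pixelwise_softmax(scores):
    """Softmax over the class axis of an (H, W, C) score map."""
    shifted = scores - scores.max(axis=-1, keepdims=True)  # for numerical stability
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

def dense_labels(scores):
    """Return per-pixel class labels and the winning-class probability."""
    probs = pixelwise_softmax(scores)
    return probs.argmax(axis=-1), probs.max(axis=-1)

# Toy 2x2 score map over 3 hypothetical classes (e.g. road, car, sky).
scores = np.array([[[4.0, 1.0, 0.0], [0.0, 3.0, 1.0]],
                   [[1.0, 0.0, 5.0], [2.0, 2.0, 2.0]]])
labels, confidence = dense_labels(scores)
```

In the real network these scores come from the decoder's last convolutional layer; the max probability gives a crude per-pixel confidence, while SegNet's uncertainty estimate proper comes from the model itself.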

Secondly, PoseNet is a real-time relocalisation system. We show how to train very deep networks to regress the camera’s 3D position and orientation from a single image. The algorithm can operate over large scale indoor and outdoor areas in real time.
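To make the regression target concrete, the following NumPy sketch shows one plausible loss for such a pose regressor, assuming the camera pose is parameterised as a 3D position vector plus a unit quaternion, with a scalar weight balancing the two terms (the weight value here is illustrative, not one quoted in the talk):

```python
import numpy as np

def pose_loss(x_pred, q_pred, x_true, q_true, beta=500.0):
    """Weighted position-plus-orientation regression loss.

    beta trades metres against quaternion units; a good value is
    scene-dependent (the default here is a hypothetical choice).
    """
    q_unit = q_true / np.linalg.norm(q_true)   # normalise ground-truth quaternion
    pos_err = np.linalg.norm(x_pred - x_true)  # Euclidean position error
    ori_err = np.linalg.norm(q_pred - q_unit)  # quaternion distance
    return pos_err + beta * ori_err

# Toy example: 1 m position error, perfect orientation.
x_true = np.array([0.0, 0.0, 0.0])
x_pred = np.array([1.0, 0.0, 0.0])
q = np.array([1.0, 0.0, 0.0, 0.0])  # identity rotation
loss = pose_loss(x_pred, q, x_true, q)
```

Minimising a loss of this shape lets a single network learn both position and orientation jointly from images.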

Live web demonstrations and links to publications can be found on our project webpages.

This talk is part of the Rainbow Group Seminars series.

