Efficient Image Scene Analysis and Applications

If you have a question about this talk, please contact ah781.

Images remain one of the most popular and ubiquitous media for capturing and documenting the world around us. Developing efficient algorithms for understanding such images is of great importance for many applications in computer vision and computer graphics. In this talk, I will present three algorithms for efficient image scene understanding, together with their applications.

Automatic estimation of salient object regions across images, without any prior assumption or knowledge of the contents of the corresponding scenes, enhances many computer vision and computer graphics applications. We introduce a regional contrast based salient object extraction algorithm, which simultaneously evaluates global contrast differences and spatially weighted coherence scores. Experimental results on well-known benchmarks demonstrate that our algorithm consistently outperforms existing salient object detection and segmentation methods, yielding higher precision and better recall rates. The proposed method, which does not require expensive training data annotation in advance, provides an economical and practical tool for analyzing large-scale unlabeled datasets (e.g. internet images).
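To make the idea of combining global contrast with spatial weighting concrete, here is a minimal sketch, not the speaker's implementation. It assumes the image has already been segmented into regions, each summarised by a mean color, a pixel count, and a normalised centroid. The function name, the dictionary keys, the Gaussian-style spatial weight, and the `sigma_sq` default are all illustrative assumptions.

```python
import math


def region_contrast_saliency(regions, sigma_sq=0.4):
    """Score each region by its color contrast to every other region.

    Each region is a dict with (hypothetical) keys:
      'color'    - mean color as a 3-tuple
      'size'     - number of pixels in the region
      'centroid' - (x, y) position normalised to [0, 1]
    Contrast to nearby and large regions counts more than contrast to
    distant or tiny ones. Scores are normalised so the maximum is 1.0.
    """
    scores = []
    for k, rk in enumerate(regions):
        total = 0.0
        for i, ri in enumerate(regions):
            if i == k:
                continue
            color_dist = math.dist(rk["color"], ri["color"])
            spatial_dist = math.dist(rk["centroid"], ri["centroid"])
            # Assumed weighting: closer regions contribute more.
            spatial_weight = math.exp(-spatial_dist ** 2 / sigma_sq)
            total += spatial_weight * ri["size"] * color_dist
        scores.append(total)
    peak = max(scores)
    return [s / peak if peak > 0 else 0.0 for s in scores]


# Two large, similar background regions and one small, distinct region.
regions = [
    {"color": (0.2, 0.2, 0.2), "size": 500, "centroid": (0.2, 0.5)},
    {"color": (0.25, 0.2, 0.2), "size": 450, "centroid": (0.8, 0.5)},
    {"color": (0.9, 0.1, 0.1), "size": 50, "centroid": (0.5, 0.5)},
]
scores = region_contrast_saliency(regions)
```

In this toy input the small red region differs strongly from both large neighbours, so it receives the highest score, while the two background regions, which mostly resemble each other, score low. No training data is involved, which matches the unsupervised setting described above.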

Training a generic objectness measure to produce a small set of candidate object windows has been shown to speed up the classical sliding-window object detection paradigm. We propose a novel binarized normed gradients (BING) feature for objectness estimation of image windows. This feature allows the objectness score of an image window to be tested with only a few atomic operations (e.g. ADD, BITWISE SHIFT). Experiments on the challenging PASCAL VOC 2007 dataset show that our method efficiently (300 fps on a single laptop CPU, 1,000 times faster than existing methods) generates a small set of category-independent, high-quality object windows, yielding a 96.2% object detection rate (DR) with 1,000 proposals.
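The following sketch illustrates why a binarized feature can be scored with only additions, shifts, and bit counts. It is a simplified illustration under stated assumptions, not the BING implementation: it assumes a 64-element window of 8-bit gradient magnitudes, keeps only the top few bits of each value as packed bit planes, and assumes the learned weights are approximated by scaled vectors with entries in {-1, +1}. All function names and the `num_bits` parameter are hypothetical.

```python
def popcount(x):
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def pack_bits(bits):
    """Pack a sequence of 0/1 values into one integer (bit i = bits[i])."""
    value = 0
    for i, bit in enumerate(bits):
        value |= (bit & 1) << i
    return value


def binarize_gradients(grads, num_bits=4):
    """Keep the top num_bits bits of each 8-bit value as packed bit planes.

    Plane k (1-based) holds bit (8 - k) of every gradient, so the
    truncated gradient equals the sum over k of 2**(8 - k) * plane_k.
    """
    return [
        pack_bits([(g >> (8 - k)) & 1 for g in grads])
        for k in range(1, num_bits + 1)
    ]


def binary_score(planes, basis):
    """Approximate <w, g> with w = sum of beta * a, a in {-1, +1}^64.

    basis is a list of (beta, a_plus) pairs, where a_plus packs the
    positions at which a is +1. For a binary plane b:
      <a, b> = 2 * popcount(a_plus & b) - popcount(b)
    """
    score = 0.0
    for beta, a_plus in basis:
        acc = 0
        for k, plane in enumerate(planes, start=1):
            dot = (popcount(a_plus & plane) << 1) - popcount(plane)
            acc += dot << (8 - k)  # weight this plane by 2**(8 - k)
        score += beta * acc
    return score


# Toy window: 64 synthetic gradient magnitudes and one +/-1 weight vector.
grads = [(i * 37) % 256 for i in range(64)]
signs = [1 if i % 3 == 0 else -1 for i in range(64)]
basis = [(0.5, pack_bits([1 if s > 0 else 0 for s in signs]))]
score = binary_score(binarize_gradients(grads), basis)
```

Under these assumptions the bitwise score equals the ordinary dot product of the weights with the gradients truncated to their top four bits. The inner loop touches only whole-window integers rather than 64 separate multiplications, which is the kind of saving the abstract refers to.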

Humans describe images in terms of nouns and adjectives, while algorithms operate on images represented as sets of pixels. Bridging this gap between how we would like to access images and how they are typically represented is the goal of image parsing. In this work, we propose treating nouns as object labels and adjectives as visual attributes. This allows us to formulate the image parsing problem as one of jointly estimating per-pixel object and attribute labels from a set of training images. We propose an efficient (interactive time) solution to this problem. Using the extracted attribute labels as handles, our system empowers a user to verbally refine the results. This enables hands-free parsing of an image into pixel-wise object/attribute labels that correspond to human semantics.
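As a toy illustration of the output representation and of using attributes as handles, the sketch below stores one object label and a set of attribute labels per pixel, then applies a spoken-style correction such as "the wooden thing is a table". It is only an assumption-laden example: the function name is hypothetical, and the hard relabeling shown here stands in for whatever inference the actual system performs.

```python
def refine_by_attribute(objects, attributes, attribute, new_object):
    """Relabel every pixel that carries `attribute` as `new_object`.

    objects    - 2D list of per-pixel object labels (nouns)
    attributes - 2D list of per-pixel sets of attribute labels (adjectives)
    Returns a new 2D list; the inputs are left unchanged.
    """
    return [
        [
            new_object if attribute in attrs else obj
            for obj, attrs in zip(obj_row, attr_row)
        ]
        for obj_row, attr_row in zip(objects, attributes)
    ]


# A 2x2 "image" whose wooden pixels were mislabeled as a chair.
objects = [["wall", "chair"], ["floor", "chair"]]
attributes = [[{"painted"}, {"wooden"}], [{"carpet"}, {"wooden"}]]
refined = refine_by_attribute(objects, attributes, "wooden", "table")
```

Here the attribute acts as a selector: the user never points at pixels, but names a property that the parser has already attached to them.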

This talk is part of the CUED Computer Vision Research Seminars series.

