On Data (In-)Dependent Hashing

Novi Quadrianto (University of Cambridge)
Thursday 31 May 2012, 14:00-15:30
Engineering Department, CBL Room 438.

If you have a question about this talk, please contact Konstantina Palla.

I will provide an overview of techniques for approximate nearest neighbor (ANN) search in massive datasets. ANN search has wide-ranging applications, including information retrieval (finding near-duplicate pages), computer graphics (scene completion), and collaborative filtering. The most widely used approach, and one particularly suitable for high-dimensional data, is to build similarity-preserving hash functions that map similar data points to nearby codes. These hashing methods fall into two main categories: data-independent and data-dependent. I will cover locality-sensitive hashing (LSH) based methods as representative of the data-independent approach, and show how to build LSH schemes that preserve Hamming distance, cosine similarity, and the Jaccard index. I will briefly mention some recent machine-learning-based data-dependent approaches, such as spectral hashing and other loss-based hashing methods. To bring things a bit closer to home, I will also try to show some of the potential of hashing for Gaussian Process Regression.
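As a rough illustration of the data-independent idea, the sketch below hashes vectors with random hyperplanes, a standard LSH construction for cosine similarity. It is not taken from the talk: the function names (`make_hyperplanes`, `simhash`, `hamming`, `estimated_cosine`), the 256-bit code length, and the example vectors are all assumptions chosen for illustration. Each bit records which side of a random hyperplane a vector falls on, so vectors separated by a small angle tend to agree on most bits.

```python
import math
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random Gaussian hyperplane normals in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def simhash(vector, hyperplanes):
    """Return one bit per hyperplane: 1 if the vector is on its non-negative side."""
    return [
        1 if sum(w * x for w, x in zip(plane, vector)) >= 0 else 0
        for plane in hyperplanes
    ]


def hamming(code_a, code_b):
    """Count the positions where two equal-length binary codes differ."""
    return sum(bit_a != bit_b for bit_a, bit_b in zip(code_a, code_b))


def estimated_cosine(code_a, code_b):
    """Estimate cosine similarity from the fraction of differing bits.

    For random hyperplanes, P(bits differ) = angle / pi, so the angle
    is approximately pi * hamming / num_bits.
    """
    angle = math.pi * hamming(code_a, code_b) / len(code_a)
    return math.cos(angle)


# Illustrative data (assumed, not from the talk).
planes = make_hyperplanes(num_bits=256, dim=3)
a = [1.0, 2.0, 3.0]
b = [1.1, 1.9, 3.2]     # nearly parallel to a
c = [-3.0, 0.5, -1.0]   # points in a very different direction

code_a, code_b, code_c = (simhash(v, planes) for v in (a, b, c))

print("Hamming(a, b):", hamming(code_a, code_b))
print("Hamming(a, c):", hamming(code_a, code_c))
print("Estimated cos(a, b):", estimated_cosine(code_a, code_b))
```

The hyperplanes are drawn without looking at the data, which is what makes this scheme data-independent; data-dependent methods such as spectral hashing would instead fit the hash functions to the dataset.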

This talk is part of the Machine Learning Reading Group @ CUED series.
