Large-scale Retrieval with Ivory and MapReduce
It is commonly acknowledged that web-scale collections have outgrown the capabilities of individual machines, making clusters necessary for many problems in information retrieval. The 2009 release of ClueWeb09, a 25-terabyte collection of roughly one billion web pages, and the growing popularity of Hadoop, the open-source implementation of the MapReduce distributed framework, have motivated academic researchers to think more seriously about cluster-based distributed retrieval.
In this talk, we will first introduce Ivory, an end-to-end open-source distributed retrieval system built at the University of Maryland, College Park. Ivory takes full advantage of Hadoop and its underlying distributed file system for both indexing and retrieval. We will then present an overview of several research projects that have grown up around Ivory, including approximate positional indexing for efficient ranked retrieval, scalable monolingual and cross-lingual pairwise document similarity, and automatically extracted pseudo test collections for learning ranking functions for web search.
This talk is part of the Microsoft Research Cambridge public talks series.