Representation Learning for Text Retrieval: Learning and Pretraining Strategies for Dense Retrieval
If you have a question about this talk, please contact James Thorne.

Unusual date and time.

Join Zoom Meeting: https://cl-cam-ac-uk.zoom.us/j/95119479973?pwd=RGFYZndIVVhDWEtySy8wV3VTZlpnZz09
Meeting ID: 951 1947 9973
Passcode: 602575

Text retrieval is one of the most prominent tasks for language technologies. It is an end application in itself, powering search engines for billions of users. It can also serve as a first-stage retrieval component for other language systems, such as question answering and information extraction.

Text retrieval has been done by matching queries and documents in the sparse, bag-of-words space, e.g., using BM25, since the 1970s. We joke that every year we see techniques that improve on BM25 by 10%, yet decades later our research is still working on a 10% improvement over BM25.

Dense retrieval provides a unique opportunity to overcome the limitations of sparse retrieval based on bag-of-words matching. With pretrained language models, we can now encode the query and the documents into one embedding space and conduct reasonable first-stage retrieval purely using embedding similarities.

In this talk, I will first recap recent progress in dense retrieval. I will then present our upcoming ICLR 2021 paper (ANCE) on better training of dense retrieval models with approximate nearest neighbor contrastive learning. The obstacles in dense retrieval training led us to question how well pretrained language models are aligned with the needs of dense retrieval. In the last part of the talk, I will present our ongoing work (Seed-Encoder) on designing pretraining strategies dedicated to dense retrieval.

This talk is part of the NLIP Seminar Series.