A Polya Urn Document Language Model for Information Retrieval
If you have a question about this talk, please contact Tamara Polajnar.

Although the multinomial language model has been one of the most effective unigram models of information retrieval for over a decade, it does not model one important linguistic phenomenon relating to term dependency, namely the tendency of a term to repeat itself within a document (i.e. word burstiness). In this talk I will begin with a brief review of language modelling as applied to information retrieval. I will then present some work near completion in which we model document generation as a random process with reinforcement (a multivariate Polya process) and develop a Dirichlet compound multinomial language model that captures word burstiness. I will show that the new reinforced language model can be computed as efficiently as current retrieval models and that it significantly outperforms the multinomial model on a number of standard effectiveness metrics. I will conclude by presenting an analysis of the retrieval method which shows that it adheres to what is called the “verbosity hypothesis”, and will show that the method essentially combines the term and document event spaces, giving theoretical justification to tf-idf-type schemes.

This talk is part of the NLIP Seminar Series.
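As a rough illustration of the reinforcement idea in the abstract, the following minimal sketch compares a multivariate Polya urn (whose sequence probability has the Dirichlet compound multinomial closed form) with a plain multinomial on a toy two-word vocabulary. The function names, the `alpha` pseudo-count dictionary, and the toy values are assumptions made for illustration only; this is not the retrieval function or parameter estimation presented in the talk.

```python
from collections import Counter
from math import lgamma, log


def polya_sequence_log_prob(tokens, alpha):
    """Log-probability of drawing `tokens` in order from a Polya urn.

    `alpha` maps each word to its initial (hypothetical) pseudo-count.
    After each draw the drawn word's count grows by one, so a word that
    has already appeared becomes more likely to appear again.
    """
    counts = Counter()
    total_alpha = sum(alpha.values())
    log_p = 0.0
    for i, tok in enumerate(tokens):
        log_p += log((alpha[tok] + counts[tok]) / (total_alpha + i))
        counts[tok] += 1
    return log_p


def dcm_sequence_log_prob(tokens, alpha):
    """Same quantity via the closed form, using log-gamma functions."""
    counts = Counter(tokens)
    total_alpha = sum(alpha.values())
    log_p = lgamma(total_alpha) - lgamma(total_alpha + len(tokens))
    for word, c in counts.items():
        log_p += lgamma(alpha[word] + c) - lgamma(alpha[word])
    return log_p


def multinomial_sequence_log_prob(tokens, alpha):
    """Baseline without reinforcement: fixed probabilities alpha / sum(alpha)."""
    total_alpha = sum(alpha.values())
    return sum(log(alpha[tok] / total_alpha) for tok in tokens)


# Toy, made-up pseudo-counts: two equally likely words.
alpha = {"a": 1.0, "b": 1.0}
bursty = ["a", "a"]
mixed = ["a", "b"]

print(polya_sequence_log_prob(bursty, alpha), polya_sequence_log_prob(mixed, alpha))
print(multinomial_sequence_log_prob(bursty, alpha), multinomial_sequence_log_prob(mixed, alpha))
```

With these toy values the urn assigns the repeated-word document a higher probability than the mixed one, while the multinomial treats both identically. The sequential and closed-form computations agree, and the closed form needs only one pass over the distinct terms of a document.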