Term Weighting Schemes for Latent Dirichlet Allocation
If you have a question about this talk, please contact Jimme Jardine.

Hi all. I will be presenting a paper on LDA.

@conference{wilson2010term,
  title={{Term Weighting Schemes for Latent Dirichlet Allocation}},
  author={Wilson, A.T. and Chew, P.A.},
  booktitle={Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics},
  pages={465--473},
  year={2010},
  organization={Association for Computational Linguistics}
}

Many implementations of Latent Dirichlet Allocation (LDA), including those described in Blei et al. (2003), rely at some point on the removal of stopwords, words which are assumed to contribute little to the meaning of the text. This step is considered necessary because otherwise high-frequency words tend to end up scattered across many of the latent topics without much rhyme or reason. We show, however, that the ‘problem’ of high-frequency words can be dealt with more elegantly, and in a way that to our knowledge has not been considered in LDA, through the use of appropriate weighting schemes comparable to those sometimes used in Latent Semantic Indexing (LSI). Our proposed weighting methods not only make theoretical sense, but can also be shown to improve precision significantly on a non-trivial cross-language retrieval task.

This talk is part of the Natural Language Processing Reading Group series.