Statistical anaphora resolution in biomedical texts

  • Speaker: Caroline Gasperin, Computer Laboratory, University of Cambridge
  • Date: Friday 24 October 2008, 12:00-13:00
  • Venue: SW01, Computer Laboratory

If you have a question about this talk, please contact Johanna Geiss.

“I will present my PhD work on anaphora resolution in biomedical texts. Biomedical literature has been the focus of relevant information extraction projects, and resolving anaphora is an important step in the identification of mentions of biomedical entities about which information could be extracted.

I propose a probabilistic model for the resolution of anaphora in biomedical texts. The model results from a simple decomposition process applied to a conditional probability equation that involves several parameters (features). The decomposition makes use of Bayes’ rule and independence assumptions, and aims to reduce the impact of data sparseness on the model. The model seeks to find the antecedents of anaphoric expressions, both coreferent and associative ones, and also to identify discourse-new expressions. It reaches state-of-the-art performance despite being trained on a small corpus: it achieves 55-69% precision and 57-71% recall on coreferent cases, and reasonable performance on the different classes of associative cases.

I have created a corpus of 5 biomedical articles to train and evaluate the model. The corpus is annotated with anaphoric links between noun phrases referring to the biomedical entities of interest. Such noun phrases are typed according to a scheme that is based on the Sequence Ontology; it distinguishes 7 types of entities: gene, part of gene, product of gene, part of product, subtype of gene, supertype of gene and gene variant. This corpus is publicly available.”
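
The abstract describes the model only at a high level. It is a conditional probability over several features, decomposed with Bayes’ rule and independence assumptions. The Python sketch that follows shows the simplest form such a decomposition can take. It assumes full conditional independence of the features given the class, so that P(C | f1, ..., fn) is proportional to P(C) × P(f1 | C) × ... × P(fn | C), with add-one smoothing for sparse counts. The feature names (head_match, same_biotype), the class labels (including "none" for a candidate that is not an antecedent) and the toy training data are all hypothetical. The actual model may keep some dependencies between features and use a different feature set.

```python
import math
from collections import Counter, defaultdict


def train(examples, alpha=1.0):
    """Count classes and (class, feature, value) occurrences.

    examples: list of (features, label) pairs. Each features dict maps a
    hypothetical feature name to a discrete value, and every example is
    assumed to use the same feature names.
    """
    class_counts = Counter(label for _, label in examples)
    feat_counts = defaultdict(Counter)  # (label, feature) -> value counts
    seen_values = defaultdict(set)      # feature -> values seen in training
    for feats, label in examples:
        for name, value in feats.items():
            feat_counts[(label, name)][value] += 1
            seen_values[name].add(value)
    return class_counts, feat_counts, seen_values, alpha


def posterior(model, feats):
    """Return P(C | f_1..f_n), assuming features are independent given C."""
    class_counts, feat_counts, seen_values, alpha = model
    total = sum(class_counts.values())
    log_scores = {}
    for label, n in class_counts.items():
        score = math.log(n / total)  # log prior P(C)
        for name, value in feats.items():
            k = len(seen_values[name]) + 1  # +1 reserves mass for unseen values
            count = feat_counts[(label, name)][value]
            # add-alpha smoothed log P(f_i = value | C)
            score += math.log((count + alpha) / (n + alpha * k))
        log_scores[label] = score
    # Bayes' rule denominator: normalise over all classes (in log space for stability).
    m = max(log_scores.values())
    unnorm = {c: math.exp(s - m) for c, s in log_scores.items()}
    z = sum(unnorm.values())
    return {c: u / z for c, u in unnorm.items()}


# Toy (anaphor, candidate antecedent) pairs; features and labels are hypothetical.
examples = [
    ({"head_match": "yes", "same_biotype": "yes"}, "coreferent"),
    ({"head_match": "yes", "same_biotype": "yes"}, "coreferent"),
    ({"head_match": "no", "same_biotype": "no"}, "associative"),
    ({"head_match": "no", "same_biotype": "yes"}, "none"),
]
model = train(examples)
probs = posterior(model, {"head_match": "yes", "same_biotype": "yes"})
print(max(probs, key=probs.get))  # coreferent
```

Factoring the likelihood into per-feature terms means each probability is estimated from the counts of a single feature rather than from full feature combinations. This is why a decomposition of this kind helps when the training corpus is small.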

This talk is part of the NLIP Seminar Series.

© 2006-2023, University of Cambridge.