Towards automated understanding of scientific papers

If you have a question about this talk, please contact Johanna Geiss.

The large number of scientific papers being produced, especially in the life sciences, makes it difficult for researchers and resource curators to extract and evaluate the knowledge they contain. Automated text mining methods currently operate mainly on abstracts, but scientists have highlighted the need for automatic processing of the full text. Researchers in information extraction and information retrieval need to be able to recognise areas of interest in papers, and scientists have expressed the need for machine-readable summaries. However, manually producing semantic markup for papers is very time-consuming and cannot scale to the millions of papers already published. We have produced a tool (SAPIENT) and an ontology-based annotation scheme (CISP) for annotating core scientific concepts in research papers: 'Goal', 'Motivation', 'Object', 'Hypothesis', 'Background', 'Model', 'Experiment', 'Method', 'Observation', 'Result' and 'Conclusion'. A corpus of 225 papers covering topics in physical chemistry and biochemistry was annotated at the sentence level by 16 experts using SAPIENT and the CISP-based annotation scheme. Within the SAPIENTA project we plan to use this corpus to enable the automatic recognition of scientific concepts in papers and to generate digital abstracts in both human- and machine-readable formats. We also aim to enable intelligent querying of the content of scientific papers by exploiting this additional semantic information and representing the relevant sections in a first-order logic form that reasoners can handle.

Bio: Dr Maria Liakata has an Oxford DPhil in Computational Linguistics, on the topic of using Inductive Logic Programming to learn pragmatic knowledge from a corpus (Inducing Domain Theories). Since June 2005 she has been a research associate with the Computational Biology group at Aberystwyth University and has worked on interdisciplinary projects, such as the Robot Scientist, involving the automation and formalisation of science.

This talk is part of the NLIP Seminar Series.

© 2006-2024, University of Cambridge.