Towards automated understanding of scientific papers

Maria Liakata
Thursday 21 May 2009, 12:00-13:00
SW01, Computer Laboratory.

If you have a question about this talk, please contact Johanna Geiss.

The large number of scientific papers generated, especially in the life sciences, makes it a challenge for researchers and resource curators to extract and evaluate the knowledge contained within them. Automated text mining methods currently operate mainly on abstracts, but scientists have highlighted the need for automatic processing of the full text. Researchers in information extraction and information retrieval have to be able to recognise areas of interest in papers, and scientists have expressed the need for machine-readable summaries. However, the manual production of semantic markup in papers is very time-consuming and cannot cater for the millions of papers already published. We have produced a tool (SAPIENT) and an ontology-based annotation scheme (CISP) for annotating core scientific concepts in research papers: Goal, Motivation, Object, Hypothesis, Background, Model, Experiment, Method, Observation, Result and Conclusion. A corpus of 225 papers covering topics in physical chemistry and biochemistry was annotated at the sentence level by 16 experts using SAPIENT and the CISP-based annotation scheme. Within the SAPIENTA project we plan to use this corpus to enable the automatic recognition of scientific concepts in papers and to generate digital abstracts in both human- and machine-readable formats. We also aim to enable intelligent querying of the content of scientific papers by exploiting the extra semantic information and representing the relevant sections in a first-order logic form that reasoners can handle.

Bio: Dr Maria Liakata has an Oxford DPhil in Computational Linguistics, on the topic of using Inductive Logic Programming to learn pragmatic knowledge from a corpus (Inducing Domain Theories). Since June 2005 she has been a research associate with the Computational Biology group at Aberystwyth University and has worked on interdisciplinary projects, such as the Robot Scientist, involving the automation and formalisation of science.

This talk is part of the NLIP Seminar Series.
