Can machines understand the scientific literature?


Dr. Peter Murray-Rust (Department of Chemistry, University of Cambridge)
Wednesday 11 November 2020, 14:00-15:00
Please get in touch at compbiomphil@maths.cam.ac.uk for joining information.

If you have a question about this talk, please contact Samantha Noel.

Perhaps half of the 3 million annual articles, preprints, theses, and gray literature (up to 5000/day) are directly relevant to biomedicine (including chemistry, materials, IT, engineering, etc.), and many of the rest (psychology, politics, law, philosophy) are needed to tackle global challenges. We need to index this literature for scientific computation (indexing, searching, data abstraction, and ultimately “Artificial Intelligence”). But the raw material (usually PDF) is very poorly suited for automatic ingestion, and the major search engines are not well suited for science. We will present prototypes of Open tools (software, dictionaries) to extract science in computable (semantic) form. Since science is a global endeavor, the tools must be equitable and inclusive, and we have included collaborators using several languages (EN, HI, TA, UR, ES, IND).

The central ontology is based on multilingual Wikidata (ca. 100 million Items), which is increasingly subsuming the major biomedical and chemical ontologies and some reference data. The scholarly literature is also formally indexed there (Scholia). Where possible, all our entities and many of their relationships are based on Wikidata Items (Q) and Properties (P). Our primary approach is supervised text-mining through faceted dictionaries created from Wikidata SPARQL queries. Current dictionaries include countries, diseases, drugs, chemicals, species, and organizations, and can be extended to many other areas (e.g. through Wikipedia categories). Besides text, many documents contain tables and diagrams, and data such as phylogenetic trees, forest plots, and graphs can also be extracted from these.
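As a rough illustration of the dictionary approach described above, the following Python sketch builds a small dictionary from Wikidata SPARQL results and uses it to tag terms in text. It is not the speaker's software: the function names (`build_query`, `fetch_bindings`, `bindings_to_dictionary`, `annotate`), the choice of `wdt:P31` (instance of) as the selection criterion, and the simple whole-word matching are all assumptions made for this example. The endpoint is the public Wikidata Query Service, and the result layout follows the standard SPARQL JSON results format.

```python
import json
import re
import urllib.parse
import urllib.request

# Public Wikidata Query Service endpoint.
WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"


def build_query(class_qid: str, lang: str = "en", limit: int = 100) -> str:
    """Build a SPARQL query for items that are instances (P31) of class_qid.

    Hypothetical helper: a real dictionary would probably use richer
    criteria (subclasses, aliases, several languages).
    """
    return f"""
SELECT ?item ?label WHERE {{
  ?item wdt:P31 wd:{class_qid} ;
        rdfs:label ?label .
  FILTER(LANG(?label) = "{lang}")
}}
LIMIT {limit}
"""


def fetch_bindings(query: str) -> list:
    """Run the query against Wikidata (needs network access)."""
    url = WIKIDATA_SPARQL + "?" + urllib.parse.urlencode(
        {"query": query, "format": "json"}
    )
    # A descriptive User-Agent is requested by the Wikidata service.
    req = urllib.request.Request(
        url, headers={"User-Agent": "dictionary-sketch/0.1 (example)"}
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)["results"]["bindings"]


def bindings_to_dictionary(bindings: list) -> dict:
    """Map lower-cased labels to Wikidata Q identifiers."""
    dictionary = {}
    for row in bindings:
        qid = row["item"]["value"].rsplit("/", 1)[-1]
        dictionary[row["label"]["value"].lower()] = qid
    return dictionary


def annotate(text: str, dictionary: dict) -> list:
    """Return sorted (offset, matched text, QID) tuples for whole-word hits."""
    hits = []
    for term, qid in dictionary.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((match.start(), match.group(0), qid))
    return sorted(hits)


if __name__ == "__main__":
    # Q12136 is the Wikidata item for "disease".
    diseases = bindings_to_dictionary(fetch_bindings(build_query("Q12136")))
    print(annotate("Malaria and tuberculosis remain common.", diseases))
```

Each facet (countries, drugs, species, and so on) would correspond to a different query, so one document can be annotated against several dictionaries in turn. Because every hit carries a Q identifier rather than a bare string, the same annotation can be resolved to labels in other languages.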

We shall give examples of several tools that can be run from Jupyter Notebooks and are designed to be generic and extensible.

This talk is part of the Computational and Systems Biology series.
