A semantics knowledge commons for climate change

  • Peter Murray-Rust, Reader Emeritus in Molecular Informatics, Yusuf Hamied Department of Chemistry
  • Wednesday 19 October 2022, 14:00-15:00
  • CMS, Meeting Room 15.

If you have a question about this talk, please contact Samantha Noel.

Our intention is to deliver all seminars in person. Seminars are aimed mainly at MPhil CompBio students but are open to anyone who pre-books with the Administrator.

Bioscience is fortunate in that the community has created a very large frictionless semantic knowledge commons for the data it creates and uses.
  • knowledge: the information is organized systematically
  • semantic: machines can “understand” the knowledge, because it contains instructions, because the toolchain is universal, or both.
  • frictionless data: the data can be immediately unpacked without logins or explicit permissions
  • commons: everyone can take part in the knowledge regardless of country, experience, age.

Most other subjects have highly heterogeneous data without semantics, and this holds back the creation of knowledge. There is a pressing need to make knowledge about climate available to mitigate the effects of gaseous emissions. The most important resource is the set of reports from the UN’s IPCC, published about every five years. AR6, with 10,000 pages, was released in 2021-2022. #semanticClimate is a group of young Indian science students who are developing tools and community protocols to make IPCC AR6 semantic.

Our first step is to convert PDF to structured HTML (a messy business) and then to use a variety of text-mining tools to create vocabularies. These are turned into a distributed ontology based on equivalences with Wikidata items. Wikidata has 100 million items and maps onto most important metadata bases (e.g. genes, species, chemicals) and onto other infrastructure such as countries, states, protocols, organizations and research establishments. This effectively creates a knowledge graph for the reports, mapped onto the public Linked Open Data cloud.
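The vocabulary and linking steps can be illustrated with a minimal Python sketch. It is not the #semanticClimate toolchain: the function names, the stopword list, the sample HTML and the lookup table are all assumptions made for illustration, and the identifiers are placeholders rather than real Wikidata items. It assumes the PDF has already been converted to HTML, counts candidate terms in the text, and maps each term to an identifier where the lookup table has one.

```python
from collections import Counter
from html.parser import HTMLParser
import re


class TextExtractor(HTMLParser):
    """Collect every non-empty text node of an HTML document."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())


# Illustrative stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "of", "and", "to", "in", "is", "are", "by", "for", "from"}


def build_vocabulary(html, min_count=2):
    """Return {term: count} for words seen at least min_count times."""
    parser = TextExtractor()
    parser.feed(html)
    words = re.findall(r"[a-z]+", " ".join(parser.chunks).lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return {term: n for term, n in counts.items() if n >= min_count}


def link_terms(vocabulary, lookup):
    """Map each term to an identifier in lookup, or None when unmatched."""
    return {term: lookup.get(term) for term in vocabulary}


# Hypothetical hand-made lookup; a real pipeline would query Wikidata.
# The identifiers below are placeholders, not real Wikidata Q-numbers.
EXAMPLE_LOOKUP = {"methane": "Q-PLACEHOLDER-1", "emissions": "Q-PLACEHOLDER-2"}

# Invented sample text standing in for HTML converted from a report PDF.
EXAMPLE_HTML = """
<html><body>
<h1>Chapter summary</h1>
<p>Methane emissions are rising. Reducing methane emissions limits warming.</p>
<p>Warming is driven by emissions from energy and land use.</p>
</body></html>
"""

vocab = build_vocabulary(EXAMPLE_HTML)
links = link_terms(vocab, EXAMPLE_LOOKUP)

for term, identifier in sorted(links.items()):
    print(term, vocab[term], identifier)
```

Terms left without an identifier (here "warming") are the ones that would need manual curation or a wider search before they can join the knowledge graph.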

The system can be used for any set of documents, such as a corpus for a literature report. All tools and data are open and participants can use the systems locally or in Google Colab.

Ref: (run on 2022-09-24) Dr Gitanjali was a Cambridge-India Lecturer for 5 years

This talk is part of the Computational and Systems Biology Seminar Series 2023 - 24 series.
