Can machines understand the scientific literature?
If you have a question about this talk, please contact Samantha Noel.

Perhaps half of the 3 million annual articles, preprints, theses, and gray literature (up 5000/day) are directly relevant to biomedicine (including chemistry, materials, IT, engineering, etc.), and many of the rest (psychology, politics, law, philosophy) are needed to tackle global challenges. We need to process this literature for scientific computation (indexing, searching, data abstraction, and ultimately “Artificial Intelligence”). But the raw material (usually PDF) is very poorly suited to automatic ingestion, and the major search engines are not well suited to science.

We will present prototypes of Open tools (software, dictionaries) to extract science in computable (semantic) form. Since science is a global endeavor, the tools must be equitable and inclusive, and we have included collaborators using several languages (EN, HI, TA, UR, ES, IND). The central ontology is based on multilingual Wikidata (ca 100 million Items), which is increasingly subsuming the major biomedical and chemical ontologies and some reference data. The scholarly literature is also formally indexed there (Scholia). Where possible, all our entities and many of their relationships are based on Wikidata Items (Q) and Properties (P).

Our primary approach is supervised text-mining through faceted dictionaries created from Wikidata SPARQL queries. Current dictionaries include countries, diseases, drugs, chemicals, species, and organizations, and can be extended to many other areas (e.g. through Wikipedia categories). Besides text, many documents contain tables and diagrams, and it is also possible to extract data from these, such as phylogenetic trees, forest plots, and graphs. We shall give examples of a variety of tools that can be run from Jupyter Notebooks and are designed to be generic and extensible.

This talk is part of the Computational and Systems Biology series.
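As a rough illustration of the dictionary approach described in the abstract, the following Python sketch builds a SPARQL query for one facet, turns a query response into a term dictionary, and uses that dictionary to annotate text. It is not the speakers' software: the function names (`build_facet_query`, `dictionary_from_bindings`, `annotate`) are hypothetical, the sample response uses placeholder identifiers rather than real Wikidata Items, and the query assumes a facet can be selected by "instance of" (P31) a class such as disease (Q12136). Sending the query to a SPARQL endpoint is left out so that the sketch runs offline.

```python
import re


def build_facet_query(class_qid: str, lang: str = "en", limit: int = 1000) -> str:
    """Return a SPARQL query listing items that are instances (P31) of class_qid.

    Assumption: "instance of" alone defines the facet; a real dictionary may
    also need subclasses, aliases, or other properties.
    """
    return (
        "SELECT ?item ?itemLabel WHERE {\n"
        f"  ?item wdt:P31 wd:{class_qid} .\n"
        f'  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{lang}". }}\n'
        "}\n"
        f"LIMIT {limit}"
    )


def dictionary_from_bindings(response: dict) -> dict:
    """Map lower-cased labels to item identifiers.

    Expects the standard SPARQL JSON results layout:
    response["results"]["bindings"] is a list of rows, each with
    "item" and "itemLabel" entries holding a "value".
    """
    terms = {}
    for row in response["results"]["bindings"]:
        qid = row["item"]["value"].rsplit("/", 1)[-1]
        label = row["itemLabel"]["value"].lower()
        terms[label] = qid
    return terms


def annotate(text: str, terms: dict) -> list:
    """Return (offset, matched text, identifier) for whole-word dictionary hits."""
    hits = []
    for label, qid in terms.items():
        pattern = r"\b" + re.escape(label) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((match.start(), match.group(0), qid))
    return sorted(hits)


# Hand-written stand-in for an endpoint response.
# The identifiers are placeholders, NOT real Wikidata Items.
SAMPLE_RESPONSE = {
    "results": {
        "bindings": [
            {
                "item": {"value": "http://www.wikidata.org/entity/Q_DEMO_1"},
                "itemLabel": {"value": "malaria"},
            },
            {
                "item": {"value": "http://www.wikidata.org/entity/Q_DEMO_2"},
                "itemLabel": {"value": "dengue fever"},
            },
        ]
    }
}

if __name__ == "__main__":
    print(build_facet_query("Q12136"))
    disease_terms = dictionary_from_bindings(SAMPLE_RESPONSE)
    for hit in annotate("Malaria and dengue fever are mosquito-borne.", disease_terms):
        print(hit)
```

Keeping each facet (diseases, countries, species, and so on) as a separate dictionary means the same `annotate` step can be reused unchanged; only the class passed to the query builder differs.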