Making the World's Scientific Information (More) Organized, Accessible, and Usable
If you have a question about this talk, please contact Laura Rimell.

Web portals like Google Scholar and ScienceDirect have revolutionized access to scientific information by making it possible to identify relevant papers via keyword search and then browse them online. However, as scientific information continues to grow exponentially, and as (e-)science embraces automation, keeping abreast of the information in these papers and exploiting it effectively is becoming impossible. I'll describe a prototype scientific literature search and information extraction system, developed in collaboration with the FlyBase (Fruit Fly Genomics) curation team, that is designed to support very fine-grained but intuitive querying of, and access to, the information in a collection of papers. FlySearch indexes annotated papers and supports integrated search over individual sentences and images, aggregating information across the collection. For example, one can search for captions describing a specific gene regulating a biological process and restrict the associated images to a specific body part. The system rests on a processing pipeline in which a paper in Portable Document Format (PDF) is first converted to Scientific eXtensible Mark-up Language (SciXML). This conversion preserves the paper's logical structure while, for example, separating images, tables, and references from the running text. Specialized text and image processing tools are then applied to the different components of the paper. These tools can compute image similarity and recognize gene names, facts about genes, and the genes' relationships to other biological entities. They have been designed to be as generic as possible so that they can be applied to different areas of science. Where they require domain-specific tuning, they have been developed using semi-supervised machine learning methods to minimize the cost of that tuning.
Initial results suggest that many aspects of the user interface need refinement, but that the underlying search functionality can significantly improve both speed and precision over keyword-based, document-level search. Nevertheless, many challenges remain. Perhaps the most pressing is handling more of the context-dependent ways in which the same meaning can be expressed, but we would also like to go beyond finding and extracting relations between biological entities and, for example, support reasoning (such as temporal reasoning) about biological events.

This talk is part of the NLIP Seminar Series.