Multi-fidelity machine learning models for improved high-throughput screening predictions

If you have a question about this talk, please contact Mateja Jamnik.

Join us on Zoom

High-throughput screening (HTS) is one of the leading techniques for hit identification in drug discovery and is widely adopted in academia and industry. However, HTS is still regarded as a brute-force approach, and running large-scale screening campaigns involves substantial cost and complexity. Even as industry-leading laboratories produce millions of measurements per HTS project, the resulting data are not fully understood and are usually not used as part of modern computational pipelines. These challenges call for an interdisciplinary approach; in particular, it is desirable to leverage modern machine learning techniques to optimise current workflows.

In this talk, I will discuss how we studied real-world HTS data from the public domain as well as in-house AstraZeneca data. We aimed to answer questions about the benefits of integrating HTS data that exhibit different levels of noise (an aspect called ‘multi-fidelity’) and to relate the computational insights to experimental details. As a first step, we assembled and curated a diverse collection of 60 public multi-fidelity datasets from PubChem, designed as a benchmark for machine learning applications in HTS. With the help of previously unexplored data and graph neural networks, we can now model a large and varied chemical space (up to 3 orders of magnitude larger than in existing efforts) and integrate these signals into models of bioactivity prediction. I will present results showing that this integration leads to significant improvements in the majority of datasets, and I will discuss several effects and unique aspects of the proposed workflow. Finally, I will link conclusions drawn from modelling multi-million scale datasets with our recent work in graph representation learning.

This talk is part of the Artificial Intelligence Research Group Talks (Computer Laboratory) series.
