Multi-fidelity machine learning models for improved high-throughput screening predictions
If you have a question about this talk, please contact Mateja Jamnik.

High-throughput screening (HTS) is one of the leading techniques for hit identification in drug discovery and is widely adopted in academia and industry. However, HTS is still regarded as a brute-force approach, with substantial costs and complexity involved in running large-scale screening campaigns. Even as industry-leading laboratories produce millions of measurements per HTS project, the resulting data are not fully understood and are usually not used as part of modern computational pipelines. Addressing these challenges requires an interdisciplinary approach; in particular, it is desirable to leverage modern machine learning techniques to optimise current workflows.

In this talk, I will discuss how we studied real-world HTS data from the public domain as well as in-house AstraZeneca data, aiming to answer questions about the benefits of integrating HTS data exhibiting different levels of noise (an aspect called ‘multi-fidelity’), and to relate the computational insights to experimental details. As a first step, we assembled and curated a diverse collection of 60 public multi-fidelity datasets from PubChem, designed as a benchmark for machine learning applications in HTS. With the help of previously unexplored data and graph neural networks, we can now model a large and varied chemical space (up to three orders of magnitude larger than existing efforts) and integrate these signals into models of bioactivity prediction. I will present results showing that this integration leads to significant improvements on the majority of datasets, and discuss several effects and unique aspects of the proposed workflow. Finally, I will link conclusions drawn from modelling multi-million-scale datasets to our recent work in graph representation learning.

This talk is part of the Artificial Intelligence Research Group Talks (Computer Laboratory) series.
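As a rough illustration of the multi-fidelity idea, the sketch below feeds predictions from a model trained on large, noisy primary-screen labels into a second model of scarcer, cleaner confirmatory measurements. This is a minimal toy example, not the workflow from the talk: the random features, labels, and random-forest models are placeholders standing in for molecular fingerprints and the graph neural networks used in the actual work.

    # Toy two-stage multi-fidelity sketch (placeholder data, not the talk's method).
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

    rng = np.random.default_rng(0)

    # Placeholder molecular features; a real pipeline would use fingerprints
    # or learned graph-neural-network embeddings instead of random vectors.
    X_low = rng.random((10_000, 128))   # large, noisy primary screen
    y_low = (X_low[:, 0] + 0.3 * rng.standard_normal(10_000) > 0.5).astype(int)
    X_high = rng.random((2_000, 128))   # small, confirmatory assay
    y_high = rng.random(2_000)          # e.g. pIC50-like activity values

    # Stage 1: learn the low-fidelity (noisy, plentiful) signal.
    low_model = RandomForestClassifier(n_estimators=100, n_jobs=-1)
    low_model.fit(X_low, y_low)

    # Stage 2: augment high-fidelity inputs with the low-fidelity prediction,
    # so the second model can exploit the cheaper screen's signal.
    aug = np.hstack([X_high, low_model.predict_proba(X_high)[:, [1]]])
    high_model = RandomForestRegressor(n_estimators=100, n_jobs=-1)
    high_model.fit(aug, y_high)

The design choice illustrated here is simply that low-fidelity measurements are treated as an auxiliary input rather than as ground truth, which is one common way of combining assay readouts of different quality.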