
Proteomizer & GhostBuster: AI Tools for Proteomic Inference and Ghost Gene Annotation


If you have a question about this talk, please contact Pietro Lio.

Hybrid (in person and online)

Despite the rapid growth of multiomic datasets, our ability to interpret and integrate transcriptomic, proteomic, and regulatory information remains limited by both biological complexity and systemic biases in data and literature. In this talk, I will present two complementary machine learning frameworks—Proteomizer and GhostBuster—that address these challenges from distinct but synergistic angles.

Proteomizer is a deep learning platform that predicts proteomic landscapes from transcriptomic and miRNomic profiles. On over 8,600 matched samples it achieves state-of-the-art performance, with a correlation of r = 0.68 between predicted and measured protein levels. Beyond prediction, Proteomizer improves differential expression analysis and uses explainable AI to reveal the regulatory interactions that underlie discrepancies between transcript and protein levels.
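The r = 0.68 figure is a correlation coefficient, not a classification accuracy. As a minimal sketch, assuming r is the Pearson correlation between predicted and measured abundances, it could be computed as below. The abstract does not say whether r is computed per protein, per sample, or pooled over all values, so `mean_per_protein_r` (a hypothetical helper) shows just one possible aggregation.

```python
import math
from typing import Sequence


def pearson_r(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Pearson correlation between predicted and measured abundances."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    # Covariance and variances (the 1/n factors cancel in the ratio).
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    if var_p == 0 or var_m == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_m)


def mean_per_protein_r(
    pred: dict[str, list[float]], meas: dict[str, list[float]]
) -> float:
    """Hypothetical aggregation: correlate each protein across samples, then average."""
    scores = [pearson_r(pred[protein], meas[protein]) for protein in pred]
    return sum(scores) / len(scores)
```

For example, `pearson_r([1, 2, 3, 4], [2, 4, 6, 8])` returns 1.0, because the prediction tracks the measurement perfectly up to scale. A per-protein average and a pooled correlation over all values can differ substantially on the same data, so the aggregation matters when comparing reported r values.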

GhostBuster tackles a different but equally critical issue: literature bias in gene annotation. Many human genes remain understudied because sociological dynamics skew research focus. GhostBuster is the first ML framework explicitly designed to mitigate this bias. It uses unbiased datasets (e.g., TCGA, LINCS) to uncover novel gene functions, disease associations, and pathway memberships. The work shows that models trained on less-biased data are significantly more effective at identifying emerging biological knowledge, particularly for understudied “ghost genes”.
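The abstract does not describe how "identifying emerging biological knowledge" is measured. One common approach, offered here only as an assumption and not as GhostBuster's protocol, is a temporal hold-out. You rank candidate genes for a function or pathway using annotations known before a cutoff date, then check how many genes annotated after the cutoff land in the top k. The function `recall_at_k`, its arguments, and the gene IDs below are hypothetical.

```python
from typing import Mapping


def recall_at_k(
    scores: Mapping[str, float],
    new_annotations: set[str],
    known_annotations: set[str],
    k: int,
) -> float:
    """Fraction of newly annotated genes recovered among the top-k candidates.

    scores: model score per gene (higher = more likely to belong to the term)
    new_annotations: genes annotated to the term only after the cutoff date
    known_annotations: genes annotated before the cutoff (excluded from ranking)
    """
    if not new_annotations:
        raise ValueError("no new annotations to recover")
    # Only genes not yet annotated at the cutoff are candidates for discovery.
    candidates = [g for g in scores if g not in known_annotations]
    ranked = sorted(candidates, key=lambda g: scores[g], reverse=True)
    top_k = set(ranked[:k])
    return len(top_k & new_annotations) / len(new_annotations)


# Toy example with hypothetical gene IDs and scores.
scores = {"G1": 0.9, "G2": 0.8, "G3": 0.1, "G4": 0.7}
print(recall_at_k(scores, new_annotations={"G2", "G3"}, known_annotations={"G1"}, k=2))
```

Here G1 is excluded because it was already annotated, G2 and G4 fill the top two, and only G2 is a new annotation, so the printed recall is 0.5. Under this framing, the claim in the abstract corresponds to a model trained on less-biased data scoring higher on such a metric than one trained on literature-derived data.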

Together, these tools exemplify a new generation of interpretable, bias-aware machine learning approaches that not only improve predictive performance but also expand our capacity to generate biologically meaningful hypotheses—especially for the vast uncharted regions of the human genome.

This talk is part of the Data Science and AI in Medicine series.

© 2006-2025 Talks.cam, University of Cambridge.