Sheaf-Based Diffusion for Multimodal Graph Learning
If you have a question about this talk, please contact Pietro Lio.

Multimodal Graph Learning (MGL) is an emerging area of machine learning that focuses on graphs whose nodes carry information from different modalities, such as text and images. A central challenge in MGL is integrating these heterogeneous data types, which are not directly comparable. Standard Graph Neural Networks (GNNs) struggle in multimodal contexts because they assume homogeneous node features and tend to merge modalities too early, losing valuable modality-specific information. Existing solutions address this by processing each modality independently and fusing their predictions at the output level. However, recent studies show that these late-fusion strategies underperform compared to general-purpose GNNs.

To address this limitation, we introduce MMSheaf, a family of sheaf-based neural network architectures that preserve modality separation before diffusion and introduce structured, learnable mechanisms for cross-modal interaction during message passing. As a first contribution, we show that Sheaf Neural Networks (SNNs) outperform standard GNNs such as GCN and GAT on multimodal graphs, proving to be an appropriate tool for this setting. Building on this insight, our MMSheaf architecture further improves performance by explicitly modelling cross-modal interactions.

We evaluate MMSheaf on synthetic multimodal datasets where successful classification requires integrating the modalities in a non-trivial way. Additional experiments on the real-world Ele-Fashion dataset showcase the model's effectiveness in practical multimodal settings. Overall, our findings establish sheaf-based diffusion as a powerful and expressive framework for Multimodal Graph Learning. Future work will apply this approach to diverse domains such as biomedicine and recommender systems.

Meet link: meet.google.com/wtt-wydt-hfk

This talk is part of the Foundation AI series.
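For readers unfamiliar with sheaf-based diffusion, the sketch below illustrates the general idea the abstract refers to: each edge carries learnable restriction maps, and node features are updated by a diffusion step with the resulting sheaf Laplacian. This is only a minimal illustration of a generic sheaf-diffusion layer, not the MMSheaf architecture presented in the talk; the class name, the MLP parameterisation of the restriction maps, and all dimensions are assumptions made for the example.

```python
# Minimal sketch of one sheaf-diffusion step: x <- x - alpha * L_F x,
# where L_F is a sheaf Laplacian built from learnable, edge-dependent
# restriction maps. Illustrative only; NOT the talk's MMSheaf model.
import torch
import torch.nn as nn


class SheafDiffusionLayer(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.d = dim
        # Predicts a d x d restriction map for a (node, edge) incidence
        # from the features of the edge's two endpoints (an assumption
        # of this sketch; other parameterisations are possible).
        self.restriction = nn.Sequential(
            nn.Linear(2 * dim, 64),
            nn.ReLU(),
            nn.Linear(64, dim * dim),
        )
        self.alpha = nn.Parameter(torch.tensor(0.5))

    def maps(self, x_a, x_b):
        # Restriction map F_{a <= e} for the edge e = (a, b).
        m = self.restriction(torch.cat([x_a, x_b], dim=-1))
        return m.view(-1, self.d, self.d)

    def forward(self, x, edge_index):
        # x: (num_nodes, d) node features living in the node stalks
        # edge_index: (2, num_edges); for an undirected graph each edge
        # should appear in both directions so both endpoints are updated.
        src, dst = edge_index
        F_src = self.maps(x[src], x[dst])  # F_{src <= e}
        F_dst = self.maps(x[dst], x[src])  # F_{dst <= e}

        # Disagreement of the two endpoints measured in the edge stalk.
        disc = torch.bmm(F_dst, x[dst].unsqueeze(-1)) \
             - torch.bmm(F_src, x[src].unsqueeze(-1))

        # (L_F x)_v accumulates F_{v <= e}^T * disagreement over edges into v.
        contrib = torch.bmm(F_dst.transpose(1, 2), disc).squeeze(-1)
        lap = torch.zeros_like(x)
        lap.index_add_(0, dst, contrib)

        return x - self.alpha * lap


# Toy usage on a 3-node path graph with edges listed in both directions.
x = torch.randn(3, 8)
edge_index = torch.tensor([[0, 1, 1, 2], [1, 0, 2, 1]])
out = SheafDiffusionLayer(dim=8)(x, edge_index)  # shape (3, 8)
```

In this sketch the restriction maps depend on both endpoints of an edge, so the diffusion can mix information asymmetrically across an edge; a multimodal variant in the spirit of the talk would presumably keep separate stalk blocks per modality and let the restriction maps mediate cross-modal interaction, but that design is not shown here.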