Cambridge Compiler Social Talks
If you have a question about this talk, please contact Luisa Cicolini.

At the next Compiler Social we will host two talks.

Quidditch: An end-to-end deep learning compiler for highly-concurrent accelerators with software-managed caches – by Markus Boeck (University of Cambridge)

The wide adoption of deep neural networks and the resulting demand for hardware resources have fueled the rapid development of innovative custom hardware accelerators that are increasingly difficult to program. Many proposed hardware designs are evaluated only with hand-written micro-kernels, and the few evaluated on entire neural networks typically require significant investment in building the necessary software stacks. Highly sophisticated neural network compilers have emerged that assemble DNNs out of expert-written microkernels, but they were traditionally hand-crafted for each platform, which prevented both scaling and synergy with industry-supported compilation flows.

We present Quidditch, a novel neural network compiler and runtime that provides an end-to-end workflow from a high-level network description to high-performance code running on ETH Occamy, one of the first chiplet-based AI research hardware accelerators. Quidditch builds on IREE, an industry-strength AI compiler; it imports NNs from PyTorch, JAX, and TensorFlow and offers optimisations such as fusion, scheduling, buffer allocation, memory- and multi-level-concurrency-guided tiling, and asynchronous memory transfers to scratchpads. We present a set of preliminary novel optimisations: SSA-based double-buffering and barrier management for scratchpads, and redundant transfer elimination tailored for explicitly managed memory. We pair this with a high-performance microkernel generator, which enables us to run full DNNs at full FPU occupancy and with a more than 20x speed-up over IREE's generic LLVM backend on our custom hardware accelerator. By providing key building blocks for scaling AI accelerator compilation to full neural networks, we aim to accelerate the evaluation of custom AI hardware and, as a result, AI hardware development overall.

Mojo's Wishlist for MLIR 2.0 – by Jeff Niu (Mojo)

Mojo is a systems programming language built natively on top of MLIR, leveraging it to build state-of-the-art compiler technology. Mojo is the foundation of Modular's heterogeneous compute platform, enabling performance portability across different hardware and application domains. After two years of building Mojo with MLIR, design misalignments between the compiler infrastructure and the desired language semantics have clearly emerged. This talk will delve into what an ideal MLIR 2.0 would look like purely in the context of Mojo's design: first-class dependent types, unified types and attributes, control flow, and more. We will also explore our challenges in scaling MLIR compilation to the massive amounts of code backing LLMs, and our experience building a multithreaded compiler.

More on: https://grosser.science/compiler-social-2024-09-03/

This talk is part of the lc985's list series.