Cambridge Compiler Social Talks
If you have a question about this talk, please contact Luisa Cicolini.

At the next Compiler Social we will host two talks.

Quidditch: An end-to-end deep learning compiler for highly-concurrent accelerators with software-managed caches – by Markus Boeck (University of Cambridge)

The wide adoption of deep neural networks and the resulting demand for hardware resources have fueled the rapid development of innovative custom hardware accelerators that are increasingly difficult to program. Many proposed hardware designs are evaluated only with hand-written micro-kernels, and the few evaluated on entire neural networks typically require significant investment in building the necessary software stacks. Highly sophisticated neural network compilers have emerged that assemble DNNs out of expert-written microkernels, but they were traditionally hand-crafted for each platform, which prevented both scaling and synergy with industry-supported compilation flows.

We present Quidditch, a novel neural network compiler and runtime that provides an end-to-end workflow from a high-level network description to high-performance code running on ETH Occamy, one of the first chiplet-based AI research hardware accelerators. Quidditch builds on IREE, an industry-strength AI compiler; it imports NNs from PyTorch, JAX, and TensorFlow and offers optimisations such as fusion, scheduling, buffer allocation, memory- and multi-level-concurrency-guided tiling, and asynchronous memory transfers to scratchpads. We present a set of preliminary novel optimisations: SSA-based double-buffering and barrier management for scratchpads, and redundant transfer elimination tailored for explicitly managed memory. We pair this with a high-performance microkernel generator, which enables us to run full DNNs at full FPU occupancy and with a more than 20x speed-up over IREE's generic LLVM backend on our custom hardware accelerator. By providing key building blocks for scaling AI accelerator compilation to full neural networks, we aim to accelerate the evaluation of custom AI hardware and, as a result, AI hardware development overall.

Mojo's Wishlist for MLIR 2.0 – by Jeff Niu (Mojo)

Mojo is a systems programming language built natively on top of MLIR, leveraging it to build state-of-the-art compiler technology. Mojo is the foundation of Modular's heterogeneous compute platform, enabling performance portability across different hardware and application domains. After two years of building Mojo with MLIR, design misalignments between the compiler infrastructure and the desired language semantics have clearly emerged. This talk will delve into what an ideal MLIR 2.0 would look like purely in the context of Mojo's design: first-class dependent types, unified types and attributes, control flow, and more. We will also explore our challenges in scaling MLIR compilation to the massive amounts of code backing LLMs, and our experience building a multithreaded compiler.

More on: https://grosser.science/compiler-social-2024-09-03/

This talk is part of the lc985's list series.