
Scaling AI Systems with Optical I/O


If you have a question about this talk, please contact Srinivasan Keshav.

The emergence of optical I/O chiplets enables compute/memory chips to communicate at several Tbps of bandwidth. Many technology trends point to the arrival of optical I/O chiplets as a key industry inflection point for realizing fully disaggregated systems. In this talk, I will focus on the potential of optical I/O-enabled accelerators for building high-bandwidth interconnects tailored for distributed machine learning training. Our goal is to scale state-of-the-art ML training platforms, such as NVIDIA's DGX, from a few tightly connected GPUs in one package to hundreds of GPUs while maintaining Tbps communication bandwidth across the chips. Our design reduces the training time of popular ML models using a device placement algorithm that partitions the training job with data, model, and pipeline parallelism across nodes, while ensuring a sparse and local communication pattern that the interconnect can support efficiently.

Bio: Manya Ghobadi is an assistant professor in the EECS department at MIT. Before MIT, she was a researcher at Microsoft Research and a software engineer at Google Platforms. Manya is a computer systems researcher with a networking focus and has worked on a broad set of topics, including data center networking, optical networks, transport protocols, and network measurement. Her work has won the best dataset award and best paper award at the ACM Internet Measurement Conference (IMC), as well as a Google research excellent paper award.

This talk is part of the Computer Laboratory Systems Research Group Seminar series.



© 2006-2024, University of Cambridge.