Scaling AI Systems with Optical I/O

Manya Ghobadi, MIT
Thursday 10 September 2020, 15:00-16:00
https://meet.google.com/ehj-dwaz-rea

If you have a question about this talk, please contact Srinivasan Keshav.

The emergence of optical I/O chiplets enables compute and memory chips to communicate at several Tbps of bandwidth. Many technology trends point to the arrival of optical I/O chiplets as a key industry inflection point for realizing fully disaggregated systems. In this talk, I will focus on the potential of optical I/O-enabled accelerators for building high-bandwidth interconnects tailored to distributed machine learning training. Our goal is to scale state-of-the-art ML training platforms, such as NVIDIA’s DGX, from a few tightly connected GPUs in one package to hundreds of GPUs while maintaining Tbps communication bandwidth across the chips. Our design enables faster training of popular ML models using a device placement algorithm that partitions the training job across nodes with data, model, and pipeline parallelism, while ensuring a sparse and local communication pattern that the interconnect can support efficiently.

Bio: Manya Ghobadi is an assistant professor in the EECS department at MIT. Before MIT, she was a researcher at Microsoft Research and a software engineer at Google Platforms. Manya is a computer systems researcher with a networking focus and has worked on a broad set of topics, including data center networking, optical networks, transport protocols, and network measurement. Her work has won the best dataset award and the best paper award at the ACM Internet Measurement Conference (IMC), as well as a Google research excellent paper award.

This talk is part of the Computer Laboratory Systems Research Group Seminar series.