The Case for Decentralized Scheduling in Modern Datacenters

If you have a question about this talk, please contact Richard Mortier.

The growing demand for data centre resources and the slower evolution of their hardware have led to clusters operating at high utilisation. In this talk, I will examine how current schedulers perform under such conditions. I will discuss how centralised schedulers struggle to scale under high load due to the significant network traffic caused by continuously transferring up-to-date node data. Conversely, distributed schedulers scale well but lack a global cluster view, leading to suboptimal task allocations. As a result, existing schedulers impose up to three times longer wait times on tail tasks, which increases job completion times.

I will then introduce our work on decentralised scheduling, focusing on performance, scalability, and load balancing. These schedulers have been largely under-explored due to their design complexity. However, we demonstrate that Murmuration, our job-aware decentralised scheduler, achieves high performance under both normal and high load despite its simple approach using approximate load information. It reduces communication overhead between nodes and schedulers while still achieving balanced cluster load distribution. By the end of this talk, I hope to convince you that decentralised schedulers with approximate knowledge strike the right balance between performance and scalability, making them a practical solution for today’s highly utilised data centres.
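As a rough illustration of scheduling with approximate load information, the toy simulation below has several independent schedulers place tasks using a cached, possibly stale copy of per-node load, which they refresh only occasionally. This is not Murmuration's algorithm, which the abstract does not describe. The class name, the `refresh_every` parameter, and the least-loaded placement rule are all assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class ApproxScheduler:
    """Hypothetical scheduler holding a possibly stale copy of node loads."""
    cached_load: list[int]
    refresh_every: int  # assumed knob: placements allowed between refreshes
    placed_since_refresh: int = 0

    def place(self, true_load: list[int]) -> int:
        # Refresh the cached view only occasionally, to limit the traffic
        # between nodes and schedulers.
        if self.placed_since_refresh >= self.refresh_every:
            self.cached_load = list(true_load)
            self.placed_since_refresh = 0
        # Pick the node that looks least loaded in the cached view.
        node = min(range(len(self.cached_load)),
                   key=self.cached_load.__getitem__)
        self.cached_load[node] += 1  # update the local estimate
        true_load[node] += 1         # the task actually lands on the node
        self.placed_since_refresh += 1
        return node


def simulate(num_nodes: int, num_schedulers: int,
             num_tasks: int, refresh_every: int) -> list[int]:
    """Place unit-sized tasks round-robin across schedulers; return node loads."""
    true_load = [0] * num_nodes
    schedulers = [ApproxScheduler([0] * num_nodes, refresh_every)
                  for _ in range(num_schedulers)]
    for task in range(num_tasks):
        schedulers[task % num_schedulers].place(true_load)
    return true_load


if __name__ == "__main__":
    for interval in (1, 5, 50):
        loads = simulate(num_nodes=8, num_schedulers=4,
                         num_tasks=400, refresh_every=interval)
        print(f"refresh_every={interval}: max-min load gap = "
              f"{max(loads) - min(loads)}")
```

Varying `refresh_every` shows the trade-off the abstract points to: fewer refreshes mean less communication, at the cost of each scheduler deciding on older information. The sketch assumes unit-sized tasks that never finish, so it says nothing about tail latency or job awareness.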

Bio: Smita Vijayakumar recently completed her PhD in the Department of Computer Science and Technology at the University of Cambridge, under the supervision of Evangelia Kalyvianaki. As part of her thesis, she developed a novel decentralised scheduling framework to reduce tail task latencies in highly utilised data centres. She has over twelve years of industry experience in networking, cloud computing, and distributed systems. She also holds an MS from The Ohio State University, where she investigated allocating cloud resources to the bottleneck stages of stream-processing applications. Her research has been published at top-tier conferences and as a book. She has also been actively involved in mentoring, teaching, and community leadership, including founding Women Who Go, India.

This talk is part of the Computer Laboratory Systems Research Group Seminar series.
