The Case for Decentralized Scheduling in Modern Datacenters

  • Speaker: Smita Vijayakumar, Systems Research Group, Cambridge University Computer Laboratory
  • Date: Thursday 01 May 2025, 15:00-16:00
  • Venue: FW11

If you have a question about this talk, please contact Richard Mortier.

Modern data centres are the backbone for executing diverse user workloads. Growing demand for their resources has led to high volumes of traffic, requiring clusters to operate at high utilization. In this talk, I shall detail how data centre schedulers, which map workload tasks to resources, perform under such challenging conditions. I will show that centralized schedulers, while globally informed, do not scale well under high load, because continuously transferring updated node state generates substantial network traffic. Conversely, distributed schedulers scale well but lack a precise global view of cluster resources, which leads to suboptimal task allocations. Consequently, these existing schedulers impose up to three times longer wait times on tail tasks, leading to large variance in inter-task start times and hence longer task and job completion times.
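To make this trade-off concrete, here is a toy sketch in Python; it is not the speaker's system. It assumes unit-length tasks that all arrive in one batch, and it treats the centralized scheduler's global view as free, ignoring the node-update traffic identified above as its bottleneck. `centralized_place` always picks the globally least-loaded worker; `distributed_place` probes `d` random workers and picks the shorter queue, a power-of-two-choices policy used here only as a stand-in for distributed schedulers with partial views. All function names and parameters are hypothetical.

```python
import heapq
import random


def centralized_place(n_tasks, n_workers):
    """Global view (assumed free here): each task goes to the least-loaded worker."""
    heap = [(0, w) for w in range(n_workers)]  # (queued work, worker id)
    waits = []
    for _ in range(n_tasks):
        load, w = heapq.heappop(heap)
        waits.append(load)                    # task waits behind work already queued
        heapq.heappush(heap, (load + 1, w))   # unit-length tasks (assumption)
    return waits


def distributed_place(n_tasks, n_workers, d=2, seed=0):
    """Partial view: probe d random workers and pick the shorter queue."""
    rng = random.Random(seed)
    loads = [0] * n_workers
    waits = []
    for _ in range(n_tasks):
        probed = rng.sample(range(n_workers), d)   # d probe messages per task
        w = min(probed, key=lambda i: loads[i])
        waits.append(loads[w])
        loads[w] += 1
    return waits


def tail(waits, q=0.99):
    """Return the q-quantile wait (nearest-rank style, no interpolation)."""
    s = sorted(waits)
    return s[min(len(s) - 1, int(q * len(s)))]


if __name__ == "__main__":
    n_tasks, n_workers = 10_000, 1_000
    cen = centralized_place(n_tasks, n_workers)
    dist = distributed_place(n_tasks, n_workers)
    print("centralized: max wait", max(cen), "p99 wait", tail(cen))
    print("distributed: max wait", max(dist), "p99 wait", tail(dist))
```

With 10,000 tasks on 1,000 workers, the centralized policy balances perfectly, so its maximum wait is 9 units; by the pigeonhole principle the sampled policy can only match or exceed that. The gap is a simplified analogue of the tail-wait penalty described above. Real distributed schedulers also act concurrently on stale information, which this sequential sketch does not model.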

I will then describe recent advances in decentralized scheduling, focusing on performance, scalability, and load balancing. I will present our approach of job-aware decentralized scheduling, which effectively reduces task wait times even under high cluster load. I will also discuss how distributed optimization algorithms can be implemented within a decentralized scheduling framework to provide theoretical guarantees of convergence to an optimal schedule. By the end of this talk, I hope to convince you that decentralized schedulers strike a good balance between scalability and performance, and are indeed the most practical solution for data centres.
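As one illustration of a distributed algorithm with a provable convergence guarantee, the sketch below uses first-order diffusion load balancing on a ring of workers. This is a standard technique chosen for illustration, not the speaker's framework, and it assumes load is perfectly divisible. Each worker exchanges load only with its two neighbours, yet for 0 < `alpha` <= 0.25 (a sufficient condition on a ring) the loads converge to the global average while the total is conserved. The function `diffuse` and its parameters are hypothetical.

```python
def diffuse(loads, alpha=0.25, rounds=200):
    """First-order diffusion on a ring (hypothetical helper, divisible load assumed).

    Each round, worker i moves alpha * (neighbour load - own load) in from each
    of its two neighbours. Only neighbour-to-neighbour exchanges are needed; no
    worker ever sees the global state, and the total load is conserved.
    """
    x = [float(v) for v in loads]
    n = len(x)
    for _ in range(rounds):
        nxt = x[:]
        for i in range(n):
            left, right = x[(i - 1) % n], x[(i + 1) % n]
            nxt[i] = x[i] + alpha * (left - x[i]) + alpha * (right - x[i])
        x = nxt
    return x


if __name__ == "__main__":
    # All 80 units start on one worker; each entry approaches the average, 10.0.
    balanced = diffuse([80, 0, 0, 0, 0, 0, 0, 0])
    print([round(v, 3) for v in balanced])
```

The step size trades speed for stability: larger `alpha` mixes load faster, but beyond the stable range the iteration can oscillate instead of converging.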

Bio: Smita Vijayakumar recently completed her PhD in Computer Science at the University of Cambridge, supervised by Evangelia Kalyvianaki. For her thesis, she developed a decentralized scheduling framework to reduce tail task latencies in highly utilized datacenters. She has over twelve years of industry experience at companies such as Cisco and Juniper, working on cloud computing, networking, and distributed systems. She also holds an MS in Computer Science from The Ohio State University, where her work investigated allocating cloud resources to the bottleneck stages of streaming applications. Her research has been published in top-tier ACM and IEEE conferences. She has also been actively involved in mentoring, teaching, and community leadership, including founding Women Who Go in India. Smita’s expertise spans cloud scheduling, resource management, and scalable distributed systems.

This talk is part of the Computer Laboratory Systems Research Group Seminar series.

© 2006-2025 Talks.cam, University of Cambridge.