Firmament: fast, centralized cluster scheduling at scale

If you have a question about this talk, please contact Liang Wang.

Scheduling tasks on “warehouse-scale” clusters is challenging: thousands of tasks must be placed both quickly and well in order to achieve high utilization and good application-level performance. Centralized data-center schedulers make high-quality placement decisions, but at the cost of high decision latency at scale, which degrades response time for interactive jobs. Distributed schedulers, by contrast, make low-latency decisions at scale, but are restricted to simple algorithms and can make poor decisions as a result.

In this talk, I present Firmament, a centralized scheduler that scales to tens of thousands of machines at sub-second latency, even though it performs a computationally expensive min-cost flow optimization. To achieve this, Firmament automatically chooses between different min-cost flow algorithms, solves the optimization problem incrementally when possible, and applies problem-specific optimizations to common min-cost flow algorithms.
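
To make the idea of scheduling as a min-cost flow optimization concrete, here is a minimal sketch in Python. It is not Firmament's flow network or any of its solvers: the network layout (one node per task, one per machine, and a single "unscheduled" node), the cost values, and the names `MinCostFlow`, `schedule`, `costs`, `slots`, and `unscheduled_penalty` are all assumptions made for illustration. The solver is a plain successive-shortest-path algorithm, chosen for brevity rather than speed.

```python
from collections import deque


class MinCostFlow:
    """Successive shortest paths on a residual graph (illustrative, not tuned)."""

    def __init__(self, n):
        self.n = n
        self.graph = [[] for _ in range(n)]  # node -> indices of outgoing edges
        self.to, self.cap, self.cost = [], [], []

    def add_edge(self, u, v, cap, cost):
        # Forward edge gets an even index; its residual twin is index ^ 1.
        for a, b, c, w in ((u, v, cap, cost), (v, u, 0, -cost)):
            self.graph[a].append(len(self.to))
            self.to.append(b)
            self.cap.append(c)
            self.cost.append(w)

    def solve(self, s, t):
        total_flow = total_cost = 0
        while True:
            # Queue-based Bellman-Ford: residual edges may have negative cost.
            dist = [float("inf")] * self.n
            prev_edge = [-1] * self.n
            in_queue = [False] * self.n
            dist[s] = 0
            queue = deque([s])
            while queue:
                u = queue.popleft()
                in_queue[u] = False
                for e in self.graph[u]:
                    v = self.to[e]
                    if self.cap[e] > 0 and dist[u] + self.cost[e] < dist[v]:
                        dist[v] = dist[u] + self.cost[e]
                        prev_edge[v] = e
                        if not in_queue[v]:
                            in_queue[v] = True
                            queue.append(v)
            if prev_edge[t] == -1:  # sink unreachable: flow is maximal
                return total_flow, total_cost
            # Find the bottleneck capacity along the cheapest path.
            push, v = float("inf"), t
            while v != s:
                e = prev_edge[v]
                push = min(push, self.cap[e])
                v = self.to[e ^ 1]
            # Augment along that path.
            v = t
            while v != s:
                e = prev_edge[v]
                self.cap[e] -= push
                self.cap[e ^ 1] += push
                v = self.to[e ^ 1]
            total_flow += push
            total_cost += push * dist[t]


def schedule(costs, slots, unscheduled_penalty):
    """costs[t][m]: assumed cost of task t on machine m; slots[m]: free slots."""
    n_tasks, n_machines = len(costs), len(slots)
    source, task0 = 0, 1
    machine0 = task0 + n_tasks
    unsched = machine0 + n_machines
    sink = unsched + 1
    net = MinCostFlow(sink + 1)
    placement_edge = {}
    for t in range(n_tasks):
        net.add_edge(source, task0 + t, 1, 0)  # each task is one unit of flow
        for m in range(n_machines):
            placement_edge[(t, m)] = len(net.to)
            net.add_edge(task0 + t, machine0 + m, 1, costs[t][m])
        net.add_edge(task0 + t, unsched, 1, unscheduled_penalty)
    for m in range(n_machines):
        net.add_edge(machine0 + m, sink, slots[m], 0)  # capacity = free slots
    net.add_edge(unsched, sink, n_tasks, 0)
    _, total_cost = net.solve(source, sink)
    placements = {t: None for t in range(n_tasks)}  # None = left unscheduled
    for (t, m), e in placement_edge.items():
        if net.cap[e] == 0:  # saturated unit edge: task t runs on machine m
            placements[t] = m
    return placements, total_cost


if __name__ == "__main__":
    # Three tasks compete for two single-slot machines.
    print(schedule([[1, 5], [2, 3], [4, 1]], slots=[1, 1], unscheduled_penalty=10))
```

In this toy instance the cheapest solution places task 0 on machine 0 and task 2 on machine 1, and leaves task 1 unscheduled at the penalty cost. This sketch solves the whole problem from scratch on every call, and that cost grows quickly with cluster size, which is why the abstract emphasizes algorithm choice and incremental solving.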

Our experiments with a Google workload trace from a 12,500-machine cluster show that Firmament places tasks in hundreds of milliseconds, improving decision latency by more than 10x over Quincy, a similar prior scheduler. Moreover, since Firmament can efficiently solve large optimization problems, it supports novel scheduler features that would otherwise be too expensive. Firmament outperforms state-of-the-art distributed schedulers in placement quality without an increase in scheduling decision latency for common industry workloads.

This talk is part of the Computer Laboratory Systems Research Group Seminar series.

