Making the Most of Massive Clusters
If you have a question about this talk, please contact Srinivasan Keshav.

Resource management systems play an important role in today’s large clusters, allocating jobs and containers to compute resources while balancing metrics like fairness, efficiency, and fault tolerance. Existing management policies in systems such as Kubernetes, VMware’s DRS, and Red Hat’s OpenShift rely on heuristic-based schedulers that often scale well but typically produce sub-optimal allocations. This problem is made worse by the growing trend of heterogeneous clusters, composed of a mix of several generations of CPUs, GPUs, etc., where existing heuristics perform poorly. This talk will emphasize the environmental footprint of large resource clusters as a key motivation.

I’ll first describe our work on allocating ML training jobs in heterogeneous clusters. A key insight is that many popular scheduling objectives can be cast as mathematical optimization problems whose solutions can maximize cluster efficiency; other systems take a similar approach, for example TetriSched and Facebook’s RAS. However, optimization-based techniques are notorious for scaling poorly to massive systems. To address this issue, I will describe POP: a technique to partition the problem and quickly approximate the optimal allocation. POP reduces solve times by several orders of magnitude with minimal performance loss across a wide range of problem domains, including cluster scheduling and network traffic engineering.

Bio: Fiodar is currently a postdoctoral fellow at the Stanford Future Data Systems lab, working with Matei Zaharia and Peter Bailis. His research interests span ML systems, energy systems, and data science, with a focus on finding practical solutions to fundamental problems. He obtained his PhD from the University of Waterloo, where his thesis on the optimization of solar panel and battery systems was recognized through the Cheriton Distinguished Dissertation award.
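As a rough illustration of the partitioning idea mentioned in the abstract, the sketch below splits a toy allocation problem into k independent subproblems, solves each one separately, and merges the results. It is not the speaker's implementation: the one-job-per-machine formulation, the random split, the brute-force solver, and the names `solve_exact` and `solve_partitioned` are all assumptions made for this example. It also assumes there are no more jobs than machines.

```python
import itertools
import random


def solve_exact(jobs, machines, utility):
    """Brute-force the best one-job-per-machine assignment (tiny inputs only)."""
    best_value, best_assignment = float("-inf"), {}
    for perm in itertools.permutations(machines, len(jobs)):
        value = sum(utility[j][m] for j, m in zip(jobs, perm))
        if value > best_value:
            best_value, best_assignment = value, dict(zip(jobs, perm))
    return best_value, best_assignment


def solve_partitioned(jobs, machines, utility, k, seed=0):
    """Randomly split jobs and machines into k groups and solve each group alone.

    Illustrative assumption: a uniform random split. The merged result is a
    feasible assignment for the full problem, but it may be worse than the
    true optimum because jobs cannot use machines outside their own group.
    """
    rng = random.Random(seed)
    jobs, machines = jobs[:], machines[:]
    rng.shuffle(jobs)
    rng.shuffle(machines)
    total, assignment = 0, {}
    for i in range(k):
        # Every k-th element starting at i forms the i-th subproblem.
        sub_jobs, sub_machines = jobs[i::k], machines[i::k]
        value, sub_assignment = solve_exact(sub_jobs, sub_machines, utility)
        total += value
        assignment.update(sub_assignment)
    return total, assignment


if __name__ == "__main__":
    jobs = [f"j{i}" for i in range(6)]
    machines = [f"m{i}" for i in range(6)]
    rng = random.Random(1)
    # Hypothetical utilities, e.g. throughput of job j on machine m.
    utility = {j: {m: rng.randint(1, 10) for m in machines} for j in jobs}
    print("full problem:", solve_exact(jobs, machines, utility)[0])
    print("2 partitions:", solve_partitioned(jobs, machines, utility, k=2)[0])
```

In this toy setting the full problem enumerates 720 assignments, while each of the two subproblems enumerates only 6. The partitioned value can never exceed the full-problem optimum, which is the trade-off between solve time and allocation quality that the abstract describes.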
This talk is part of the Computer Laboratory Systems Research Group Seminar series.