Big Data Analytics with All-or-Nothing Parallel Jobs
This talk is part of the Microsoft Research Cambridge, public talks series. If you have a question about this talk, please contact Microsoft Research Cambridge Talks Admins. This event may be recorded and made available internally or externally via http://research.microsoft.com. Microsoft will own the copyright of any recordings made. If you do not wish to have your image or voice recorded, please consider this before attending.

Extensive data analysis has become a key enabler of diagnostics and decision making in many modern systems, with both competitive and social benefits. To cope with a deluge of data that is growing faster than Moore's law, computation frameworks have resorted to massively parallelizing analytics jobs into many fine-grained tasks. These frameworks promised efficient and fault-tolerant execution of these tasks. However, keeping this promise in clusters spanning hundreds of thousands of machines is challenging, and marks a key departure from earlier work on parallel computing.

A simple but key aspect of parallel jobs is the all-or-nothing property: unless all tasks of a job receive the same improvement, the job as a whole completes no sooner. This talk will demonstrate how the all-or-nothing property affects replacement algorithms in distributed caches for parallel jobs. Our coordinated caching system, PAC Man, makes global caching decisions and employs a provably optimal cache replacement algorithm. A highlight of our evaluation, using workloads from Facebook and Bing datacenters, is that PAC Man's replacement algorithm outperforms even Belady's MIN (which uses an oracle) in speeding up jobs. Along the way, I will also describe how we debunked the myth that disk locality is important in datacenter computing, and present solutions for mitigating straggler tasks.
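To make the all-or-nothing property concrete, here is a toy Python model. It is not PAC Man's algorithm. The single-wave execution model, the task durations DISK_TIME and CACHED_TIME, and the helper names are all hypothetical. The model compares two cache states that hold the same number of blocks. Both states give the same block hit ratio, but only the one that caches a job's entire input actually shortens a job.

```python
# Toy model (assumption): each task reads one input block. A task whose block
# is cached runs in CACHED_TIME; otherwise it runs in DISK_TIME. All tasks of a
# job run in parallel in a single wave, so the job finishes with its slowest task.
DISK_TIME = 10.0   # hypothetical task duration when reading from disk
CACHED_TIME = 2.0  # hypothetical task duration when reading from memory


def job_completion_time(blocks, cached):
    """Completion time of a single-wave job: the maximum over its tasks."""
    return max(CACHED_TIME if b in cached else DISK_TIME for b in blocks)


def evaluate(jobs, cached):
    """Return (block hit ratio, total job completion time) for a cache state."""
    accesses = [b for blocks in jobs.values() for b in blocks]
    hits = sum(1 for b in accesses if b in cached)
    total_time = sum(job_completion_time(blocks, cached) for blocks in jobs.values())
    return hits / len(accesses), total_time


# Two hypothetical jobs with four tasks each; the cache holds four blocks.
jobs = {
    "A": ["a1", "a2", "a3", "a4"],
    "B": ["b1", "b2", "b3", "b4"],
}
spread = {"a1", "a2", "b1", "b2"}  # half of each job's input is cached
whole = {"a1", "a2", "a3", "a4"}   # all of job A's input is cached

print(evaluate(jobs, spread))  # (0.5, 20.0): no job finishes any sooner
print(evaluate(jobs, whole))   # (0.5, 12.0): job A drops from 10.0 to 2.0
```

Both states have a 50% hit ratio, yet only the second speeds up any job. This illustrates why, for parallel jobs, a replacement policy judged by hit ratio alone can miss what matters: whether every task of a job benefits.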