Integrating Scale Out and Fault Tolerance in Stream Processing using Operator State Management
If you have a question about this talk, please contact Eiko Yoneki.

As users of “big data” applications expect fresh results, we witness a new breed of stream processing systems (SPS) designed to scale to large numbers of cloud-hosted machines. Such systems face new challenges: (i) to benefit from the “pay-as-you-go” model of cloud computing, they must scale out on demand, acquiring additional virtual machines (VMs) and parallelising operators when the workload increases; (ii) failures are common with deployments on hundreds of VMs, so systems must be fault-tolerant with fast recovery times, yet low per-machine overheads. An open question is how to achieve these two goals when stream queries include stateful operators, which must be scaled out and recovered without affecting query results.

Our key idea is to expose internal operator state explicitly to the SPS through a set of state management primitives. Based on them, we describe an integrated approach for dynamic scale out and recovery of stateful operators. Externalised operator state is checkpointed periodically by the SPS and backed up to upstream VMs. The SPS identifies individual operator bottlenecks and automatically scales them out by allocating new VMs and partitioning the checkpointed state. At any point, failed operators are recovered by restoring checkpointed state on a new VM and replaying unprocessed tuples. We evaluate this approach with the Linear Road Benchmark on the Amazon EC2 cloud platform and show that it can scale automatically to a load factor of L=350 with 50 VMs, while recovering quickly from failures.

Joint work with Raul Castro Fernandez, Matteo Migliavacca and Peter Pietzuch.

Bio: Eva Kalyvianaki is a lecturer at City University London in the Department of Computer Science. She holds a PhD from Cambridge University and MSc and BSc degrees from the University of Crete, Greece. Before joining City University she was a post-doctoral researcher at Imperial College London.
Her research interests span cloud computing, real-time query processing, autonomic computing, and systems and performance in general.

This talk is part of the Computer Laboratory Systems Research Group Seminar series.
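
For readers who want a concrete picture of the mechanism the abstract describes, the following is a minimal Python sketch of checkpointing, partitioning and recovering operator state. It is an illustration only, not the system presented in the talk: the class `CountingOperator`, the functions `partition` and `recover`, the integer keys, the hash partitioning scheme and the in-memory upstream buffer are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class CountingOperator:
    """Hypothetical stateful operator that counts tuples per key."""
    state: dict = field(default_factory=dict)  # key -> count
    last_ts: int = -1  # timestamp of the newest tuple reflected in state

    def process(self, ts, key):
        self.state[key] = self.state.get(key, 0) + 1
        self.last_ts = ts

    def checkpoint(self):
        # Copy the state so later processing cannot alter the backup.
        return dict(self.state), self.last_ts

    @classmethod
    def restore(cls, checkpoint):
        state, last_ts = checkpoint
        return cls(dict(state), last_ts)


def partition(checkpoint, n):
    """Split checkpointed state into n parts by key hash (assumed scheme)."""
    state, last_ts = checkpoint
    parts = [dict() for _ in range(n)]
    for key, value in state.items():
        parts[hash(key) % n][key] = value
    return [(part, last_ts) for part in parts]


def recover(checkpoint, upstream_buffer):
    """Restore a fresh operator, then replay tuples newer than the checkpoint."""
    op = CountingOperator.restore(checkpoint)
    for ts, key in upstream_buffer:
        if ts > op.last_ts:
            op.process(ts, key)
    return op


# (timestamp, key) tuples; the upstream node is assumed to buffer them.
tuples = [(0, 7), (1, 8), (2, 7), (3, 9), (4, 7)]

op = CountingOperator()
for ts, key in tuples[:3]:
    op.process(ts, key)
backup = op.checkpoint()  # taken after timestamp 2
for ts, key in tuples[3:]:
    op.process(ts, key)

# Simulated failure: rebuild from the backup plus the buffered tuples.
recovered = recover(backup, tuples)
assert recovered.state == op.state

# Simulated scale out: split the checkpointed state across two new operators.
scaled = [CountingOperator.restore(part) for part in partition(backup, 2)]
```

In this sketch the recovered operator ends up with the same counts as the one that never failed, which mirrors the abstract's requirement that recovery must not affect query results. A real SPS would also need to trim the upstream buffer after each checkpoint and route new tuples to the correct partition, neither of which is modelled here.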