University of Cambridge > Talks.cam > Microsoft Research Cambridge, public talks > Platforms and Applications for "Big and Fast" Data Analytics

Platforms and Applications for "Big and Fast" Data Analytics

Add to your list(s) Download to your calendar using vCal

If you have a question about this talk, please contact Microsoft Research Cambridge Talks Admins.

This event may be recorded and made available internally or externally via http://research.microsoft.com. Microsoft will own the copyright of any recordings made. If you do not wish to have your image/voice recorded please consider this before attending

Recently there has been a significant interest in building big data systems that can handle not only “big data” but also “fast data” for analytics. Our work is strongly motivated by recent real-world case studies that point to the need for a general, unified data processing framework to support analytical queries with different latency requirements. Towards this goal, our project is designed to transform the popular MapReduce computation model, originally proposed for batch processing, into distributed (near) real-time processing.

In this talk, I start by examining the widely used Hadoop system and presenting a thorough analysis to understand the causes of high latency in Hadoop. I then present a number of necessary architectural changes, as well as new resource configuration and optimization techniques to meet user-specified latency requirements while maximizing throughput. Experiments using typical workloads in click stream analysis and twitter feed analysis show that our techniques reduce the latency from tens or hundreds of seconds in Hadoop to sub-second in our system, with 2x-7x increase in throughput. Our system also outperforms state-of-the-art distributed stream systems, Twitter Storm and Spark Streaming, by a wide margin. Finally, I will show some initial results and challenges of supporting big and fast data analytics in the emerging domain of genomics.

This talk is part of the Microsoft Research Cambridge, public talks series.

Tell a friend about this talk:

This talk is included in these lists:

Note that ex-directory lists are not shown.

 

© 2006-2024 Talks.cam, University of Cambridge. Contact Us | Help and Documentation | Privacy and Publicity