Platforms and Applications for "Big and Fast" Data Analytics
If you have a question about this talk, please contact Microsoft Research Cambridge Talks Admins.

This event may be recorded and made available internally or externally via http://research.microsoft.com. Microsoft will own the copyright of any recordings made. If you do not wish to have your image or voice recorded, please consider this before attending.

Recently there has been significant interest in building big data systems that can handle not only “big data” but also “fast data” for analytics. Our work is strongly motivated by recent real-world case studies that point to the need for a general, unified data processing framework to support analytical queries with different latency requirements. Towards this goal, our project is designed to transform the popular MapReduce computation model, originally proposed for batch processing, into one that supports distributed (near) real-time processing.

In this talk, I start by examining the widely used Hadoop system and presenting a thorough analysis of the causes of its high latency. I then present a number of necessary architectural changes, as well as new resource configuration and optimization techniques, to meet user-specified latency requirements while maximizing throughput. Experiments using typical workloads in click stream analysis and Twitter feed analysis show that our techniques reduce the latency from tens or hundreds of seconds in Hadoop to sub-second in our system, with a 2x-7x increase in throughput. Our system also outperforms state-of-the-art distributed stream systems, Twitter Storm and Spark Streaming, by a wide margin. Finally, I show some initial results and challenges of supporting big and fast data analytics in the emerging domain of genomics.

This talk is part of the Microsoft Research Cambridge, public talks series.