Towards Grey Fault Tolerant Cloud Systems
If you have a question about this talk, please contact Amjad.

Building robust, large-scale distributed systems is notoriously challenging. Decades of research have made significant advances in tackling this challenge with mature techniques such as state-machine replication. These techniques usually assume a fail-stop model. Ample real-world evidence, however, suggests that faults in modern cloud infrastructure are often “grey”: a component is severely impaired but still appears to be working. Existing solutions cannot effectively detect or handle these grey failures.

In this talk, I will discuss the grey failure problem. Using real-world examples, we argue that a key trait of the subtle grey failure mode is a form of differential observability. Based on this insight, I will present Panorama, a solution that harnesses observability in large systems to detect grey failures by using instrumentation to convert any system component into an in-situ observer. To further enhance the inherent system observability, I will propose an intrinsic software watchdog abstraction and a tool called OmegaGen, which automatically generates customized watchdogs for a given program by using a program reduction technique. I will conclude by outlining some open challenges in making cloud systems grey-fault-tolerant.

Bio: Ryan Huang is an Assistant Professor in the Department of Computer Science at Johns Hopkins University. He leads the Ordered Systems Lab at JHU, which conducts research broadly in distributed systems, operating systems, and cloud and mobile computing. His work received best paper awards at OSDI 2016, ASPLOS 2019, and NSDI 2020, and a best paper award nomination at MICRO 2018. He is a recipient of the NSF CAREER Award (2020). Dr. Huang received a B.S. degree in Computer Science (Economics minor) from Peking University (2010) and a Ph.D. degree from UC San Diego (2016).

This talk is part of the Computer Laboratory Systems Research Group Seminar series.