Examining Raft's behaviour during partial network failures

If you have a question about this talk, please contact Srinivasan Keshav.

State machine replication protocols such as Raft are widely used to build highly available, strongly consistent services that remain live even if a minority of servers crash. As these systems are implemented and optimised for production, they accumulate many divergences from the original specification. These divergences are poorly documented, leaving operators with an incomplete model of the system’s behaviour, especially during failures. In this paper, we examine one such Raft model, used to explain the November 2020 Cloudflare outage, and show that etcd’s behaviour during the same failure differs. We then identify the specific optimisations in etcd that cause this difference and present a more complete model of the outage, based on etcd’s behaviour in an emulated deployment using reckon. Finally, we highlight the upcoming PreVote optimisation in etcd, which might have prevented the outage in the first place.

Bio:

Chris Jensen is a first-year PhD student in the SRG, focusing on benchmarking and improving the availability of strongly consistent distributed databases. He previously completed his BSc in Computer Science at the University of Cambridge.

This talk is part of the Computer Laboratory Systems Research Group Seminar series.

👤 Speaker: Chris J. Jensen, Computer Lab
📅 Date & Time: Thursday 29 April 2021, 15:00 - 16:00
📍 Venue: https://meet.google.com/ehj-dwaz-rea