On verifying fault tolerance of distributed protocols

Dana Fisman*, Orna Kupferman, Yoad Lustig

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

26 Scopus citations

Abstract

Distributed systems are composed of processes connected in some network. Distributed systems may suffer from faults: processes may stop, be interrupted, or be maliciously attacked. Fault-tolerant protocols are designed to be resistant to faults. Proving the resistance of protocols to faults is a very challenging problem, as it combines the parameterized setting that distributed systems are based on with the need to consider a hostile environment that produces the faults. Considering all the possible fault scenarios for a protocol is very difficult. Thus, reasoning about fault-tolerant protocols calls for formal methods. In this paper we describe a framework for verifying the fault tolerance of (synchronous or asynchronous) distributed protocols. In addition to the description of the protocol and the desired behavior, the user provides the fault type (e.g., fail-stop, Byzantine) and its distribution (e.g., at most half of the processes are faulty). Our framework is based on augmenting the description of the configurations of the system by a mask describing which processes are faulty. We focus on regular model checking and show how it is possible to compile the input for the model-checking problem to one that takes the faults and their distribution into account, and to perform regular model checking on the compiled input. We demonstrate the effectiveness of our framework and argue for its generality.
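
The idea of augmenting configurations with a fault mask can be illustrated with a small sketch. This is a toy enumeration under stated assumptions, not the paper's construction: the two-letter state alphabet and the names `masked_configurations`, `at_most_half_faulty`, and `admissible` are hypothetical, and the paper works with regular (automata-based) representations of unboundedly many processes rather than explicit enumeration for a fixed number of processes.

```python
from itertools import product

# Hypothetical toy model: each process holds a local state "0" or "1".
# A configuration is a word over process states; a mask of the same
# length marks which processes are faulty.

def masked_configurations(n, states=("0", "1")):
    """Yield (config, mask) pairs for n processes.

    mask[i] is True when process i is considered faulty.
    """
    for config in product(states, repeat=n):
        for mask in product((False, True), repeat=n):
            yield config, mask

def at_most_half_faulty(mask):
    """Fault distribution from the abstract's example:
    at most half of the processes are faulty."""
    return 2 * sum(mask) <= len(mask)

def admissible(n):
    """Keep only masked configurations allowed by the fault distribution."""
    return [(c, m) for c, m in masked_configurations(n) if at_most_half_faulty(m)]

if __name__ == "__main__":
    for config, mask in admissible(2):
        # Render each position as state plus a fault marker, e.g. "0! 1."
        print(" ".join(s + ("!" if f else ".") for s, f in zip(config, mask)))
```

Pairing each state with a mask bit keeps the masked configuration a single word over a product alphabet, which is the kind of object a word-based verification method can operate on; the distribution constraint then acts as a filter on the mask component only.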

Original language: English
Title of host publication: Tools and Algorithms for the Construction and Analysis of Systems - 14th Int. Conf., TACAS 2008 - Held as Part of the Joint European Conf. Theory and Practice of Software, ETAPS 2008 Proceedings
Pages: 315-331
Number of pages: 17
State: Published - 2008
Event: "14th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, TACAS 2008" - Budapest, Hungary
Duration: 29 Mar 2008 to 6 Apr 2008

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4963 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: "14th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, TACAS 2008"
Country/Territory: Hungary
City: Budapest
Period: 29/03/08 to 6/04/08
