We present several new criteria for the evaluation of communication protocols and, in particular, protocols resilient to faults. These criteria are PRACTICAL, in the sense that they give precise meaning to intuitive, informal considerations which are currently used to evaluate and compare some practical aspects of communication protocols. The new criteria enable comparative evaluation of communication protocols with respect to these practical aspects.
We demonstrate the new criteria by analyzing some of the fundamental tasks used in practical communication networks. We present new efficient and fault tolerant solutions to the following tasks: broadcast, end to end communication, queries and network synchronization. We believe that these solutions have substantial practical advantages over known solutions and, in particular, over the known solutions which have been formally analyzed. These advantages are reflected by our criteria.
In this work we concentrate on fault tolerance issues. Most works dealing with fault tolerant protocols require some 'reliability', in order to avoid banal impossibilities due to extreme unreliability. It is important to use realistic reliability requirements so that there would be correspondence between methods that work well in practice and solutions judged fault tolerant and efficient by formal analysis. However, the reliability requirements used in existing formal works are unrealistically weak and hence rule out practical solutions. We propose a new reliability requirement which seems more realistic. Another advantage of our reliability requirement is that it is QUANTITATIVE, i.e. allows comparison of the reliability requirements of different solutions.
To compare our quantitative reliability requirements to the existing (qualitative) approaches, we consider the BROADCAST task, used extensively in actual networks. The existing formulations of broadcast seem too difficult; for example, protocols with bounded storage must have unbounded time complexity. Our quantitative formulation of broadcast seems to capture the intuitive requirements and escapes such difficulties.
We analyze both fail-stop failures and 'malicious' faults. To deal with malicious faults, we use a bound for the message transmission delay. In practical networks, the bound is much higher than the typical message transmission delay. However, this is ignored by the usual time complexity analysis. We propose to compute the time complexity as a function of both the actual delay and the bound. This gives an indication of the time required by a protocol under the 'typical' conditions, and not only under 'worst case' conditions.
To illustrate our measure of time complexity, we use this analysis for two problems: performing distributed queries and detecting message forwarding faults. For both problems, we present protocols which are time optimal for any values of the actual delay and the bound on the delay. We call such solutions EARLY TERMINATING.
We also present refined efficiency criteria for interactive protocols. In an interactive protocol, such as an end to end protocol, there is a sequence of input and output events. A natural technique is to perform some actions only per batches of several input actions. This may achieve better complexity for the protocol, when amortized over the number of inputs events. However, this may be at the cost of increasing the complexity of the protocol during short transient periods. Previous measures either ignored the effect of the transient periods or ignored the amortized complexity. We suggest a new measure, which reveals both aspects.
We demonstrate our complexity measures for interactive protocols by analyzing our broadcast protocol. In particular, we analyze the throughput of this protocol. This analysis shows how the addition of a window mechanism improves the throughput, as expected from practice.
Submitted to the Senate of the Technion, March 1991.
Back to Oded Goldreich's homepage.