In this paper we propose a methodology for verifying the sequential
consistency of caching algorithms.
The scheme combines timestamping and an auxiliary history table
to construct a serial execution `matching' any given execution of the
algorithm.
We believe that this approach is applicable to an interesting class of
sequentially consistent algorithms in which the buffering of cache updates
allows stale values to be read from cache.
We illustrate this methodology
by verifying the high level specifications of the lazy caching and ring
algorithms.
Appears in Proc. 13th Conference on Computer Aided Verification.
PVS dump files
We provide the PVS dump files for the two caching algorithms.
The verification uses PVS2.3 with no patches and can be undumped using
the PVS undump menu option.
In both cases the top-level file is refine.pvs.
This file imports, directly or otherwise, all the other files.
The serial system is serial.pvs (and it is identical in both proofs);
the lazy caching system is lazy.pvs and the ring algorithm is
ring.pvs. The files are commented to explain the data structures and so on,
but to understand the algorithms we suggest referring to the explanations
below and to the papers cited there.
Please refer any queries to Tamarah.
The ``lazy cache algorithm'' [ABM93] is a sequentially consistent protocol in which cache updates can be postponed, and writes are buffered, allowing processors to access stale cache data.
The system consists of n processors, P_1, ..., P_n, with each P_i owning a cache C_i, an in-queue In_i, and an out-queue Out_i. Both the in- and out-queues are FIFO. We have further associated with each processor an instruction list, an unbounded list of memory instructions to be executed in list order, and a program counter, pc_i, which points to the next instruction. The instructions in the list are of the form ``read a'' or ``write a, d''.
A processor P_i initiates a write event w_i by placing a record of the instruction's address and new value at the tail of Out_i. When this record reaches the head of Out_i it can be popped off and the memory write mw_i occurs: the shared memory is updated, and a record of the address and value is placed in the in-queue In_j of every processor P_j. The copy placed in In_i is starred. When the entry at the head of In_i is popped off, a cache update cu_i occurs, and C_i is updated with the value recorded in the In_i entry.
A read event r_i can be performed if the requested address a is in the cache, Out_i is empty, and In_i contains no starred entries. The value read is the cached one; note that it may differ from the value in memory if a write to a is still buffered in In_i. Locations not currently in the cache can be brought in by placing the memory value in the in-queue in a memory read (mr_i) action, and can be evicted by cache invalidation.
In our interleaving model, at any step a processor can either initiate a read or write (if one is enabled), pop an entry off its in- or out-queue if non-empty, initiate a cache update, invalidate a cache entry, or idle. The system is parameterized by the number of processors, and there is no restriction on the maximum size of the queues, the address space, or the set of memory values. Our model very closely resembles that of Gerth [Ger99]; the reader is referred to that paper, or to our PVS source files, for more information.
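To make these transitions concrete, the following is a minimal Python sketch of the model. The class LazyCache, its method names, and the short demo at the end are ours and purely illustrative; they are not identifiers from the PVS sources.

  from collections import deque

  class LazyCache:
      """Illustrative model of the lazy caching transitions (names ours)."""

      def __init__(self, n):
          self.n = n
          self.memory = {}                          # shared memory: addr -> value
          self.cache = [dict() for _ in range(n)]   # C_i
          self.out = [deque() for _ in range(n)]    # Out_i: (addr, value)
          self.inq = [deque() for _ in range(n)]    # In_i: (addr, value, starred)

      def write(self, i, a, d):
          """w_i: buffer the write at the tail of Out_i."""
          self.out[i].append((a, d))

      def memory_write(self, i):
          """mw_i: pop Out_i, update memory, broadcast to every In_j;
          the copy placed in the writer's own in-queue is starred."""
          a, d = self.out[i].popleft()
          self.memory[a] = d
          for j in range(self.n):
              self.inq[j].append((a, d, j == i))

      def cache_update(self, i):
          """cu_i: pop the head of In_i into C_i."""
          a, d, _ = self.inq[i].popleft()
          self.cache[i][a] = d

      def memory_read(self, i, a):
          """mr_i: bring the memory value of a into In_i (unstarred)."""
          self.inq[i].append((a, self.memory[a], False))

      def invalidate(self, i, a):
          self.cache[i].pop(a, None)

      def can_read(self, i, a):
          """r_i is enabled iff a is cached, Out_i is empty,
          and In_i holds no starred entries."""
          return (a in self.cache[i] and not self.out[i]
                  and not any(s for _, _, s in self.inq[i]))

      def read(self, i, a):
          assert self.can_read(i, a)
          return self.cache[i][a]      # may be stale w.r.t. shared memory

  # A stale read: P_2's copy of address 0 is out of date while P_1's
  # update is still buffered in In_2, yet the read is permitted.
  lc = LazyCache(3)
  lc.memory[0] = 0
  for j in range(3):
      lc.memory_read(j, 0); lc.cache_update(j)   # every cache holds a=0
  lc.write(1, 0, 5); lc.memory_write(1)          # shared memory now holds 5
  assert lc.read(2, 0) == 0                      # P_2 still reads the old value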
References
[ABM93] Y. Afek, G. Brown, and M. Merritt. Lazy caching. ACM Transactions
on Programming Languages and Systems, 15(1):182--205, January 1993.
[Ger99] R. Gerth. Sequential consistency and the lazy caching algorithm. Distributed Computing, 12:57--59, June 1999.
In order to test the applicability of our methodology we also applied it to a model based on Collier's ring algorithm [Col92]:
Processors P_0, ..., P_{n-1} are connected in a ring, with P_i sending messages only to its successor, P_{i+1 mod n}. The channels between successive processors are FIFO queues of messages. Processor P_0 is designated the supervisor. If processor P_i, i \neq 0, wants to perform a write of value v to address a, it sends a WriteRequest message for a and v to its successor and enters a waiting state. This WriteRequest is passed around the ring until it reaches the supervisor. The supervisor updates memory with this address and value, and then sends a WriteReturn message with the address and value. On receiving a WriteReturn message each processor updates its cache and then passes the message on to its successor. Processor P_i, on receiving the WriteReturn, releases itself from its waiting state and can proceed. When the WriteReturn reaches the supervisor, it is removed from the system.
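The write path can be sketched along the same lines. Again, the Python below is our own illustrative model (the class Ring and all of its names are ours), assuming reliable FIFO channels:

  from collections import deque

  class Ring:
      """Illustrative model of the ring write path (names ours)."""

      def __init__(self, n):
          self.n = n
          self.memory = {}
          self.cache = [dict() for _ in range(n)]   # cache[0] is unused: the
                                                    # supervisor has no cache
          self.chan = [deque() for _ in range(n)]   # chan[i]: P_i -> P_{i+1 mod n}
          self.waiting = [False] * n

      def write_request(self, i, a, v):
          """P_i (i != 0) sends a WriteRequest and enters its waiting state."""
          self.chan[i].append(('WriteRequest', a, v, i))
          self.waiting[i] = True

      def deliver(self, i):
          """P_{i+1 mod n} consumes the message at the head of chan[i]."""
          kind, a, v, src = self.chan[i].popleft()
          j = (i + 1) % self.n
          if kind == 'WriteRequest':
              if j == 0:                 # supervisor: update memory and answer
                  self.memory[a] = v
                  self.chan[0].append(('WriteReturn', a, v, src))
              else:                      # pass it on toward the supervisor
                  self.chan[j].append((kind, a, v, src))
          else:                          # WriteReturn
              if j == 0:                 # back at the supervisor: discard
                  return
              if a in self.cache[j]:     # update the cached copy, if any
                  self.cache[j][a] = v
              if j == src:               # the writer may now proceed
                  self.waiting[j] = False
              self.chan[j].append((kind, a, v, src))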
A processor can execute a read instruction if the address is in its cache. Otherwise it sends a ReadRequest, which the supervisor answers with a ReadReturn. After thus bringing the address into the cache, the read can be executed.
The supervisor has direct access to the memory (it has no cache) and never issues ReadRequest or WriteRequest messages. When it performs a write it sends a WriteReturn message so that all other caches can be updated. This model fits neatly into our framework. As in the lazy caching example, cache reads and updates to the shared memory are entered into the history table when they occur. (In this algorithm the memory update occurs when a WriteReturn is initiated by the supervisor.) The supervisor increments its local clock when it sends a WriteReturn, and all other processors increment their local clocks on receiving the WriteReturn. The local time of the supervisor is the global system time. The local time t_i of P_i is the global time minus the number of WriteReturns on the channels between P_0 and P_i.
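In terms of the hypothetical Ring sketch above, this relation between local and global time amounts to counting the WriteReturns still in transit ahead of P_i; the function below is, again, our own illustration:

  def local_time(ring, global_time, i):
      """t_i: the supervisor's (global) time minus the number of
      WriteReturns in transit on chan[0], ..., chan[i-1]."""
      in_transit = sum(1 for k in range(i)
                         for msg in ring.chan[k] if msg[0] == 'WriteReturn')
      return global_time - in_transit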
It is easy to see that the ring algorithm is not coherent: consider the scenario in which processor P_i issues a WriteRequest for the address a and value v_i. The WriteRequest reaches the supervisor, which updates memory, and a WriteReturn is sent through the ring. While the WriteReturn is on a channel between P_i and P_{i+j}, P_{i+j} reads a stale value v for a from its cache.
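With the hypothetical Ring model above, this scenario is a four-step trace (here n = 3, with P_1 the writer and P_2 the stale reader):

  r = Ring(3)
  r.memory[0] = 0
  r.cache[1][0] = r.cache[2][0] = 0   # both caches hold address 0
  r.write_request(1, 0, 7)            # P_1 initiates a write of 7
  r.deliver(1); r.deliver(2)          # request reaches P_0: memory[0] == 7
  r.deliver(0)                        # WriteReturn reaches P_1: cache[1][0] == 7
  assert r.cache[2][0] == 0           # still on chan[1]: P_2 would read stale 0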
Reference
[Col92] W. W. Collier. Reasoning about Parallel Architectures. Prentice
Hall, 1992.