Dynamic verification enables a system to improve its availability by checking that its execution is correct as it is running. While computing with high performance and low power is desirable, computing correctly-despite hardware faults and subtle design bugs-is most important. In the context of multithreaded systems, including simultaneous multithreading and chip multiprocessors, memory system correctness is defined by the architecture's memory consistency model. Memory consistency models, such as sequential consistency, specify the legal orderings of loads and stores across the threads that access shared memory.
We present the first implementable design for dynamic verification of sequential consistency (DVSC) in multithreaded systems. DVSC is impossible in the theoretical world of systems without resource constraints, since a verifier must consider the entire execution before determining if it is correct. However, we show how to leverage resource constraints to verify only a small sliding window of execution. We convert the problem into an engineering tradeoff in which we use storage and bandwidth to make the probability of false verification arbitrarily small. We discuss implementation challenges, including how to interface DVSC to a backward error recovery mechanism.
We use full-system simulation and commercial workloads to evaluate our
first implementation of DVSC, called Sherlock. We demonstrate that
Sherlock, combined with backward error recovery, improves system
availability by recovering from injected errors. While Sherlock adds only
negligible runtime overhead, it does consume additional interconnect
bandwidth (30-120% for commercial workloads) and storage for performing
DVSC.