Full text (gzip-compressed postscript)
We provide a competitive analysis framework for online prefetching and buffer management algorithms in parallel I/O systems, using a read-once model of block references. This has widespread applicability to key I/O-bound applications such as external merging and concurrent playback of multiple video streams. Two realistic lookahead models, global lookahead and local lookahead, are defined. Algorithms NOM and GREED based on these two forms of lookahead are analyzed for shared buffer and distributed buffer configurations, both of which occur frequently in existing systems. An important aspect of our work is that we show how to implement both the models of lookahead in practice using the simple techniques of forecasting and flushing.
Given a D-disk parallel I/O system and a globally shared I/O buffer
that can hold up to M disk blocks, we derive a lower bound of
on the competitive ratio of any deterministic
online prefetching algorithm with O(M) lookahead. NOM is shown to match
the lower bound using global M-block lookahead. In contrast, using only
local lookahead results in an
competitive ratio. When the
buffer is distributed into D portions of M/D blocks each, the algorithm
GREED based on local lookahead is shown to be optimal, and NOM is within a
constant factor of optimal. Thus we provide a theoretical basis for the
intuition that global lookahead is more valuable for prefetching in the
case of a shared buffer configuration whereas it is enough to provide local
lookahead in case of the distributed configuration. Finally, we analyze the
performance of these algorithms for reference strings generated by a
uniformly-random stochastic process and we show that they achieve the
minimal expected number of I/Os. These results also give bounds on the
worst-case expected performance of algorithms which employ randomization in
the data layout.