next up previous
Next: Constrained Querying of Multimedia Up: EXTERNAL MEMORY ALGORITHMS, I/O Previous: Flow Computation on Massive

   
Distribution Sort with Randomized Cycling

J. S. Vitter and D. A. Hutchinson. ``Distribution Sort with Randomized Cycling,'' Journal of the ACM, to appear. An earlier version appeared in Proceedings of the 12th Annual SIAM/ACM Symposium on Discrete Algorithms (SODA '01), Washington, DC, January 2001.

Full text (Adobe pdf format)

Parallel independent disks can enhance the performance of external memory (EM) algorithms, but the programming task is often difficult. In this paper we develop randomized variants of distribution sort for use with parallel independent disks. We propose a simple variant called randomized cycling distribution sort (RCD) and prove that it has optimal expected I/O complexity. The analysis uses a novel reduction to a model with significantly fewer probabilistic interdependencies. Experimental evidence is provided to support its practicality. Other simple variants are also examined experimentally and appear to offer similar advantages to RCD. Based upon ideas in RCD we propose general techniques that transparently simulate algorithms developed for the unrealistic multihead disk model so that they can be run on the realistic parallel disk model. The simulation is optimal for two important classes of algorithms: the class of multipass algorithms, which make a complete pass through their data before accessing any element a second time, and the algorithms based upon the well-known distribution paradigm of EM computation.


next up previous
Next: Constrained Querying of Multimedia Up: EXTERNAL MEMORY ALGORITHMS, I/O Previous: Flow Computation on Massive
Jeff Vitter
2008-07-05