next up previous
Next: Online Data Structures in Up: EXTERNAL MEMORY ALGORITHMS, I/O Previous: On Two-Dimensional Indexability and

   
A Simple and Efficient Parallel Disk Mergesort

R. D. Barve and J. S. Vitter. ``A Simple and Efficient Parallel Disk Mergesort,'' invited submission to a special issue of Theory of Computing Systems, 35(2), March/April 2002, 189-215. A shortened version appears in Proceedings of the 11th Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA '99), St. Malo, France, June 1999.

Full text (Adobe pdf format)

Slides for talk plus extra foils on dynamic memory allocation (gzip-compressed postscript)

External sorting is a fundamental operation in many large scale data processing systems not only for producing sorted output but also as a core subroutine in many operations. Technology trends indicate that developing techniques that effectively use multiple disks in parallel in order to speed up the performance of external sorting is of prime importance. The simple randomized merging (SRM) mergesort algorithm proposed in our earlier work is the first parallel disk sorting algorithm that requires a provably optimal number of passes and that is fast in practice. Knuth (in the new edition of The Art of Computer Programming, Vol. 3: Sorting and Searching) recently identified SRM (which he calls ``randomized striping'') as the method of choice for sorting with parallel disks.

In this paper, we present an efficient implementation of SRM, based upon novel data structures. We give a new implementation for SRM's lookahead forecasting technique for parallel prefetching and its forecast and flush technique for buffer management. Our techniques amount to a significant improvement in the way SRM carries out the parallel, independent disk accesses necessary to efficiently read blocks of input runs during external merging.

We present the performance of SRM over a wide range of input sizes and compare its performance with that of disk-striped mergesort (DSM), the commonly used technique to implement external mergesort on D parallel disks. DSM consists of using a standard mergesort algorithm in conjunction with striped I/O for parallel disk access. SRM merges together significantly more runs at a time compared with DSM, and thus it requires fewer merge passes. We demonstrate in practical scenarios that even though the streaming speeds for merging with DSM are a little higher than those for SRM (since DSM merges fewer runs at a time), sorting using SRM is significantly faster than with DSM, since SRM requires fewer passes.

The techniques in this paper can be generalized to meet the load-balancing requirements of other applications using parallel disks, including distribution sort, multiway partitioning of a file into several other files. and some potential multimedia streaming applications.


next up previous
Next: Online Data Structures in Up: EXTERNAL MEMORY ALGORITHMS, I/O Previous: On Two-Dimensional Indexability and
Jeff Vitter
2009-11-09