Full text with FFT figure missing (gzip-compressed postscript)
Full text with FFT figure missing (Adobe pdf format)
We provide the first optimal algorithms in terms of the number of
input/outputs (I/Os) required between internal memory and multiple
secondary storage devices for the problems of sorting, FFT, matrix
transposition, standard matrix multiplication, and related problems.
Our two-level memory model is new and gives a realistic treatment of
parallel block transfer, in which during a single I/O each of the
P secondary storage devices can simultaneously transfer a contiguous
block of B records. The model pertains to a large-scale uniprocessor
system or parallel multiprocessor system with P disks. In addition,
the sorting, FFT, permutation network, and standard matrix
multiplication algorithms are typically optimal in terms of the amount
of internal processing time. The difficulty in developing optimal
algorithms is to cope with the partitioning of memory into P separate
physical devices. Our algorithms' performance can be significantly
better than those obtained by the well-known but nonoptimal technique of
disk striping. Our optimal sorting algorithm is randomized, but
practical; the probability of using more than
times the optimal
number of I/Os is exponentially small in
,
where M is the internal memory size.