Typical disks have the seek time much larger than the time to
transfer a single record; hence, to amortize this difference, data is
transferred between main memory and disks in *blocks*. The
standard two-level *I/O-model* with one (logical) disk captures
the main properties of disks and defines the following parameters:

- N = number of cells in the grid
- M = number of cells that fit into main memory
- B = number of cells per disk block

An *Input/Output operation* (or simply an *I/O*) is
the operation of transferring one block between disk and internal
memory. Because disks are much slower than CPUs, the I/O-model
measures performance as the number of I/Os performed.

Sorting and scanning are fundamental problems in the field of I/O algorithms. Optimal algorithms and bounds for these problems are known.

- The
*scanning bound*,*Scan(N)*, represents the number of I/Os needed to read*N*contiguous items from disk. - The
*sorting bound*,*Sort(N)*, represents the number of I/Os required to sort*N*items.

Overall, we see that

*Scan(N) < Sort(N) << N*.

In practice the difference between an algorithm doing *N* I/Os
and one doing *Sort(N)* I/Os can be very significant: Consider
for example a problem involving *N=1GB*, and a typical system
with block size *B=32KB* and internal memory size
*M=256MB*. Then the speedup is more than three orders of
magnitude. If the external memory algorithm takes 10 minutes to
complete, the internal memory algorithm could use more than 150 hours,
or about 6 days!

<laura@cs.duke.edu> Last modified: Tue Apr 17 11:25:59 2001 by laura.