# The I/O-Model

Typical disks have the seek time much larger than the time to transfer a single record; hence, to amortize this difference, data is transferred between main memory and disks in blocks. The standard two-level I/O-model with one (logical) disk captures the main properties of disks and defines the following parameters:

• N = number of cells in the grid
• M = number of cells that fit into main memory
• B = number of cells per disk block

An Input/Output operation (or simply an I/O) is the operation of transferring one block between disk and internal memory. Because disks are much slower than CPUs, the I/O-model measures performance as the number of I/Os performed.

Sorting and scanning are fundamental problems in the field of I/O algorithms. Optimal algorithms and bounds for these problems are known.

• The scanning bound, Scan(N), represents the number of I/Os needed to read Ncontiguous items from disk.
• The sorting bound, Sort(N), represents the number of I/Os required to sort N items.

Overall, we see that

Scan(N) < Sort(N) << N.

In practice the difference between an algorithm doing N I/Os and one doing Sort(N) I/Os can be very significant: Consider for example a problem involving N=1GB, and a typical system with block size B=32KB and internal memory size M=256MB. Then the speedup is more than three orders of magnitude. If the external memory algorithm takes 10 minutes to complete, the internal memory algorithm could use more than 150 hours, or about 6 days!

<laura@cs.duke.edu>