External Memory Algorithms and Data Structures

The data sets involved in many modern applications are often too massive to fit in main memory of even the most powerful computers and must therefore reside on disk. Thus communication between internal and external memory, and not actual computation time, becomes the bottleneck in the computation. This is due to the huge difference in access time of fast internal memory and slower external memory such as disks.

The goal of theoretical work in the area of external memory algorithms (also called I/O algorithms or out-of-core algorithms) has been to develop algorithms that minimize the Input/Output communication (or just I/O) performed when solving a given problem. The area was effectively started in the late eighties by Aggarwal and Vitter and subsequently I/O algorithms have been developed for several problem domain. Also I/O performance can often be improved if many disks can efficiently be used in parallel and the use of parallel disks has received a lot of theoretical attention. See below for recent surveys of theoretical results in the area of I/O-efficient algorithms.

TPIE is designed to bridge the gap between the theory and practice of parallel I/O systems. It is intended to demonstrate all of the following simultaneously:

TPIE is implemented as a set of templated classes and functions in C++. It also includes a small library and a set of test and sample applications.

Selected bibliography