Welcome to the homepage for the TPIE project at Duke University.
TPIE is a software environment (written in C++) that facilitates the implementation of
external memory algorithms.
The goal of theoretical work in the area of external memory algorithms
(also called I/O algorithms or out-of-core algorithms) has
been to develop algorithms that minimize the Input/Output communication
(or just I/O) performed when solving problems on very large data
sets. The area was
effectively started in the late eighties by Aggarwal and Vitter and subsequently
I/O algorithms have been developed for several problem domains.
See below for recent surveys of theoretical results in the area
of I/O-efficient algorithms.
TPIE is designed to bridge the gap between the theory and practice of
parallel I/O systems. It is intended to demonstrate all of the following:
Abstract away the I/O details through a simple high-level interface.
Implement I/O-efficient paradigms to show their practical viability.
Be flexible, allowing
a wide variety of algorithms to be implemented within the system.
Be portable across a variety of hardware platforms.
Be extensible, so that new features can be easily added later.
The TPIE library consists of a kernel and a set of I/O-efficient algorithms and data
structures implemented on top of the kernel. Most of the functionality
is provided as templated classes and functions in C++. In addition,
small programs are provided for testing and illustrating the usage of
the application interface.
Note: The CVS snapshot is the preferred version to download and
fixes many bugs users have experienced compiling with gcc 3.4 and 4.0.
Download latest TPIE CVS snapshot (September 19, 2005):tpie_latest.tgz Download latest TPIE release (August 29, 2002):tpie_082902.tgz Download latest TPIE
manual (August 29, 2002):tpie.ps, tpie.pdf; or with two pages per sheet, to save
paper: tpie2up.ps, tpie2up.pdf
The latest release includes the routines for solving fundamental batched
problems such as sorting, merging, and permuting. These routines enable the programmer to write
efficient and portable implementations of algorithms that make use of
the fundamental streaming primitives which have been identified
in theoretical work. It also includes support for random access
to disk blocks, which is the foundation for implementing external
memory data structures. The classical external memory structure,
the B-tree, is included in the distribution.
A short summary of changes relative to previous versions can
be found in the CHANGES file. Installation instructions for the current release are provided
in the TPIE README file and in
the TPIE manual. The TPIE project is work in progress and users
of TPIE are encouraged to send bug reports, comments, and suggestions
to tpie@cs.duke.edu.
Work on extending and improving TPIE is in
progress. Projects include further performance improvements, support
for parallel disks, addition of the distribution sweeping primitive,
addition of various data structure implementations, and addition of
several application examples.
Users interested in obtaining/testing preliminary versions of the
above extensions are encouraged to send a request to tpie@cs.duke.edu.
A mailing list
for TPIE users interested in information about new releases and bug
reports exists. Requests to be added to the list should be sent to tpie@cs.duke.edu.
A number of research papers have been written either about TPIE or using
TPIE. The slides from an ESA 2002 talk about TPIE can be
downloaded from here.
O. Procopiuc, P. K. Agarwal, L. Arge, J. S. Vitter. Bkd-tree: A Dynamic Scalable
kd-Tree. To appear in Proc. 8th Int'l
Symposium on Spatial and Temporal Databases (SSTD '03).
L. Arge, J. Chase, L. Toma, J. S. Vitter,
R. Wickremesinghe, P. Halpin, and D. Urban,
Flow Computation on Massive Grids.
Proceedings of the 9th ACM International Symposium
on Advances in Geographic Information Systems (ACM-GIS '01),
Atlanta, GA, November 2001.
L. Arge, K. H. Hinrichs, J. Vahrenhold, and J. S. Vitter.
Efficient
Bulk Operations on Dynamic R-trees. Proc. 1st Workshop on Algorithm
Engineering and Experimentation (ALENEX '99), Lecture Notes in Computer
Science, Springer-Verlag, 1999.
L. Arge, O. Procopiuc, S. Ramaswamy, T. Suel, and J. S. Vitter.
Scalable
Sweep-Based Spatial Join. Proc. International Conference on
Very Large Databases (VLDB '98), 1998, pp. 570-581.
D. E. Vengroff and J. S. Vitter.
I/O-Efficient
Scientific Computation using TPIE. Proc. Goddard Conference
on Mass Storage Systems and Technologies, 1996, published in NASA Conference
Publication 3340, Volume II, 553-570.
L. Arge. External Memory Data Structures. In Handbook of Massive Data Sets, J. Abello, P. M. Pardalos, and M. G. C. Resende (Eds.), Kluwer Academic Publishers, 2001.
J. S. Vitter. Online Data
Structures in External MemoryProc. 26th Annual International
Colloquium on Automata, Languages, and Programming (ICALP '99), Lecture
Notes in Computer Science, Springer-Verlag, 1999, 119--133.