TPIE:
a Transparent Parallel I/O Environment

TPIE has moved


The TPIE project has moved to http://www.madalgo.au.dk/tpie.



Welcome to the homepage for the TPIE project at Duke University.

TPIE is a software environment (written in C++) that facilitates the implementation of external memory algorithms. The goal of theoretical work in the area of external memory algorithms (also called I/O algorithms or out-of-core algorithms) has been to develop algorithms that minimize the Input/Output communication (or just I/O) performed when solving problems on very large data sets. The area was effectively started in the late eighties by Aggarwal and Vitter and subsequently I/O algorithms have been developed for several problem domains. See below for recent surveys of theoretical results in the area of I/O-efficient algorithms.

TPIE is designed to bridge the gap between the theory and practice of parallel I/O systems. It is intended to demonstrate all of the following:

The TPIE library consists of a kernel and a set of I/O-efficient algorithms and data structures implemented on top of the kernel. Most of the functionality is provided as templated classes and functions in C++. In addition, small programs are provided for testing and illustrating the usage of the application interface.

TPIE is pronounced 'tE-'pI (like tea-pie).

Software Distribution

TPIE is currently available to testers under the terms of the GNU General Public License, Version 2.

Latest Version

Note: The CVS snapshot is the preferred version to download and fixes many bugs users have experienced compiling with gcc 3.4 and 4.0.

Download latest TPIE CVS snapshot (September 19, 2005): tpie_latest.tgz
Download latest TPIE release (August 29, 2002): tpie_082902.tgz
Download latest TPIE manual (August 29, 2002): tpie.ps, tpie.pdf; or with two pages per sheet, to save paper: tpie2up.ps, tpie2up.pdf

The latest release includes the routines for solving fundamental batched problems such as sorting, merging, and permuting. These routines enable the programmer to write efficient and portable implementations of algorithms that make use of the fundamental streaming primitives which have been identified in theoretical work. It also includes support for random access to disk blocks, which is the foundation for implementing external memory data structures. The classical external memory structure, the B-tree, is included in the distribution.

A short summary of changes relative to previous versions can be found in the  CHANGES  file.
Installation instructions for the current release are provided in the TPIE  README  file and in the TPIE manual.
The TPIE project is work in progress and users of TPIE are encouraged to send bug reports, comments, and suggestions to tpie@cs.duke.edu.

Previous Versions

Future Releases

Work on extending and improving TPIE is in progress. Projects include further performance improvements, support for parallel disks, addition of the distribution sweeping primitive, addition of various data structure implementations, and addition of several application examples.

Users interested in obtaining/testing preliminary versions of the above extensions are encouraged to send a request to tpie@cs.duke.edu.

A mailing list for TPIE users interested in information about new releases and bug reports exists. Requests to be added to the list should be sent to tpie@cs.duke.edu.

Contributors

The TPIE project involves the efforts of several people in the Department of Computer Science at Duke University and elsewhere. It is also coordinated with the Center for Geometric and Biological Computing at Duke University. TPIE project contributors include

Current Contributors

Past Contributors

  • Rakesh Barve
  • David Hutchinson
  • Lipyeow Lim
  • Paul Natsev
  • Octavian Procopiuc
  • Vasilis Samoladas
  • Laura Toma
  • Darren Vengroff
  • Min Wang
  • Rajiv Wickremesinghe

Research Papers on TPIE

A number of research papers have been written either about TPIE or using TPIE. The slides from an ESA 2002 talk about TPIE can be downloaded from here.
  1. O. Procopiuc, P. K. Agarwal, L. Arge, J. S. Vitter. Bkd-tree: A Dynamic Scalable kd-Tree. To appear in Proc. 8th Int'l Symposium on Spatial and Temporal Databases (SSTD '03).
  2. P. K. Agarwal, L. Arge, S. Govindarajan. CRB-Tree: An Efficient Indexing Scheme for Range Aggregate Queries. To apear in Proc. 9th Int'l Conference on Database Theory (ICDT '03).
  3. L. Arge, A. Danner, Sha-Mayn Teh. I/O-Efficient Point Location using Persistent B-trees. Proc. 5th Workshop on Algorithm Engineering and Experimentation (ALENEX '03).
  4. L. Arge, O. Procopiuc, J. S. Vitter. Implementing I/O-Efficient Data Structures Using TPIE. Proc. 10th European Symposium on Algorithms (ESA '02), Rome, Italy, September 2002.
  5. L. Arge, J. Chase, L. Toma, J. S. Vitter, R. Wickremesinghe, P. Halpin, and D. Urban, Flow Computation on Massive Grids. Proceedings of the 9th ACM International Symposium on Advances in Geographic Information Systems (ACM-GIS '01), Atlanta, GA, November 2001.
  6. L. Arge, O. Procopiuc, S. Ramaswamy, T. Suel, Jan Vahrenhold, and J. S. Vitter. A Unified Approach for Indexed and Non-Indexed Spatial Joins. Proc. 7th Conference on Extending Database Technology (EDBT 2000), 2000.
  7. L Arge, L. Toma, and J. S. Vitter. I/O-Efficient Algorithms for Problems on Grid-based Terrains. Proc. 2nd Workshop on Algorithm Engineering and Experimentation (ALENEX '00).
  8. L. Arge, K. H. Hinrichs, J. Vahrenhold, and J. S. Vitter. Efficient Bulk Operations on Dynamic R-trees. Proc. 1st Workshop on Algorithm Engineering and Experimentation (ALENEX '99), Lecture Notes in Computer Science, Springer-Verlag, 1999.
  9. L. Arge, O. Procopiuc, S. Ramaswamy, T. Suel, and J. S. Vitter. Scalable Sweep-Based Spatial Join. Proc. International Conference on Very Large Databases (VLDB '98), 1998, pp. 570-581.
  10. L.Arge, O. Procopiuc, S. Ramaswamy, T. Suel, and J. S. Vitter. Theory and Practice of I/O-Efficient Algorithms for Multidimensional Batched Searching Problems. Proc. 9th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA '98), 1998.
  11. D. E. Vengroff A Transparent Parallel I/O Environment, Proceedings of the Third DAGS Symposium on Parallel Computation, Hanover, NH, July 1994, 117-134.
  12. D. E. Vengroff and J. S. Vitter. I/O-Efficient Scientific Computation using TPIE. Proc. Goddard Conference on Mass Storage Systems and Technologies, 1996, published in NASA Conference Publication 3340, Volume II, 553-570.
  13. A. Danner, T. Mølhave, K. Yi, P. K. Agarwal, L. Arge, and H. Mitasova. TerraStream: From Elevation Data to Watershed Hierarchies..In Proc. ACM Symposium on Advances in Geographic Information Systems, pages 212-219, 2007.

Survey Papers on I/O-Efficient Algorithms

  1. L. Arge. External Memory Data Structures. In Handbook of Massive Data Sets, J. Abello, P. M. Pardalos, and M. G. C. Resende (Eds.), Kluwer Academic Publishers, 2001.
  2. J. S. Vitter External Memory Algorithms and Data Structures: Dealing with MASSIVE Data. ACM Computing Surveys, 33(2), June 2001, 209-271.
  3. Lars Arge. External-Memory Algorithms with Applications in Geographic Information Systems. In Algorithmic Foundations of Geographic Information Systems, M. van Kreveld, J. Nievergelt, T. Roos and P. Widmayer (Eds.), LNCS Tutorial, Lecture Notes in Computer Science Vol. 1340, Springer-Verlag, 1997.
  4. J. S. Vitter. Online Data Structures in External Memory Proc. 26th Annual International Colloquium on Automata, Languages, and Programming (ICALP '99), Lecture Notes in Computer Science, Springer-Verlag, 1999, 119--133.

Last modified: Thursday, 02-Aug-2012 12:00:20 EDT