External Memory Algorithms and Data Structures
The data sets involved in many modern applications are often too massive
to fit in main memory of even the most powerful computers and must therefore
reside on disk. Thus communication between internal and external memory,
and not actual computation time, becomes the bottleneck in the computation.
This is due to the huge difference in access time of fast internal memory
and slower external memory such as disks.
The goal of theoretical work in the area of external memory algorithms
(also called I/O algorithms or out-of-core algorithms) has
been to develop algorithms that minimize the Input/Output communication
(or just I/O) performed when solving a given problem. The area was
effectively started in the late eighties by Aggarwal and Vitter and subsequently
I/O algorithms have been developed for several problem domain. Also I/O
performance can often be improved if many disks can efficiently be used
in parallel and the use of parallel disks has received a lot of theoretical
attention. See below for recent surveys of theoretical results in the area
of I/O-efficient algorithms.
TPIE is designed to bridge the gap between the theory and practice of
parallel I/O systems. It is intended to demonstrate all of the following
simultaneously:
-
Abstract away the details of how I/O is performed so that programmers need
only deal with a simple high level interface.
-
Implement I/O-optimal paradigms for large scale computation that are efficient
not only in theory, but also in practice.
-
Remain flexible, allowing programmers to specify the functional details
of computation taking place within the supported paradigms. This will allow
a wide variety of algorithms to be implemented within the system.
-
Be portable across a variety hardware platforms.
-
Be extensible, so that new features can be easily added later.
TPIE is implemented as a set of templated classes and functions in C++.
It also includes a small library and a set of test and sample
applications.
Selected bibliography
- P. K. Agarwal,
L. Arge, J. Erickson, P. G. Franciosa, and J. S. Vitter.
Efficient searching with linear constraints.
In Proc. ACM Symp. Principles of Database Systems, 1998.
(PostScript)
- P. K. Agarwal,
L. Arge, T.M. Murali, K. Varadarajan, and J.S. Vitter.
I/O-efficient algorithms for contour line extraction and planar graph
blocking.
In Proc. ACM-SIAM Symp. on Discrete Algorithms, pages 117-126,
1998.
(PostScript)
- P. K. Agarwal,
L. Arge, G. Brodal, and J. S. Vitter.
I/O-efficient dynamic point location in monotone planar subdivisions.
In Proc. ACM-SIAM Symp. on Discrete Algorithms, 1999.
(PostScript)
- P. K. Agarwal,
L. Arge, O. Procopiuc, and J. S. Vitter.
A framework for index dynamization.
In Proceedings of the International Colloquium on Automata, Language, and
Programming, Crete, Greece, 2001.
- L. Arge and P. B.
Miltersen.
On showing lower bounds for external-memory computational geometry.
In J. Abello and J. S. Vitter, editors, External Memory Algorithms and
Visualization, DIMACS series. American Mathematical Society, 1998.
(To appear).
(PostScript)
- L. Arge and J. S.
Vitter.
Optimal dynamic interval management in external memory.
In Proc. IEEE Symp. on Foundations of Comp. Sci., pages 560-569,
1996.
(PostScript)
- L. Arge, D. E.
Vengroff, and J. S. Vitter.
External-memory algorithms for processing line segments in geographic
information systems.
In Proc. Annual European Symposium on Algorithms, LNCS 979, pages
295-310, 1995.
To appear in special issue of Algorithmica.
(PostScript)
- L. Arge, P. Ferragina,
R. Grossi, and J. Vitter.
On sorting strings in external memory.
In Proc. ACM Symp. on Theory of Computation, 1997.
(PostScript)
- L. Arge,
O. Procopiuc, S. Ramaswamy, T. Suel, and J. S. Vitter.
Theory and practice of I/O-efficient algorithms for multidimensional batched
searching problems.
In Proc. ACM-SIAM Symp. on Discrete Algorithms, pages 685-694,
1998.
(PostScript)
- Lars Arge, Jeff Chase,
Jeffrey S. Vitter, and Rajiv Wickremesinghe.
Efficient sorting using registers and caches.
In Proceedings of the Workshop on Algorithm Engineering, Lecture
Notes in Computer Science. Springer-Verlag, September 2000.
- L. Arge.
External-memory algorithms with applications in geographic information systems.
In Algorithmic Foundations of GIS, volume 1340. Springer Verlag,
1996.
(PostScript)
- R. D. Barve, E. F.
Grove, and J. S. Vitter.
Simple randomized mergesort on parallel disks.
In Proc. ACM Symp. on Parallel Algorithms and Architectures, pages
109-118, 1996.
To appear in special issue on parallel I/O in ``Parallel Computing''.
(PostScript)
- R. Barve,
M. Kallahalla, P. Varman, and J. S. Vitter.
Competitive analysis of buffer management algorithms for parallel I/O
systems.
In Proc. ACM-IEEE Workshop on I/O in Parallel And Distributed
Systems, 1997.
- R. D. Barve, E. F.
Grove, and J. S. Vitter.
Simple randomized mergesort on parallel disks.
Parallel Computing, 23(4):601-631, 1997.
- R. Barve, E. A. M.
Shriver, P. B. Gibbons, B. K. Hillyer, and Y. Matias.
Modeling and imizing I/O throughput of multiple disks on a bus.
In Proc. Joint International Conference on Measurement and Modeling of
Computer Systems, 1998.
- R. Barve, E. F.
Grove, and J. S. Vitter.
A competitive application-controlled paging algorithm for a shared cache.
SIAM Journal of Computing, 2000.
- G. Gibson, J. S.
Vitter, and J. Wilkes (Editors).
Strategic directions in storage I/O for large-scale computing.
ACM Computing Surveys, 28(4):779-793, 1996.
(PostScript)
- D. E. Vengroff and
J. S. Vitter.
I/O-efficient scientific computation using tpie.
In Proceedings of the Goddard Conference on Mass Storage Systems and
Technologies, published in NASA Conference Publication 3340, Volume
II, pages 553-570, 1996.
(PostScript)
- D. E. Vengroff
and J. S. Vitter.
Three-dimensional range searching in external memory.
In Proc. ACM Symp. on Theory of Computation, 1996.
(PostScript)
- J. S. Vitter.
External memory algorithms and data structures: Dealing with MASSIVE data.
ACM Computing Surveys, in press.
Available via the author's web page
texttt http://www.cs.duke.edu/~jsv/.
- M. Wang, B. Iyer, and
J. S. Vitter.
Scalable mining for classification rules in relational databases.
In Proc. International Database Engineering & Application
Symposium, pages 58-67, 1998.