next up previous
Next: Education Up: Jeff Vitter's Curriculum Vitæ Previous: Biography

Research Interests

In his research, Jeff Vitter seeks to exploit the rich interdependence between mathematical computing theory and practice. Dr. Vitter has pioneered the development of several important subfields dealing with massive data. He is perhaps best known as a founder of the field of external memory algorithms, which focuses on alleviating the I/O bottleneck between fast internal memory and slow external storage (such as disk). The goal is to design algorithms that exploit locality and parallelism in order to reduce I/O costs, which is important in a variety of data-intensive applications. His recent book, Algorithms and Data Structures for External Memory, serves as a reference for the field. He has developed paradigms and efficient I/O algorithms for external memory and hierarchical memory in several domains, including geographic information systems (GIS), databases, computational science and engineering, sorting, text and string indexing, matrix computations, graph traversal, range search, data mining, and a variety of computational geometry and combinatorial problems. His approach based upon duality for utilizing parallel disks, in which communication with each disk can occur simultaneously, has led to state-of-the-art methods for sorting. He has contributed to algorithm engineering via the TPIE system (Transparent Parallel I/O programming Environment).

A second key area of massive data where Dr. Vitter plays a leadership role is compressed data structures, where the goal is to operate directly upon compressed representations of data, yet still achieve fast search. The wavelet tree data structure he co-developed (not to confuse with wavelets discussed two paragraphs below) is an elegant structure for coding sequences of characters from a multicharacter alphabet and is a key component in modern indexing and compression. Until this century, fast data structures for text indexing (such as suffix trees and suffix arrays) required much more space than the data being indexed! Based upon a recursive decomposition of the suffix array, Dr. Vitter and colleagues invented the compressed suffix array, which is substantially smaller -- the first index ever provably shown to use only linear space, and then later the first ever whose size per character was proven to be asymptotic (i.e., with constant of proportionality 1) to the higher-order entropy of the text. The index can even reconstruct the original text in a random access manner, and thus the original text can be discarded. The net effect is that the text can be completely replaced by an index structure that has the size of compressed text but can be queried quickly.

In a third aspect of massive data, Dr. Vitter is a leading figure in the data compression community and is especially noted for his analytical bent and influence. He has done fundamental work on data compression for text, images, and video. A provably efficient algorithm for adaptive Huffman coding bears his name. With a former student, Dr. Vitter developed and analyzed fast and practical methods for arithmetic coding. They invented the FELICS algorithm for lossless image compression; it was subsequently implemented in hardware as part of NASA's Mars Reconnaissance Orbiter. It introduced a low-cost prediction framework that led to algorithms ultimately adopted into the Lossless JPEG standard. In video compression, Dr. Vitter and group proposed the paradigm of minimizing the combined measure of rate plus distortion to significantly improve motion estimation coding; this rate-distortion optimization has been incorporated into the H.264/MPEG-4 AVC standard's reference encoder, used widely in the computing and communications industry.

Fourth, Dr. Vitter and collaborators were the first in the database and systems communities to apply wavelets and compression techniques as key tools for summarizing, approximating, and predicting data. Wavelets have since become heavily used in database optimization, data warehousing, data streams, image processing, and data mining. For his work on wavelets for approximating high-dimensional aggregates, he and his coauthor were the recipients of the 2009 ACM SIGMOD Test of Time award, which recognizes its paper from 10 years earlier that has had the most impact in the following decade in terms of research, products, and methodology. Dr. Vitter has co-developed novel machine learning and prediction mechanisms based upon data compression, using the principle that the more compressible a sequence is, the more predictable it is. His universal prediction algorithms for online prefetching are provably asymptotically optimal (i.e., with constant of proportionality 1). The methods predict as well as special-purpose methods tuned to the characteristics of the sequence of page requests. His learning work includes algorithms for prefetching, caching, data streams, database query optimization, data mining, and power management in mobile computers.

Beginning with his thesis on coalesced hashing, a search method used widely in practice, Dr. Vitter has made many contributions to the analysis of algorithms, using mathematical analysis and asymptotics to derive precise estimates for resource requirements. He has also done much work involving randomized, parallel, and incremental algorithms for a variety of problems in computational geometry, combinatorial optimization, graphics, random sampling, and random variate generation.


next up previous
Next: Education Up: Jeff Vitter's Curriculum Vitæ Previous: Biography
Jeff Vitter
2009-11-16