A second key area of massive data where Dr. Vitter plays a leadership role is compressed data structures, where the goal is to operate directly upon compressed representations of data, yet still achieve fast search. The wavelet tree data structure he co-developed (not to confuse with wavelets discussed two paragraphs below) is an elegant structure for coding sequences of characters from a multicharacter alphabet and is a key component in modern indexing and compression. Until this century, fast data structures for text indexing (such as suffix trees and suffix arrays) required much more space than the data being indexed! Based upon a recursive decomposition of the suffix array, Dr. Vitter and colleagues invented the compressed suffix array, which is substantially smaller -- the first index ever provably shown to use only linear space, and then later the first ever whose size per character was proven to be asymptotic (i.e., with constant of proportionality 1) to the higher-order entropy of the text. The index can even reconstruct the original text in a random access manner, and thus the original text can be discarded. The net effect is that the text can be completely replaced by an index structure that has the size of compressed text but can be queried quickly.
In a third aspect of massive data, Dr. Vitter is a leading figure in the data compression community and is especially noted for his analytical bent and influence. He has done fundamental work on data compression for text, images, and video. A provably efficient algorithm for adaptive Huffman coding bears his name. With a former student, Dr. Vitter developed and analyzed fast and practical methods for arithmetic coding. They invented the FELICS algorithm for lossless image compression; it was subsequently implemented in hardware as part of NASA's Mars Reconnaissance Orbiter. It introduced a low-cost prediction framework that led to algorithms ultimately adopted into the Lossless JPEG standard. In video compression, Dr. Vitter and group proposed the paradigm of minimizing the combined measure of rate plus distortion to significantly improve motion estimation coding; this rate-distortion optimization has been incorporated into the H.264/MPEG-4 AVC standard's reference encoder, used widely in the computing and communications industry.
Fourth, Dr. Vitter and collaborators were the first in the database and systems communities to apply wavelets and compression techniques as key tools for summarizing, approximating, and predicting data. Wavelets have since become heavily used in database optimization, data warehousing, data streams, image processing, and data mining. For his work on wavelets for approximating high-dimensional aggregates, he and his coauthor were the recipients of the 2009 ACM SIGMOD Test of Time award, which recognizes its paper from 10 years earlier that has had the most impact in the following decade in terms of research, products, and methodology. Dr. Vitter has co-developed novel machine learning and prediction mechanisms based upon data compression, using the principle that the more compressible a sequence is, the more predictable it is. His universal prediction algorithms for online prefetching are provably asymptotically optimal (i.e., with constant of proportionality 1). The methods predict as well as special-purpose methods tuned to the characteristics of the sequence of page requests. His learning work includes algorithms for prefetching, caching, data streams, database query optimization, data mining, and power management in mobile computers.
Beginning with his thesis on coalesced hashing, a search method used widely in practice, Dr. Vitter has made many contributions to the analysis of algorithms, using mathematical analysis and asymptotics to derive precise estimates for resource requirements. He has also done much work involving randomized, parallel, and incremental algorithms for a variety of problems in computational geometry, combinatorial optimization, graphics, random sampling, and random variate generation.