next up previous
Next: Fast Compression with a Up: DATA COMPRESSION Previous: High-Order Entropy-Compressed Text Indexes

   
When Indexing Equals Compression: Experiments with Compressing Suffix Arrays and Applications

R. Grossi, A. Gupta, and J. S. Vitter. ``When Indexing Equals Compression: Experiments with Compressing Suffix Arrays and Applications,'' ACM Transactions on Algorithms, to appear. An extended abstract appears in Proceedings of the 15th Annual SIAM/ACM Symposium on Discrete Algorithms (SODA '04), New Orleans, LA, January 2004.

Full text (Adobe pdf format)

We report on a new and improved version of high-order entropy-compressed suffix arrays, which has theoretical performance guarantees comparable to previous work, yet represents an improvement in practice. Our experiments indicate that the resulting text index offers state-of-the-art compression. In particular, we require roughly 20% of the original text size--without requiring a separate instance of the text--and support fast and powerful searches. To our knowledge, this is the best known method in terms of space for fast searching. We can additionally use a simple notion to encode and decode block-sorting transforms (such as the Burrows-Wheeler transform), achieving a slightly better compression ratio than bzip2. We also provide a compressed representation of suffix trees (and their associated text) in a total space that is comparable to that of the text alone compressed with gzip.


next up previous
Next: Fast Compression with a Up: DATA COMPRESSION Previous: High-Order Entropy-Compressed Text Indexes
Jeff Vitter
2008-04-02