next up previous
Next: Compressed Data Structures: Dictionaries Up: DATA COMPRESSION Previous: When Indexing Equals Compression:

   
Fast Compression with a Static Model in High-Order Entropy

L. Foschini, R. Grossi, A. Gupta, and J. S. Vitter. ``Fast Compression with a Static Model in High-Order Entropy.'' An extended abstract appears in Proceedings of the 2004 IEEE Data Compression Conference (DCC '04), Snowbird, UT, March 2004, 23-25.

Full text (Adobe pdf format)

We report on a simple encoding format called wzip for decompressing block-sorting transforms, such as the Burrows-Wheeler Transform (BWT). Our compressor uses the simple notions of gamma encoding and RLE organized with a wavelet tree to achieve a slightly better compression ration than bzip2 in less time. In fact, our compression/decompression time is dependent upon Hh, the empirical hth order entropy. Another key contribution of our compressor is its simplicity. Our compressor can also operate as a full-text index with a small amount of data, while still preserving backward compatibility with just the compressor.



Jeff Vitter
2008-04-02