
Text Compression via Alphabet Re-representation

P. M. Long, A. I. Natsev, and J. S. Vitter, "Text Compression via Alphabet Re-representation," Proc. Data Compression Conference (DCC '97), Snowbird, Utah, March 1997.


We consider re-representing the alphabet so that the representation of each character reflects its properties as a predictor of future text. This lets us use an estimator from a restricted class to map contexts to predictions of upcoming characters. We describe an algorithm that combines this idea with neural networks, and we compare the performance of an implementation of it with other compression methods, including UNIX compress, gzip, PPMC, and an alternative neural network approach.
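The re-representation idea can be illustrated with a small sketch. It is not the method of the paper: the choice of representation (each character described by the smoothed distribution of the characters observed to follow it so far), the context length `k`, the learning rate `lr`, the single-layer softmax model standing in for a neural network, and all function names are assumptions made here for illustration. The sketch reports the ideal adaptive code length in bits (the sum of -log2 p over the text) rather than producing a compressed bitstream; an arithmetic coder driven by the same probabilities would come close to that figure.

```python
import math
from collections import Counter, defaultdict


def representation(follow, c, alphabet):
    # Hypothetical re-representation (an assumption, not the paper's): a
    # character is described by the add-one smoothed distribution of the
    # characters seen to follow it so far.
    counts = follow[c]
    total = sum(counts.values()) + len(alphabet)
    return [(counts[d] + 1) / total for d in alphabet]


def context_features(text, i, k, follow, alphabet):
    # Concatenate the representations of the k characters before position i,
    # zero-padding at the start of the text; a leading 1.0 acts as a bias.
    feats = [1.0]
    for j in range(i - k, i):
        if j >= 0:
            feats.extend(representation(follow, text[j], alphabet))
        else:
            feats.extend([0.0] * len(alphabet))
    return feats


def softmax(z):
    # Numerically stable softmax over a list of logits.
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def adaptive_code_length(text, alphabet=None, k=2, lr=0.5):
    """Ideal code length in bits of a hypothetical adaptive predictor.

    Each character is predicted from its context before the model is
    updated, so a decoder could repeat the same steps. The alphabet is
    assumed to be known to both encoder and decoder.
    """
    alphabet = alphabet or sorted(set(text))
    index = {c: n for n, c in enumerate(alphabet)}
    follow = defaultdict(Counter)
    dim = 1 + k * len(alphabet)
    # Restricted estimator: a linear map from context features to logits.
    weights = [[0.0] * dim for _ in alphabet]
    bits = 0.0
    for i, ch in enumerate(text):
        x = context_features(text, i, k, follow, alphabet)
        p = softmax([sum(w * v for w, v in zip(row, x)) for row in weights])
        t = index[ch]
        bits -= math.log2(p[t])
        # One stochastic gradient step on the cross-entropy loss.
        for a, row in enumerate(weights):
            g = p[a] - (1.0 if a == t else 0.0)
            for d in range(dim):
                row[d] -= lr * g * x[d]
        # Update the statistics behind the character representations.
        if i > 0:
            follow[text[i - 1]][ch] += 1
    return bits
```

For example, `adaptive_code_length("ab" * 100)` should come out below the 200 bits that a uniform code over {a, b} would need, because the context quickly determines the next character. Since every probability is computed before the weights and the follow-counts are updated, a decoder holding the same alphabet could reproduce each prediction exactly.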



Jeff Vitter
2008-07-05