next up previous
Next: On Searching Compressed String Up: DATA COMPRESSION Previous: Geometric Burrows-Wheeler Transform: Linking

   
Compressed Index for Dictionary Matching

W.-K. Hon, T.-W. Lam, R. Shah, S.-L. Tam, and J. S. Vitter. ``Compressed Index for Dictionary Matching,'' Proceedings of the 2008 IEEE Data Compression Conference (DCC '08), Snowbird, UT, March 2008.

Full text (Adobe pdf format)

The past few years have witnessed several exciting results on compressed representation of a string T that supports efficient pattern matching, and the space complexity has been reduced to $\vert T\vert H_k(T)+o(\vert T\vert \log \sigma)$ bits, where Hk(T)denotes the kth-order empirical entropy of T, and $\sigma$ is the size of the alphabet. In this paper we study compressed representation for another classical problem of string indexing, which is called dictionary matching in the literature. Precisely, a collection D of strings (called patterns) of total length n is to be indexed so that given a text T, the occurrences of the patterns in T can be found efficiently. In this paper we show how to exploit a sampling technique to compress the existing O(n)-word index to an $(nH_k(D) + o(n log \sigma))$-bit index with only a small sacrifice in search time.



Jeff Vitter
2009-11-09