A new trend in the field of pattern matching is to design
indexing data structures
which take the space very close to that required by the
indexed text (in
entropy-compressed form) and also simultaneously achieve
good query performance. Two
popular indexes, namely the FM-index of Ferragina and Manzini and the
CSA of Grossi and Vitter,
achieve this goal by exploiting the Burrows-Wheeler
transform (BWT).
However, due to the intricate permutation structure of BWT,
no locality
of reference can be guaranteed when we perform pattern
matching with these indexes.
Chien et al. gave an alternative text
index which is based
on sparsifying the traditional suffix tree and maintaining
an auxiliary 2-D range query structure.
Given a text T of length n drawn from a
-sized
alphabet set,
they achieved an
-bit index for T and
showed that this index can preserve locality in pattern
matching and hence
is amenable to be used in external-memory settings.
We improve upon this index and show how to apply entropy
compression to reduce index space.
Our index takes
bits of
space where Hk is the kth-order
empirical entropy of the text.
This is achieved by creating variable length blocks of text
using arithmetic coding.