next up previous
Next: The SBC-tree: An Index Up: EXTERNAL MEMORY ALGORITHMS, I/O Previous: Efficient Join Processing over

   
A Cache-Oblivious Index for Approximate String Matching

W.-K. Hon, T.-W. Lam, R. Shah, S.-L. Tam, and J. S. Vitter. ``A Cache-Oblivious Index for Approximate String Matching,'' Proceedings of the 16th Annual Conference on Combinatorial Pattern Matching (CPM '07), London, Ontario, Canada, July 2007, published in Lecture Notes in Computer Science, 4580 Springer-Verlag, Berlin, Germany, 40-51.

Full text (Adobe pdf format)

This paper revisits the problem of indexing a text for approximate string matching. Specifically, given a text T of length n and a positive integer k, we want to construct an index of T such that for any input pattern P, we can find all its k-error matches in T efficiently. This problem is well-studied in the internal-memory setting. Here, we extend some of these recent results to external-memory solutions, which are also cache-oblivious. Our first index occupies $O((n\log k n)/B)$ disk pages and finds all k-error matches with I/Os, where B denotes the number of words in a disk page. To the best of our knowledge, this index is the first external-memory data structure that does not require I/Os. The second index reduces the space to O((nlogn)/B)disk pages, and the I/O complexity is $O((\vert P\vert+\mathit{occ})/B+\log^{k(k+1)} n \log\log n)$.


next up previous
Next: The SBC-tree: An Index Up: EXTERNAL MEMORY ALGORITHMS, I/O Previous: Efficient Join Processing over
Jeff Vitter
2008-04-02