Multiresolution Indexing of XML for Frequent Queries
Speaker:Hao He

(03/24/2004)

Abstract


XML and other types of semi-structured data are typically represented by a labeled directed graph. To speed up path expression queries over the graph, a variety of indexes, e.g., 1-index, A(k)-index, and D(k)-index, have been proposed. They usually work by partitioning the nodes in the data graph into equivalence classes and storing these equivalence classes as index nodes. A(k)-index introduces the concept of local bisimilarity for partitioning. It accurately supports path expressions of length up to some tunable constant k, allowing the trade-off between index size and query answering power. However, all index nodes in A(k)-index have the same local similarity value k, which cannot take advantage of the fact that a workload may contain path expressions of different lengths, or that different parts of the data graph may have different local similarity requirements.

To overcome these limitations, we propose M(k)- and M*(k)-indexes. The basic M(k)-index is workload-aware: Like the previously proposed D(k)-index, it allows different index nodes to have different local similarity requirements, providing finer partitioning for parts of the data graph targeted by longer path expressions, and coarser partitioning for those targeted by shorter path expressions. Unlike D(k)-index, however, M(k)-index is never over-refined for irrelevant index or data nodes. However, the workload-aware feature of M(k)-index incurs overrefinement due to over-qualified ancestors. To solve this problem, we further propose the M*(k)-index. An M*(k)-index consists of a collection of indexes whose nodes are organized in a partition hierarchy, allowing successively coarser partitioning information to co-exist with the finest partitioning information required. Experiments show that our indexes are superior to previously proposed indexes in terms of index size and query performance.


Return to the SPIDER schedule

Jaidev Patwardhan
Last modified: Tue Jan 27 15:26:39 EST 2004