Multiresolution Indexing of XML for Frequent Queries
Speaker:Hao He
(03/24/2004)
Abstract
XML and other types of semi-structured data are typically
represented by a labeled directed graph. To speed up path
expression queries over the graph, a variety of indexes, e.g.,
1-index, A(k)-index, and D(k)-index, have been proposed. They
usually work by partitioning the nodes in the data graph into
equivalence classes and storing these equivalence classes as index
nodes. A(k)-index introduces the concept of local bisimilarity
for partitioning. It accurately supports path expressions of
length up to some tunable constant k, allowing the trade-off
between index size and query answering power. However, all index
nodes in A(k)-index have the same local similarity value k,
which cannot take advantage of the fact that a workload may
contain path expressions of different lengths, or that different
parts of the data graph may have different local similarity
requirements.
To overcome these limitations, we propose M(k)- and
M*(k)-indexes. The basic M(k)-index is workload-aware: Like
the previously proposed D(k)-index, it allows different index
nodes to have different local similarity requirements, providing
finer partitioning for parts of the data graph targeted by longer
path expressions, and coarser partitioning for those targeted by
shorter path expressions. Unlike D(k)-index, however,
M(k)-index is never over-refined for irrelevant index or data
nodes. However, the workload-aware feature of M(k)-index incurs
overrefinement due to over-qualified ancestors. To solve this
problem, we further propose the M*(k)-index. An M*(k)-index
consists of a collection of indexes whose nodes are organized in a
partition hierarchy, allowing successively coarser partitioning
information to co-exist with the finest partitioning information
required. Experiments show that our indexes are superior to
previously proposed indexes in terms of index size and query
performance.
Return to the SPIDER schedule
Jaidev Patwardhan
Last modified: Tue Jan 27 15:26:39 EST 2004