Next: Efficient Update of Indexes Up: EXTERNAL MEMORY ALGORITHMS, I/O Previous: Aggregate Predicate support in

   
XPathLearner: An On-Line Self-Tuning Markov Histogram for XML Path Selectivity Estimation

L. Lim, M. Wang, S. Padmanabhan, J. S. Vitter, and R. Parr. Submitted. An earlier version appears as "XPathLearner: An On-Line Self-Tuning Markov Histogram for XML Path Selectivity Estimation," Proceedings of the 28th International Conference on Very Large Data Bases (VLDB '02), Hong Kong, China, August 2002.

Full text (Adobe pdf format)

The Extensible Markup Language (XML) is gaining widespread use as a format for data exchange and storage on the World Wide Web. Queries over XML data require accurate selectivity estimates for path expressions in order to optimize query execution plans. The selectivity of an XML path expression is usually estimated from summary statistics about the structure of the underlying XML repository. All previous methods require an off-line scan of the XML repository to collect these statistics.

In this paper, we propose XPathLearner, a method for estimating the selectivity of the most commonly used types of path expressions without looking at the XML data. XPathLearner gathers and refines its statistics on-line using query feedback. It is especially suited to queries in Internet-scale applications, where the underlying XML repositories are likely to be inaccessible or too large to be scanned entirely. Besides the on-line property, our method has two other novel features: (a) XPathLearner is workload-aware in collecting the statistics and thus can be dramatically more accurate than the more costly off-line method under tight memory constraints, and (b) XPathLearner automatically adjusts the statistics using query feedback when the underlying XML data change. We demonstrate the estimation accuracy of our method empirically on several real data sets.
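
To make the idea of a feedback-driven Markov histogram concrete, the following is a minimal sketch, not the algorithm from the paper. It assumes a first-order Markov model in which the selectivity of /a/b/c is approximated as count(a/b) * count(b/c) / count(b), and it assumes a simple multiplicative correction that spreads the observed error evenly over the tag pairs of the query. The class name MarkovPathEstimator, its parameters, and the update rule are all hypothetical and chosen only for illustration.

```python
class MarkovPathEstimator:
    """Illustrative first-order Markov path estimator (hypothetical, not the paper's code)."""

    def __init__(self, default_count=1.0, learning_rate=0.5):
        self.tag = {}    # tag -> estimated number of elements with that tag
        self.pair = {}   # (parent, child) -> estimated number of parent/child paths
        self.default = default_count   # assumed count for anything never seen
        self.lr = learning_rate        # fraction of the error corrected per feedback

    @staticmethod
    def _tags(path):
        return path.strip("/").split("/")

    def estimate(self, path):
        tags = self._tags(path)
        if len(tags) == 1:
            return self.tag.get(tags[0], self.default)
        # Start from the first pair, then chain conditional ratios.
        est = self.pair.get((tags[0], tags[1]), self.default)
        for parent, child in zip(tags[1:], tags[2:]):
            pair_count = self.pair.get((parent, child), self.default)
            parent_count = max(self.tag.get(parent, self.default), 1e-9)
            est *= pair_count / parent_count
        return est

    def feedback(self, path, true_count):
        """Refine the statistics from one (query path, observed selectivity) pair."""
        tags = self._tags(path)
        if len(tags) == 1:
            self.tag[tags[0]] = float(true_count)
            return
        if len(tags) == 2:
            self.pair[(tags[0], tags[1])] = float(true_count)
            return
        est = self.estimate(path)
        if est <= 0 or true_count <= 0:
            return  # the multiplicative update is undefined here; skip it
        n_pairs = len(tags) - 1
        # Assumed update rule: split the correction evenly across all pairs.
        factor = (true_count / est) ** (self.lr / n_pairs)
        for parent, child in zip(tags, tags[1:]):
            key = (parent, child)
            self.pair[key] = self.pair.get(key, self.default) * factor


model = MarkovPathEstimator(learning_rate=1.0)
for p, n in [("/a", 10), ("/b", 20), ("/a/b", 20), ("/b/c", 40)]:
    model.feedback(p, n)
before = model.estimate("/a/b/c")   # uses only the pair and tag statistics
model.feedback("/a/b/c", 10)        # the query processor reports the true count
after = model.estimate("/a/b/c")
```

In this sketch no XML document is ever scanned: the statistics come only from the paths that queries actually touch, which is the sense in which such an estimator is on-line and workload-aware. A bounded-memory version would also need a policy for evicting or bucketing rarely used pairs, which is omitted here.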


Jeff Vitter
2008-04-02