next up previous
Next: A Cache-Oblivious Index for Up: EXTERNAL MEMORY ALGORITHMS, I/O Previous: Efficient Indexing Methods for

   
Efficient Join Processing over Uncertain-Valued Attributes

R. Cheng, Y. Xia, S. Prabhakar, R. Shah, and J. S. Vitter. ``Efficient Join Processing over Uncertain-Valued Attributes,'' Proceedings of the 2006 ACM Conference on Information and Knowledge Management (CIKM '06), Arlington, VA, November 2006.

Full text (Adobe pdf format)

In an uncertain database, each data item is modeled as a range associated with a probability density function. Previous works for this kind of data have focussed on simple queries such as range and nearest-neighbor queries. Queries that join multiple relations have not been addressed in earlier work despite the significance of joins in databases. In this paper, we address probabilistic join over uncertain data, essentially a query that augments the results with probability guarantees to indicate the likelihood of each join tuple being part of the result. We extend the notion of join operators, such as equality and inequality, for uncertain data. We also study the performance of probabilistic join. We observe that a user may only need to know whether the probability of the results exceeds a given threshold, instead of the precise probability value. By incorporating this constraint, it is possible to achieve much better performance. In particular, we develop three sets of optimization techniques, namely item-level, page-level and index-level pruning, for different join operators. These techniques facilitate pruning with little space and time overhead, and are easily adapted to most join algorithms. We verify the performance of these techniques experimentally.


next up previous
Next: A Cache-Oblivious Index for Up: EXTERNAL MEMORY ALGORITHMS, I/O Previous: Efficient Indexing Methods for
Jeff Vitter
2009-05-24