This paper investigates the problem of incremental joins of multiple ranked data sets when the join condition is a list of arbitrary user-defined predicates on the input tuples. This problem arises in many important applications dealing with ordered inputs and multiple ranked data sets, and requiring the top k solutions. We use multimedia applications as the motivating examples but the problem is equally applicable to traditional database applications involving optimal resource allocation, scheduling, decision making, ranking, etc.
We propose an algorithm
that enables querying of ordered
data sets by imposing arbitrary user-defined join predicates.
The basic version of the algorithm does not use any random access but
a
variation can exploit available indexes for efficient
random access based on the join predicates. A special case includes
the join scenario considered by Fagin for joins
based on identical keys, and in that case, our algorithms perform as
efficiently as Fagin's. Our main contribution, however, is the
generalization to join scenarios that were previously unsupported,
including cases where random access in the algorithm is not possible
due to lack of unique keys. In addition,
can support multiple join levels, or nested join hierarchies, which are the norm
for modeling multimedia data. We also give
-approximation
versions of both of the above algorithms. Finally, we give strong
optimality results for some of the proposed algorithms, and we study
their performance empirically.