|
Abstract
Conceptually, the task of supporting distributed continuous queries can be divided into two components--event processing (computing the set of affected users for each data update) and notification dissemination (notifying the set of affected users). The first part of this dissertation focuses on event processing. Users are allowed to specify their interest in different forms of preferences, such as top-k, personalized ranking function and range selection. This dissertation presents geometric frameworks, data structures, and algorithms for answering several types of preference queries efficiently. Experimental evaluations show that our approaches outperform the previous ones by orders of magnitude.
The second part of the dissertation presents comprehensive solutions to the problem of processing and notifying a large number of continuous range top-k queries across a wide-area network. By using a geometric framework, the set of affected continuous queries can be described succinctly with messages that can be efficiently disseminated using content-driven networks. Fast algorithms are also developed to reformulate each update into a set of messages whose number is provably optimal, with or without knowing all continuous queries.
The final component of this dissertation is the design of a wide-area dissemination network for continuous range queries. In particular, this dissertation addresses the problem of assigning users to servers in a wide-area content-based publish/subscribe system. A good assignment should consider both users’ interests and locations, and balance multiple performance criteria including bandwidth, delay, and load balance. This dissertation presents a Monte Carlo approximation algorithm with good theoretical properties as well as a simple greedy algorithm. Using the Monte Carlo algorithm as a yardstick, the greedy algorithm is also concluded to work well across a wide range of workloads.