Ph. D. Defense

Algorithms for Continuous Queries: A Geometric Approach

Speaker:Albert Yu
syu at cs.duke.edu
Date: Wednesday, May 22, 2013
Time: 1:00pm - 3:00pm
Location: D344 LSRC, Duke

Abstract

A continuous query, once issued by a user, maintains the matching results for the user as new data (as well as updates to the existing data) continue to arrive in a stream. However, supporting potentially millions of continuous queries is a huge challenge. This dissertation addresses the problem of scalably processing a large number of continuous queries over a wide-area network.

Conceptually, the task of supporting distributed continuous queries can be divided into two components--event processing (computing the set of affected users for each data update) and notification dissemination (notifying the set of affected users). The first part of this dissertation focuses on event processing. Users are allowed to specify their interest in different forms of preferences, such as top-k, personalized ranking function and range selection. This dissertation presents geometric frameworks, data structures, and algorithms for answering several types of preference queries efficiently. Experimental evaluations show that our approaches outperform the previous ones by orders of magnitude.

The second part of the dissertation presents comprehensive solutions to the problem of processing and notifying a large number of continuous range top-k queries across a wide-area network. By using a geometric framework, the set of affected continuous queries can be described succinctly with messages that can be efficiently disseminated using content-driven networks. Fast algorithms are also developed to reformulate each update into a set of messages whose number is provably optimal, with or without knowing all continuous queries.

The final component of this dissertation is the design of a wide-area dissemination network for continuous range queries. In particular, this dissertation addresses the problem of assigning users to servers in a wide-area content-based publish/subscribe system. A good assignment should consider both users’ interests and locations, and balance multiple performance criteria including bandwidth, delay, and load balance. This dissertation presents a Monte Carlo approximation algorithm with good theoretical properties as well as a simple greedy algorithm. Using the Monte Carlo algorithm as a yardstick, the greedy algorithm is also concluded to work well across a wide range of workloads.

Advisor(s): Pankaj Agarwal & Jun Yang
Romit Roy Choudhury, Ashwin Machanavajjhala