Here are some details about a selected subset of my work.

Continuous Queries for Internet-Scale Distributed Applications

This project focuses on designing scalable support for richer subscription models in the form of expressive database-style queries, in large-scale publish/subscribe systems. By performing smart data management with a revolutionary holistic approach incorporating both database processing and network dissemination, we can reduce costs and create powerful systems to efficiently support complex and expressive subscriptions. Previous research in continuous queries and publish/subscribe systems has largely been compartmentalized; database-centric and network-centric approaches each have their own limitations, and simply putting them together does not lead to an efficient solution. A closer examination of database/network interfaces yields a spectrum of new and interesting possibilities. In particular, we propose reformulation as a general technique to support stateful subscriptions over standard stateless off-the-shelf content-driven networks, by converting them into equivalent but stateless forms. We have shown how reformulation can successfully be applied to various stateful subscriptions including multi-way select-joins, subscriptions with value-based notification conditions, and range-aggregate subscriptions. These techniques were shown to provide orders-of-magnitude improvement over simpler techniques adopted by state-of-the-art systems, and were shown to scale to millions of subscriptions. These ideas can also improve the performance of traditional continuous queries, by helping enumerate the affected queries very efficiently. We have put these techniques together to build a high-performance wide-area publish/subscribe system named ProSem, which jointly optimizes processing and dissemination using a novel cost-based optimizer.

  • [CY08] Badrish Chandramouli and Jun Yang. End-to-End Support for Joins in Large-Scale Publish/Subscribe Systems. In Proceedings of the 34th International Conference on Very Large Data Bases (VLDB '08), Auckland, New Zealand, August 2008. [pdf]
  • [CY08demo] Badrish Chandramouli, Jun Yang, Pankaj K. Agarwal, Albert Yu, and Ying Zheng. ProSem: Scalable Wide-Area Publish/Subscribe. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data (SIGMOD '08), Vancouver, Canada, June 2008. [pdf]
  • Badrish Chandramouli, Jeff M. Phillips, and Jun Yang. Value-Based Notification Conditions in Large-Scale Publish/Subscribe Systems. In Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB '07), Vienna, Austria, September 2007. [pdf]
  • Badrish Chandramouli, Junyi Xie, and Jun Yang. On the Database/Network Interface in Large-Scale Publish/Subscribe Systems. In Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data (SIGMOD '06), Chicago, Illinois, June 2006. [pdf]
  • Badrish Chandramouli. Supporting Better Scalability and Richer Subscription Models in Wide-Area Publish/Subscribe. Technical Report, Duke University (preliminary examination report), July 2006. [pdf]
  • Badrish Chandramouli, Junyi Xie, and Jun Yang. On the Database/Network Interface in Large-Scale Publish/Subscribe Systems. Technical Report, Duke University, November 2005. [pdf]

On Suspending and Resuming Queries

Suppose a long-running analytical query is executing on a database server and has been allocated a large amount of physical memory. A high-priority task comes in and we need to run it immediately with all available resources. We have several choices. We could swap out the old query to disk, but writing out a large execution state may take too much time. Another option is to terminate the old query and restart it after the new task completes, but we would waste all the work already performed by the old query. Yet another alternative is to periodically checkpoint the query during execution, but traditional synchronous checkpointing carries high overhead. In this work, we advocate a database-centric approach to implementing query suspension and resumption, with negligible execution overhead, bounded suspension cost, and efficient resumption. The basic idea is to let each physical query operator perform lightweight checkpointing according to its own semantics, and coordinate asynchronous checkpoints among operators through a novel contracting mechanism. At the time of suspension, we find an optimized suspend plan for the query, which may involve a combination of dumping current state to disk and going back to previous checkpoints. The plan seeks to minimize the suspend/ resume overhead while observing the constraint on suspension time. Our approach requires only small changes to the iterator interface, which we have implemented in the PREDATOR database system. Experiments with our implementation demonstrate significant advantages of our approach over traditional alternatives.

  • Badrish Chandramouli, Christopher N. Bond, Shivnath Babu, and Jun Yang. Query Suspend and Resume. In Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data (SIGMOD '07), Beijing, China, June 2007. [pdf]
  • Badrish Chandramouli, Christopher N. Bond, Shivnath Babu, and Jun Yang. On Suspending and Resuming Dataflows. In Proceedings of the 23rd International Conference on Data Engineering (ICDE '07), Istanbul, Turkey, April 2007 (poster paper). [pdf] [poster]
  • Badrish Chandramouli, Christopher N. Bond, Shivnath Babu, and Jun Yang. Query Suspend and Resume. Technical Report, Duke University, July 2006. [pdf]

Distributed Approximate Network Querying

As networks continue to grow in size and complexity, distributed network monitoring and resource querying are becoming increasingly difficult and costly. The aim of this work is to design, build, and evaluate a scalable infrastructure for answering queries over distributed measurements, while reducing costs (in terms of both network traffic and query latency) and maximizing precision of results. In this infrastructure, each network node owns a set of numerical measurement values and actively maintains bounds on these values cached at other nodes. We can then answer queries approximately, using bounds from nearby caches to avoid contacting the owners directly. We argue that approximate results are acceptable for our target applications, as long as errors are quantified precisely and reported to the user, and there is a mechanism for the user to obtain results with a specified precision. In this work, we focus on developing efficient and scalable techniques to place, locate, and manage bounded approximate caches across a large network. We have developed and evaluated two approaches: One uses a recursive partitioning of the network space to place caches in a static, controlled manner, while the other uses a locality-aware distributed hash table to place caches in a dynamic and decentralized manner. Our experiments using large-scale network emulation show that our techniques are very effective in reducing query costs while generating an acceptable amount of background traffic; they are also able to exploit various forms of locality that are naturally present in queries, and adapt to volatility of measurements.

  • Badrish Chandramouli, Jun Yang, and Amin Vahdat. Distributed Network Querying with Bounded Approximate Caching. In Proceedings of the 11th International Conference on Database Systems for Advanced Applications (DASFAA '06), Singapore, April 2006. [pdf]
  • Badrish Chandramouli. Distributed Network Querying: Reducing Costs by Providing Approximate Answers. Technical Report, Duke University (MS report), December 2004. [pdf]
  • Badrish Chandramouli, Jun Yang, and Amin Vahdat. QONCH: A System for Distributed Network Querying using Bounded Approximate Caching. Demonstration report, Duke university, November 2004. [pdf]

Selected Past Projects