ProSem: Internet-Scale Publish/Subscribe Unifying Data Processing and Dissemination
Supported by National Science Foundation Award IIS-0713498, III-COR: Scalable Publish/Subscribe: Unifying Data Processing and Dissemination (abstract)
Principal investigators: Jun Yang and Pankaj K. Agarwal, Duke University
People
- Pankaj K. Agarwal (Faculty)
- Badrish Chandramouli (PhD Student)
- Sharathkumar Raghvendra (PhD Student)
- Jun Yang (Faculty)
- Albert Yu (PhD Student)
- Ying Zheng (PhD Student)
Introduction
The Digital Age has brought about unprecedented growth in the amount of data being generated, the number of data consumers, and the diversity of their interests and locations. Traditionally, users poll sources for information, but for many applications, polling scales poorly and may miss important events. The alternative offered by publish/subscribe systems is to push notifications to users with matching interests. This approach suits many applications, ranging from personal, commercial, and medical to environmental, military, and security. However, traditional publish/subscribe systems are becoming inadequate for advanced applications, where users want to receive information that has been filtered, joined, and summarized, and only when certain conditions are met.
This project aims to build a next-generation publish/subscribe system that meets these new challenges. We are developing an end-to-end solution with techniques ranging from subscription processing and indexing to dissemination network design, which work together to support efficient and powerful subscription functionalities, allowing users to control precisely what they want and when they want it.
One main feature distinguishing our approach from previous work is joint consideration of subscription processing and notification dissemination. Traditionally, these problems are considered separately by the database and networking communities. However, there exists a wide spectrum of interesting alternatives for interfacing processing with dissemination. We propose a promising novel approach called reformulation that allows complex, stateful subscriptions to be handled by simple, stateless dissemination mechanisms, with a clean system design that is easy to implement and scale. A cost-based optimizer, inspired by database query optimization, chooses the best processing and dissemination strategies jointly and dynamically.
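To make the idea of handling stateful subscriptions with stateless dissemination more concrete, here is a minimal sketch. It is an illustration under our own assumptions, not the ProSem implementation: the class names (`Reformulator`, `StatelessBroker`), the item/order schema, and the example subscriptions are all hypothetical. The server keeps the join state and emits enriched messages, so that each subscription reduces to a simple predicate over a single message.

```python
from typing import Callable, Dict, List

Message = Dict[str, object]
Filter = Callable[[Message], bool]


class Reformulator:
    """Hypothetical server-side component that holds the join state.

    It joins each incoming order with the stored item table and emits an
    enriched message, so downstream matching needs no state.
    """

    def __init__(self) -> None:
        self.items: Dict[int, str] = {}  # item_id -> category (join state)

    def on_item(self, item_id: int, category: str) -> None:
        self.items[item_id] = category

    def on_order(self, item_id: int, amount: float) -> List[Message]:
        category = self.items.get(item_id)
        if category is None:
            return []  # no join partner, so nothing to disseminate
        return [{"item_id": item_id, "amount": amount, "category": category}]


class StatelessBroker:
    """Hypothetical dissemination layer: matches each message on its own."""

    def __init__(self) -> None:
        self.filters: Dict[str, Filter] = {}

    def subscribe(self, user: str, flt: Filter) -> None:
        self.filters[user] = flt

    def disseminate(self, msg: Message) -> List[str]:
        # Only the attributes of the current message are inspected.
        return sorted(u for u, f in self.filters.items() if f(msg))


def notify(server: Reformulator, broker: StatelessBroker,
           item_id: int, amount: float) -> List[str]:
    """Process one order end to end and return the users to notify."""
    users: List[str] = []
    for msg in server.on_order(item_id, amount):
        users.extend(broker.disseminate(msg))
    return users


server = Reformulator()
server.on_item(1, "books")
server.on_item(2, "music")

broker = StatelessBroker()
# Original subscription: "orders of at least 100 for items in category books".
# After reformulation it is a plain filter over the enriched message.
broker.subscribe("alice", lambda m: m["category"] == "books" and m["amount"] >= 100)
broker.subscribe("bob", lambda m: m["category"] == "music")

print(notify(server, broker, 1, 120.0))
print(notify(server, broker, 2, 5.0))
```

In this sketch the join is confined to `Reformulator`, while `StatelessBroker` could in principle be replaced by any content-based filtering mechanism. Where to draw that line for a given workload is the kind of decision the cost-based optimizer described above is meant to make.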
Besides system building, this project tackles many new algorithmic challenges, including scalably processing a large number of complex subscriptions; exploiting event and subscription characteristics to combat worst-case complexity; balancing semantic similarity and network proximity in dissemination network design; and efficiently maintaining statistics for high-dimensional events and subscriptions.
Progress
In the first year of this project, we made progress on the following specific research problems: (1) ProSem system development and demonstration; (2) scalable processing and dissemination of select-join subscriptions; (3) dissemination network design for wide-area publish/subscribe; (4) scalable processing and dissemination of value-based notification conditions; (5) input-sensitive scalable continuous join query processing. A detailed description of our contributions can be found below in our 2007-2008 project report.
A number of our contributions have been published in premier conferences: ISAAC 2005, DASFAA 2006, SIGMOD 2006, VLDB 2006, VLDB 2007, SIGMOD 2008, and VLDB 2008. For detailed descriptions of these contributions, please refer to our project reports and publications below.
- Project report (PDF) for academic year 2007-2008.
Publications
- Badrish Chandramouli and Jun Yang. "End-to-End Support for Joins in Large-Scale Publish/Subscribe Systems." In Proceedings of the 34th International Conference on Very Large Data Bases (VLDB '08), Auckland, New Zealand, August 2008. Acceptance rate: 16.5%.
  Available for download: paper.
- Badrish Chandramouli, Jun Yang, Pankaj K. Agarwal, Albert Yu, and Ying Zheng. "ProSem: Scalable Wide-Area Publish/Subscribe." In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data (SIGMOD '08), Vancouver, Canada, June 2008. System demonstration description. Acceptance rate: 31.9%.
  Available for download: paper.
- Badrish Chandramouli, Jeff M. Phillips, and Jun Yang. "Value-Based Notification Conditions in Large-Scale Publish/Subscribe Systems." In Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB '07), Vienna, Austria, September 2007. Acceptance rate: 16.4%.
  Available for download: paper.
- Pankaj K. Agarwal, Junyi Xie, Jun Yang, and Hai Yu. "Scalable Continuous Query Processing by Tracking Hotspots." In Proceedings of the 32nd International Conference on Very Large Data Bases (VLDB '06), Seoul, Korea, September 2006. Acceptance rate: 13.8%.
  Available for download: paper and technical report.
- Badrish Chandramouli, Junyi Xie, and Jun Yang. "On the Database/Network Interface in Large-Scale Publish/Subscribe Systems." In Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data (SIGMOD '06), Chicago, Illinois, USA, June 2006. Acceptance rate: 13.0%.
  Available for download: paper and technical report.
- Badrish Chandramouli, Jun Yang, and Amin Vahdat. "Distributed Network Querying with Bounded Approximate Caching." In Proceedings of the 11th International Conference on Database Systems for Advanced Applications (DASFAA '06), Singapore, April 2006. Acceptance rate: 24.5%.
  Available for download: paper and technical report.
- Pankaj K. Agarwal, Junyi Xie, Jun Yang, and Hai Yu. "Monitoring Continuous Band-Join Queries over Dynamic Data." In Proceedings of the 16th Annual International Symposium on Algorithms and Computation (ISAAC '05), Sanya, Hainan, China, December 2005. Acceptance rate: unknown.
  Available for download: paper.
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
