ProSem: Internet-Scale Publish/Subscribe Unifying Data Processing and Dissemination
Supported by National Science Foundation Award IIS-0713498, III-COR: Scalable Publish/Subscribe: Unifying Data Processing and Dissemination (abstract)
Principle investigators: Jun Yang and Pankaj K. Agarwal, Duke University
People
- Pankaj K. Agarwal (Faculty)
- Shashidhara Ganjugunte (PhD Student)
- Sharathkumar Raghvendra (PhD Student)
- Risi Thonangi (PhD Student)
- Jun Yang (Faculty)
- Albert Yu (PhD Student)
Alumni
- Badrish Chandramouli (PhD, 2008; now at Microsoft Research)
Introduction
The Digital Age has brought about unprecedented growth in the amount of data being generated, the number of data consumers, and the diversity of their interests and locations. Traditionally, users poll sources for information, but for many applications, polling is hardly scalable and may miss important events. The alternative offered by publish/subscribe systems is to push notifications to users with matching interests. This approach suits many applications, ranging from personal, commercial, medical, to environmental, military, and security. However, traditional publish/subscribe systems are becoming inadequate for advanced applications, where users want to receive information that has been filtered, joined, and summarized, and only when certain conditions are met.
This project aims at building a next-generation publish/subscribe system to face the new challenges. We are developing an end-to-end solution consisting of techniques from subscription processing and indexing to dissemination network design, which work together to support efficient and powerful subscription functionalities, allowing users to control precisely what they want and when they want it.
One main feature distinguishing our approach from previous work is joint consideration of subscription processing and notification dissemination. Traditionally, these problems are considered separately by database and networking communities. However, there exists a wide spectrum of interesting alternatives for interfacing processing with dissemination. We propose a promising novel approach called reforumulation that allows complex, stateful subscriptions to be handled by simple, stateless dissemination mechanisms, with a clean system design that is easy to implement and scale. A cost-based optimizer, inspired by database query optimization, chooses the best processing and dissemination strategies jointly and dynamically.
Besides system building, this project tackles many new algorithmic challenges, including, e.g., scalably processing a large number of complex subscriptions; exploiting event and subscription characteristics to combat worst-case complexity; balancing semantic similarity and network proximity in dissemination network design; and efficiently maintaining statistics for high-dimensional events and subscriptions.
Progress
In the first year of this project, we made progress on the following specific research problems: (1) ProSem system development and demonstration; (2) scalable processing and dissemination of select-join subscriptions; (3) dissemination network design for wide-area publish/subscribe; (4) scalable processing and dissemination of value-based notification conditions; (5) input-sensitive scalable continuous join query processing; (6) querying uncertain data; (7) maintaining data summaries. A detailed description of our contributions can be found below in our 2007-2008 project report.
In the second year of this project, we continued to make progress on (1), (3), (5), (6), and (7). In addition, we have worked on the following problems: (8) a generator for wide-area content-based publish/subscribe workloads, which we are making public; (9) extending our ideas and framework beyond relational data and queries to information retrieval and extraction context; (10) data aggregation and scheduling over a network. A detailed description of our contributions can be found below in our 2008-2009 project report.
A number of our contributions have been published in premier conferences: ISAAC 2005, DASFAA 2006, SIGMOD 2006, SoCG 2006, VLDB 2006, SoCG 2007, VLDB 2007, ACM-GIS 2007, SIGMOD 2008, SoCG 2008, VLDB 2008, ICDE 2009, DCOSS 2009, SoCG 2009, PODS 2009, and SIGMOD 2009. For detailed descriptions of these contributions, please refer to our project reports and publications below.
Publications
- Pankaj K. Agarwal, Junyi Xie, Jun Yang, and Hai Yu. "Input-Sensitive Scalable Continuous Join Query Processing." ACM Transactions on Database Systems (TODS), 34(3):1-41, August 2009. Results in this paper subsume those in the paper titled "Scalable Continuous Query Processing by Tracking Hotspots" in the 32nd International Conference on Very Large Data Bases
(VLDB '06)
.
Available for download: paper. - Albert Yu, Pankaj K. Agarwal, and Jun Yang. "Generating Wide-Area Content-Based Publish/Subscribe Workloads." In Proceedings of the 5th International Workshop on Networking Meets Databases (NetDB '09), Big Sky, Montana, USA, October 2009. Acceptance rate: unknown.
Available for download: paper. - Fei Chen, Byron Gao, AnHai Doan, Jun Yang, and Raghu Ramakrishnan. "Optimizing Complex Extraction Programs over Evolving Text Data." In Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data (SIGMOD '09), Providence, Rhode Island, USA, June 2009. Acceptance rate: 15.9%.
- Risi Thonangi, Hao He, AnHai Doan, Haixun Wang, and Jun Yang. "Weighted Proximity Best-Joins for Information Retrieval." In Proceedings of the 25th International Conference on Data Engineering (ICDE '09), Shanghai, China, March 2009. Acceptance rate: 16.8%.
Available for download: paper. - Badrish Chandramouli and Jun Yang. "End-to-End Support for Joins in Large-Scale Publish/Subscribe Systems." In Proceedings of the 34th International Conference on Very Large Data Bases (VLDB '08), Auckland, New Zealand, August 2008. Acceptance rate: 16.5%.
Available for download: paper. - Badrish Chandramouli, Jun Yang, Pankaj K. Agarwal, Albert Yu, and Ying Zheng. "ProSem: Scalable Wide-Area Publish/Subscribe." In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data (SIGMOD '08), Vancouver, Canada, June 2008. System demonstration description. Acceptance rate: 31.9%.
Available for download: paper. - Badrish Chandramouli, Jeff M. Phillips, and Jun Yang. "Value-Based Notification Conditions in Large-Scale Publish/Subscribe Systems." In Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB '07), Vienna, Austria, September 2007. Acceptance rate: 16.4%.
Available for download: paper. - Pankaj K. Agarwal, Junyi Xie, Jun Yang, and Hai Yu. "Scalable Continuous Query Processing by Tracking Hotspots." In Proceedings of the 32nd International Conference on Very Large Data Bases (VLDB '06), Seoul, Korea, September 2006. Acceptance rate: 13.8%. Results in this paper are subsumed by those in the paper titled "Input-Sensitive Scalable Continuous Join Query Processing"
in ACM Transactions on Database Systems
(TODS)
2009.
Available for download: paper and technical report. - Badrish Chandramouli, Junyi Xie, and Jun Yang. "On the Database/Network Interface in Large-Scale Publish/Subscribe Systems." In Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data (SIGMOD '06), Chicago, Illinois, USA, June 2006. Acceptance rate: 13.0%.
Available for download: paper and technical report. - Badrish Chandramouli, Jun Yang, and Amin Vahdat. "Distributed Network Querying with Bounded Approximate Caching." In Proceedings of the 11th International Conference on Database Systems for Advanced Applications (DASFAA '06), Singapore, April 2006. Acceptance rate: 24.5%.
Available for download: paper and technical report. - Pankaj K. Agarwal, Junyi Xie, Jun Yang, and Hai Yu. "Monitoring Continuous Band-Join Queries over Dynamic Data." In Proceedings of the 16th Annual International Symposium on Algorithms and Computation (ISAAC '05), Sanya, Hainan, China, December 2005. Acceptance rate: unknown.
Available for download: paper.
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
