Perturbation Analysis of Database Queries
Data-driven decision making plays a dominant role across domains ranging from health, business, and government to sports. The analyses behind these decisions are often ad hoc and resource-intensive: a bank may have to compare and analyze all of its users, and the organizers of a sporting event might use previous events to estimate an acceptable ticket sales rate. In this dissertation, I describe efficient methods for optimizing such complex analytic queries.
I begin by discussing how certain complex queries can be modeled as perturbation analysis, in which the same query template is instantiated and evaluated with a large number of different parameter settings. I then show how to tackle this problem from three distinct angles: parallel/distributed execution, database query optimization and processing, and approximation methods. For each angle, I provide empirical results that show the effectiveness of these techniques for perturbation analysis and how they benefit a wide range of analytic queries in diverse settings.
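
To make the notion of perturbation analysis concrete, the following is a minimal sketch of the naive baseline: one parameterized query template evaluated once per parameter setting. It is an illustration only, not one of the methods developed in this dissertation. The table `sales`, its columns, the parameter names `min_price` and `min_qty`, and the helper `run_perturbations` are all hypothetical, and SQLite stands in for whatever database system is actually used.

```python
import sqlite3
from itertools import product


def run_perturbations(conn, template, param_grid):
    """Evaluate one query template under every combination of parameter values.

    Hypothetical helper: it issues one query per setting, which is exactly
    the cost that grows quickly as the parameter grid gets larger.
    """
    names = list(param_grid)
    results = {}
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))  # bind named parameters
        results[values] = conn.execute(template, params).fetchone()[0]
    return results


# Toy data (made up for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (price REAL, qty INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(5.0, 10), (12.0, 3), (20.0, 1), (8.0, 7)],
)

# The same template, instantiated with 3 x 2 = 6 parameter settings.
TEMPLATE = (
    "SELECT COUNT(*) FROM sales "
    "WHERE price >= :min_price AND qty >= :min_qty"
)
GRID = {"min_price": [0.0, 8.0, 15.0], "min_qty": [1, 5]}

RESULTS = run_perturbations(conn, TEMPLATE, GRID)
for setting, count in sorted(RESULTS.items()):
    print(setting, count)
```

Even in this toy form, the number of query evaluations is the product of the grid sizes, so the work multiplies with every added parameter or value.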