Query Answering in Multi-Relational Databases under Differential Privacy
Data collection has become a staple of both our digital and “off-line” activities. Government agencies, medical institutions, Internet companies, and academic institutions are among the main actors that collect and store users’ data. Analysis and sharing of this data is paramount in our increasingly data-driven world.
Data sharing provides a large positive societal value; however, it does not come cost free: data sharing is at fundamental odds with individuals’ privacy. As a result, data privacy has become a major research area, with differential privacy emerging as the de-facto data privacy framework. To mask the presence of any individual in the database, differentially private algorithms usually add noise to data releases. This noise is calibrated by the so called “privacy budget”, a parameter that quantifies the privacy loss allowed. One major shortcoming of the definition and much of the supporting literature applies to flat tables. Moreover, there is no system that permits accurate differentially private answering of SQL queries while imposing a fixed privacy loss across all queries posed by the analyst. Even for the simpler problem of answering linear queries on a single flat table, the academic landscape is clouded. Many of the proposed algorithms have functionality dependent on the sensitive data and for which a closed form error analysis is unknown.
In this thesis, we present PrivSQL, a first of its kind end-to-end differentially private relational database system. PrivSQL allows analysts to query a standard relational database using a rich class of SQL queries. PrivSQL enables the data owner to flexibly define the privacy semantics over the schema and provides a fixed privacy loss across all queries submitted by the analyst. We also propose Pythia, a differentially private meta-algorithm that can be used to generate a single private synopsis. More specifically, Pythia addresses the problem of algorithm selection for answering linear counting queries on a single relation.