Synthesizing Linked Data Under Cardinality and Integrity Constraints
Synthetic data generation is useful in many settings, from application testing to benchmarking to privacy preservation. An important part of this problem is generating the links between relations subject to cardinality constraints (CCs) and integrity constraints (ICs). We are given instances of two relations, where one has a foreign key dependency on the other and is missing only its foreign key (FK) values, together with two types of constraints: (1) CCs that apply to the join view and (2) ICs that apply to the table with the missing FK values. Our goal is to impute the missing FK values so that the constraints are satisfied. We provide a novel framework for the problem based on declarative CCs and ICs. We further show that the problem is NP-hard and propose a novel two-phase solution that guarantees satisfaction of the ICs while maintaining low error rates for the CCs.
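To make the problem statement concrete, here is a minimal Python sketch of a toy instance. The relations (`households`, `persons`), the two CCs, and the IC are all hypothetical and chosen only for illustration. The exhaustive search is not the two-phase solution from the talk: it is exponential in the number of rows and only shows what "ICs always hold, CC error is kept low" means on a tiny input.

```python
from itertools import product

# Hypothetical toy data: persons are missing their household foreign key.
households = {1: {"area": "urban"}, 2: {"area": "rural"}}
persons = [
    {"pid": 1, "age": 70, "role": "owner"},
    {"pid": 2, "age": 34, "role": "owner"},
    {"pid": 3, "age": 8, "role": "child"},
]

# Assumed CCs on the join view: (predicate over a joined row, target count).
ccs = [
    (lambda p, h: p["age"] >= 65 and h["area"] == "rural", 1),
    (lambda p, h: p["role"] == "child" and h["area"] == "urban", 1),
]


def ic_ok(fks):
    """Assumed IC on persons: no two owners share the same household."""
    owners = [fk for p, fk in zip(persons, fks) if p["role"] == "owner"]
    return len(owners) == len(set(owners))


def cc_error(fks):
    """Total absolute deviation of the join-view counts from the CC targets."""
    err = 0
    for pred, target in ccs:
        count = sum(pred(p, households[fk]) for p, fk in zip(persons, fks))
        err += abs(count - target)
    return err


def impute_fks():
    """Brute force: keep only IC-satisfying assignments, then minimize CC error."""
    candidates = (
        fks
        for fks in product(households, repeat=len(persons))
        if ic_ok(fks)
    )
    return min(candidates, key=cc_error, default=None)


best = impute_fks()
print(best, cc_error(best))
```

On this instance the only assignment that satisfies the IC with zero CC error places person 1 in the rural household and persons 2 and 3 in the urban one.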
Shweta Patwa is a fourth-year Duke CS PhD student advised by Ashwin Machanavajjhala. Her research interests lie in differential privacy, with a focus on supporting data analysis in statistical databases.