Data Provenance and its Applications
Data Provenance, also referred to as Data Lineage, is metadata that describes where a digital artifact came from. People have argued that such metadata is useful for myriad applications such as reproducibility, forensic analysis, intrusion detection, data retention, regulatory compliance, and more. Unfortunately, the vast majority of work in the area focuses on standardization and collection, not applications. As a result, adoption of provenance in industry has been practically nonexistent.
I'll present a short background and history of research on data provenance, followed by a discussion of some real applications that we've developed or are developing, some challenges in building powerful provenance-based applications, and speculation about avenues of further research.
Margo Seltzer is Canada 150 Research Chair in Computer Systems and the Cheriton Family Chair in Computer Science at the University of British Columbia. Her research interests are in systems, construed quite broadly: systems for capturing and accessing data provenance, file systems, databases, transaction processing systems, storage and analysis of graph-structured data, and systems for constructing optimal and interpretable machine learning models.
She is the author of several widely used software packages, including database and transaction libraries and the 4.4BSD log-structured file system. Dr. Seltzer was a co-founder and CTO of Sleepycat Software, the makers of Berkeley DB, the recipient of the 2021 ACM Software Systems Award and the 2020 ACM SIGMOD Systems Award.
She serves on the Computer Science and Telecommunications Board (CSTB) of the (US) National Academies. She is a past chair and vice-chair of the Computer Science Committee of the National Academy of Engineering and a past President of the USENIX Association. She served as the USENIX representative to the Computing Research Association Board of Directors and on the Computing Community Consortium.
She is a member of the National Academy of Engineering and the American Academy of Arts and Sciences, a Sloan Foundation Fellow in Computer Science, an ACM Fellow, a Bunting Fellow, and was the recipient of the 1996 Radcliffe Junior Faculty Fellowship.