Data-Intensive Systems for the Social Sciences
The social sciences are crucial to decisions about billions in spending, yet they are often starved for data and badly underserved by modern computational tools. Building data-intensive systems for social science workloads promises to enable exciting discoveries in both computational and domain-specific fields, while also making an outsized real-world impact.
This talk will describe two data systems for the social sciences. The first is RaccoonDB, a declarative nowcasting data management system that enables users to predict real-world time-series phenomena from social media signals. RaccoonDB’s novel query optimization methods allow it to generate useful social science predictions 123 times faster than competing systems while using just 10% of the computational resources. When applied to unemployment phenomena, the system yields predictions whose accuracy is comparable to that of predictions from real-world economists.
The second is an information extraction system designed to analyze online text and help law enforcement officers identify potential human trafficking victims. This system has been successfully applied to real-world cases. In addition, the resulting extracted dataset enables several novel social science findings about behavior in an illicit and often opaque market.
Michael Cafarella is an Associate Professor of Computer Science and Engineering at the University of Michigan. His research interests include databases, information extraction, data integration, and data mining. He has published extensively in venues such as SIGMOD and VLDB. Mike received his PhD from the University of Washington in 2009 with advisors Oren Etzioni and Dan Suciu. His academic awards include the NSF CAREER award and the Sloan Research Fellowship. In addition to his academic work, Mike cofounded (with Doug Cutting) the Hadoop open-source project. In 2015 he cofounded (with Chris Re and Feng Niu) Lattice Data, Inc., which is now part of Apple.