Programming Statistical Machine Learning with High-Level Knowledge
Machine learning is fundamentally changing how software is developed. Rather than programming behavior directly, many developers now curate training data and engineer features, but this process is slow, laborious, and expensive. In this talk I will describe two multi-year projects that study how high-level knowledge can be programmed more directly into statistical machine learning models. The resulting prototypes are used at dozens of major technology companies and research labs, and in collaborations with government agencies such as the U.S. Department of Veterans Affairs and the U.S. Food and Drug Administration.
The first project is Snorkel, a framework for training statistical models with multiple user-written rules instead of hand-labeled training data. This alternative supervision paradigm raises new questions in statistical machine learning, such as how to learn from noisy sources that can have rich dependency structures like correlations, and how to estimate these structures fast enough for interactive development. Snorkel powers applications, such as reading electronic health records, that otherwise would not admit a learning approach because of the difficulty in curating training data.
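To make the idea of supervising with user-written rules concrete, here is a minimal sketch in plain Python. It does not use the Snorkel library. The three rules, the label constants, and the example notes are hypothetical, and the rules' votes are combined by simple majority vote. That is a deliberately naive stand-in: the abstract describes learning from noisy sources with dependency structures, which this sketch does not attempt.

```python
from collections import Counter

# Label constants (assumed for this sketch): a rule may decline to vote.
ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1

# Hypothetical user-written rules for flagging clinical notes that
# describe a drug reaction. Each rule is noisy and covers few notes.
def lf_mentions_reaction(note):
    return POSITIVE if "reaction" in note.lower() else ABSTAIN

def lf_no_adverse(note):
    return NEGATIVE if "no adverse" in note.lower() else ABSTAIN

def lf_mentions_rash(note):
    return POSITIVE if "rash" in note.lower() else ABSTAIN

LABELING_FUNCTIONS = [lf_mentions_reaction, lf_no_adverse, lf_mentions_rash]

def majority_vote(note):
    """Combine rule outputs by majority vote; abstain on ties or no votes."""
    votes = [lf(note) for lf in LABELING_FUNCTIONS]
    counts = Counter(v for v in votes if v != ABSTAIN)
    if not counts:
        return ABSTAIN
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # conflicting rules with equal support
    return ranked[0][0]

notes = [
    "Patient developed a rash; likely allergic reaction.",
    "No adverse events reported.",
    "Follow-up scheduled in two weeks.",
]
labels = [majority_vote(n) for n in notes]
print(labels)
```

The resulting labels could then serve as training data for an ordinary classifier. Majority vote treats every rule as equally accurate and independent, which is exactly the assumption that breaks down when sources are correlated.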
The second project is probabilistic soft logic (PSL), a probabilistic programming language for building large-scale statistical models over structured data like biological and social networks using logical rules. PSL scales up structured inference based on a new equivalence result among seemingly distinct convex relaxation techniques for combinatorial optimization. By enabling structured, statistical inference at scale, PSL unlocks new modeling techniques in domains such as bioinformatics and knowledge base construction.
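The sketch below illustrates, under stated assumptions, how a weighted logical rule can become a convex penalty over truth values in [0, 1]. It uses the Łukasiewicz relaxation of conjunction and a hinge-shaped "distance to satisfaction", the interpretation described in the PSL literature. The rule, its weight, the prior, and the grid search are all hypothetical choices for illustration. In particular, the grid search is only a stand-in for a real convex solver and is not how PSL performs inference.

```python
def lukasiewicz_and(a, b):
    """Łukasiewicz relaxation of logical AND over truth values in [0, 1]."""
    return max(0.0, a + b - 1.0)

def distance_to_satisfaction(body, head):
    """A rule body -> head is fully satisfied when head >= body."""
    return max(0.0, body - head)

# Hypothetical rule with weight 2.0:
#   Friends(A, B) & Votes(A, P) -> Votes(B, P)
# Hypothetical prior with weight 1.0:
#   !Votes(B, P)
def objective(votes_b, friends=1.0, votes_a=0.75):
    body = lukasiewicz_and(friends, votes_a)
    rule_penalty = 2.0 * distance_to_satisfaction(body, votes_b)
    prior_penalty = 1.0 * votes_b  # distance to satisfaction of !Votes(B, P)
    return rule_penalty + prior_penalty

# Inference: choose the truth value of the unknown atom that minimizes the
# total weighted penalty. The objective is convex and piecewise linear, so
# a coarse grid is enough for this one-variable illustration.
grid = [i / 100 for i in range(101)]
best = min(grid, key=objective)
print(best, objective(best))
```

Because the rule outweighs the prior, the minimizer sits where the rule is exactly satisfied: raising the unknown atom further would only add prior penalty. With many atoms and rules the same construction yields a large convex program, which is what makes inference tractable at scale.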
Stephen Bach is a postdoctoral scholar in the computer science department at Stanford University, advised by Christopher Ré. He received his Ph.D. in computer science from the University of Maryland, where he was advised by Lise Getoor. His research focuses on statistical machine learning methods that exploit high-level knowledge, through projects like Snorkel (snorkel.stanford.edu) and probabilistic soft logic (psl.linqs.org). Stephen's thesis on probabilistic soft logic was recognized with the University of Maryland's Larry S. Davis Doctoral Dissertation Award.