Causal Inference & Graphical Models: From Missing Data to the Future of Artificial Intelligence
The remarkable progress in AI and machine learning owes much to the availability of massive amounts of data, and where there is data, there is missingness. The rich tend to conceal their income, smokers hide their habit, and patients drop out of clinical trials.
The bulk of the literature on missing data employs data-centric routines as opposed to process-centric methodology and relies on assumptions that are both opaque and untestable (e.g., Missing At Random (MAR); Rubin, 1976). As a result, this area of research is wanting in tools to encode knowledge about the underlying data generating process, methods to test this knowledge, and procedures to decide if and how quantities of interest are estimable from the available data.
I address these deficiencies by using a graphical representation called the "Missingness Graph," which portrays the causal mechanisms responsible for missingness. Using this representation, I define the notion of recoverability, i.e., deciding whether there exists a consistent estimator for a given quantity of interest, such as a joint distribution, a conditional distribution, or a causal effect. The resulting methods apply to all types of missing data, including the notorious and relatively unexplored NMAR (Not Missing At Random) category. I further address the question of testability, i.e., whether and how an assumed model can be subjected to statistical tests given the missingness in the data.
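To give a flavor of recoverability, here is a minimal simulation sketch (not taken from the talk; all variable names and numbers are illustrative). Suppose the missingness of X is caused only by a fully observed variable Y, so that in the missingness graph X is independent of its missingness indicator given Y. Then E[X] is recoverable by conditioning on Y among the observed cases and reweighting by P(Y), whereas the naive complete-case average is biased:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Fully observed cause Y; X depends on Y, so E[X] = 1.0 here.
y = rng.binomial(1, 0.5, n)
x = rng.normal(loc=2.0 * y, scale=1.0)

# Missingness of X depends only on Y (an MAR-type mechanism):
# X is independent of its missingness indicator given Y.
p_miss = np.where(y == 1, 0.7, 0.1)
observed = rng.random(n) > p_miss

# Naive complete-case estimate: biased, because cases with
# large X (those with Y = 1) are missing more often.
naive = x[observed].mean()

# Recoverability formula: E[X] = sum_y E[X | Y=y, observed] * P(Y=y)
recovered = sum(
    x[observed & (y == v)].mean() * (y == v).mean()
    for v in (0, 1)
)

true_mean = x.mean()
```

Here `recovered` lands close to the true mean of 1.0, while `naive` does not; the point is that the graph tells us which adjustment licenses a consistent estimator.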
Viewing the missing data problem from a causal perspective has ushered in several notable surprises. These include recoverability when variables are causes of their own missingness, testability of MAR models, alternatives to iterative procedures such as the Expectation Maximization (EM) algorithm, and the indispensability of causal assumptions for handling missing data problems.
Building on the power and ubiquity of graphical causal models, my agenda for the coming years is to explore other problem areas that have impeded progress in AI and machine learning. These include algorithms to bridge the gap between recoverability and estimation, causal discovery algorithms, robust solutions to problems in AI ethics such as fairness and privacy, and steps toward developing human-level AI. I am particularly intrigued by the prospect of using interventional and counterfactual logic to formulate and solve these problems. I will outline some of these wide-open frontiers in my talk.
Karthika Mohan is a postdoctoral scholar in the Computer Science Department at the University of California, Berkeley. Karthika received her PhD in Computer Science (Artificial Intelligence) from the University of California, Los Angeles (UCLA), where she was advised by Judea Pearl. Her research is interdisciplinary in nature, and her areas of interest include causal inference, graphical models, and AI safety. She was awarded the 2017 Google Outstanding Graduate Research Award, a UCLA Commencement Award. She currently serves on the editorial board of the Journal of Causal Inference.