Scalable Probabilistic Inference for Large-Scale Genomic Data
The cost of genome sequencing has decreased by over 100,000-fold over the last decade. This genomic revolution now enables us to measure how our genomes vary at millions of positions across millions of individuals, opening up the possibility of answering fundamental questions in human genetics. I will describe our work at the intersection of statistics, computer science and genomics, which aims to leverage these large-scale genomic datasets to answer questions such as how human populations evolved and which genes underlie diseases. I will focus on two techniques that are commonly used in the analysis of human genetic data: principal components analysis (PCA) and variance components analysis. With the advent of large-scale datasets of genetic variation, there is a need for methods that can perform these analyses with scalable computational and memory requirements. By leveraging randomized method-of-moments estimators and the structure of genetic variation data, we obtain sub-linear time algorithms for these problems. These algorithms allow us to efficiently estimate variance components and the top principal components, for example in less than an hour on genome-wide genetic variation datasets from a million individuals. Applying these methods to about half a million individuals from the UK, we obtain novel biological insights.
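To make the idea of a randomized method-of-moments estimator concrete, here is a minimal sketch in Python. It is not the speaker's algorithm, and it is not sub-linear: it only illustrates the general technique under assumptions of our own. We assume a single-component model in which the phenotype covariance is sigma_g2 * K + sigma_e2 * I, with K = X X^T / m built from standardized genotypes X, and we estimate trace(K^2) with random Gaussian probe vectors so that the n-by-n matrix K is never formed. All function names and parameters (standardize, estimate_trace_K2, mom_variance_components, num_probes) are hypothetical.

```python
import numpy as np


def standardize(G):
    """Center each column (variant) and scale it to unit variance.

    Assumes no column is constant (no monomorphic variants).
    """
    return (G - G.mean(axis=0)) / G.std(axis=0)


def estimate_trace_K2(X, num_probes, rng):
    """Randomized estimate of trace(K^2) for K = X X^T / m.

    Uses E[z^T K^2 z] = trace(K^2) for z ~ N(0, I), and computes
    K z as X (X^T z) / m so that K itself is never materialized.
    """
    n, m = X.shape
    Z = rng.standard_normal((n, num_probes))
    KZ = X @ (X.T @ Z) / m
    return np.sum(KZ ** 2) / num_probes


def mom_variance_components(G, y, num_probes=100, seed=0):
    """Method-of-moments estimates of (sigma_g2, sigma_e2).

    Matches the observed second moments of y to sigma_g2 * K + sigma_e2 * I
    by solving the 2x2 normal equations of that least-squares fit.
    """
    rng = np.random.default_rng(seed)
    X = standardize(np.asarray(G, dtype=float))
    y = np.asarray(y, dtype=float)
    y = y - y.mean()
    n, m = X.shape

    tr_K2 = estimate_trace_K2(X, num_probes, rng)  # randomized
    tr_K = np.sum(X ** 2) / m                      # exact, no n-by-n matrix
    Xty = X.T @ y
    yKy = Xty @ Xty / m                            # y^T K y

    A = np.array([[tr_K2, tr_K], [tr_K, float(n)]])
    b = np.array([yKy, y @ y])
    sigma_g2, sigma_e2 = np.linalg.solve(A, b)
    return sigma_g2, sigma_e2
```

In this sketch each probe costs one pass over the genotype matrix, so the work grows with the number of individuals times the number of variants times num_probes, and memory stays proportional to the genotype matrix rather than to the square of the number of individuals. The ratio sigma_g2 / (sigma_g2 + sigma_e2) would then serve as a rough estimate of the proportion of phenotypic variance explained by the genotypes.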
Sriram Sankararaman is an assistant professor in the Departments of Computer Science, Human Genetics, and Computational Medicine at UCLA. His research interests lie at the interface of computer science, statistics and biology. He is interested in developing statistical machine learning algorithms to make sense of large-scale genomic data and in using these tools to understand the interplay between evolution, our genomes and traits. He received a B.Tech. in Computer Science from the Indian Institute of Technology, Madras, and a Ph.D. in Computer Science from UC Berkeley, and was a post-doctoral fellow at Harvard Medical School before joining UCLA. He is a recipient of an Alfred P. Sloan Foundation fellowship (2017), an Okawa Foundation grant (2017), a UCLA Hellman fellowship (2017), the NIH Pathway to Independence Award (2014), a Simons Research fellowship (2014), and a Harvard Science of the Human Past fellowship (2012), as well as the Northrop-Grumman Excellence in Teaching Award at UCLA (2019).