The first work addresses the problem of accurately estimating transcript abundance from RNA-sequencing (RNA-Seq) data. Transcript abundance is a relative measure of transcript copy number in cells. It is a fundamental quantity in biology with direct relevance to human health: studies have shown that transcript abundances are often altered in disease conditions. To extract the abundance “signal” from RNA-Seq data, we developed RSEM, one of the most accurate transcript abundance estimation tools, using modern statistical learning techniques. RSEM has been used extensively around the world since its release: the RSEM papers have been cited over 2,300 times, and RSEM is used in nationwide consortium projects such as TCGA and ENCODE.
The second work introduces PROBer, a statistical learning tool for accurate detection of epitranscriptomic marks. Epitranscriptomics, also known as RNA epigenetics, is a new field focusing on the study of RNA structure, RNA modification, and RNA-protein interaction at the transcriptome scale. These three aspects are stepping stones to understanding the mechanism of alternative splicing, whose disruption often results in severe diseases. Epitranscriptomic sequencing data often contain background noise and ambiguous position information, which jointly reduce the accuracy with which epitranscriptomic marks can be detected. We therefore need to solve the problems of signal separation and ambiguity resolution simultaneously. Existing analysis methods rely heavily on ad hoc heuristics, which cannot handle these two problems well. PROBer incorporates both background noise and position information into its generative probabilistic model and learns them from the data automatically. We compare PROBer with existing methods for detecting epitranscriptomic marks. Results on both simulated and real data show that PROBer outperforms all of them.
In recent years, epitranscriptomics has become a hot research topic. Nature Methods recently selected epitranscriptome analysis as the method of the year. In addition, several newly approved NIH grants point to substantial funding for the field in the future. In the last part of my talk, I will discuss how my future work fits into this emerging trend in epitranscriptomics research.