Vincentius Martin

PhD Candidate, Computer Science Department - Duke University
D229 LSRC, 308 Research Drive, Duke Box 90129, NC 27708

vmartin@cs.duke.edu

I am a Computer Science PhD candidate at Duke University. I currently work with Dr. Raluca Gordân on research in computational genomics.

The rapid growth of biological data has created a pressing need for time- and space-efficient data processing. My research aims to tackle this problem, focusing on biological data that describes interactions between DNA molecules and gene regulatory proteins called transcription factors (TFs). Specifically, my research combines efficient database techniques for storing and accessing the data with machine learning methods for learning accurate models. The database techniques allow us to handle the large amounts of protein-DNA binding data and genetic mutation data we need to store and process, while the machine learning methods allow us to learn accurate representations of protein-DNA binding preferences. My research also draws on a broad range of fields, such as statistical modeling, artificial intelligence, and algorithms.

Link to CV.


Education

Duke University

PhD Candidate, Computer Science
Advisor: Dr. Raluca Gordân
Dissertation title: Deciphering the quantitative effect of mutations and cooperativity on transcription factor binding
2016-present

Institut Teknologi Bandung

Bachelor of Engineering, Informatics Engineering / Computer Science
2010-2014

Publications

2021
2019
2017
2016
2014

Research projects

Transcription factor-centric approach to identify non-recurring putative regulatory drivers in cancer

Identification of cancer driver mutations is generally based on patterns of recurrence among tumor samples. However, mutations do not have to be highly recurrent to be true drivers. In this project, we developed a new method for analyzing cancer mutations in non-coding genomic regions based on the magnitude of their effects on transcription factor (TF)-DNA binding, as predicted by QBiC-Pred. By combining the effects of mutations across all regulatory regions of each gene, we identified dozens of genes whose regulation in tumor cells is likely to be significantly perturbed by non-coding mutations.
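The gene-level step, combining per-mutation effects across a gene's regulatory regions, could look roughly like the Python sketch below. It is only an illustration under stated assumptions, not the project's actual method: it assumes each mutation has already been assigned to a gene and scored with a signed z-score for its predicted change in TF binding, and it combines those scores with Stouffer's method, a standard way to merge z-scores. The function name `combine_by_gene` and the `(gene, z)` input format are hypothetical.

```python
import math
from collections import defaultdict


def combine_by_gene(mutation_effects):
    """Combine per-mutation z-scores into one Stouffer Z per gene.

    mutation_effects: iterable of (gene, z) pairs (hypothetical format), where
    z is the predicted change in TF binding caused by one non-coding mutation
    in a regulatory region already assigned to that gene.
    Returns {gene: (combined_z, two_sided_p)}.
    """
    by_gene = defaultdict(list)
    for gene, z in mutation_effects:
        by_gene[gene].append(z)

    results = {}
    for gene, zs in by_gene.items():
        # Stouffer's method: sum of z-scores scaled by sqrt(number of scores).
        combined = sum(zs) / math.sqrt(len(zs))
        # Two-sided p-value under a standard normal null.
        p_two_sided = math.erfc(abs(combined) / math.sqrt(2))
        results[gene] = (combined, p_two_sided)
    return results
```

Because the z-scores are signed in this sketch, a gain and a loss of binding in the same gene partially cancel; summing absolute effects instead, with a permutation-based null, would be an equally reasonable design choice.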

Modeling cooperative binding of transcription factors to clusters of DNA binding sites

Clusters of transcription factor (TF) binding sites, i.e., multiple sites located in close proximity to each other, are prevalent in the human genome. However, it is unclear whether the sites in a cluster are bound independently or cooperatively. We modeled these interactions using features derived from the DNA sequences and structures of the cluster sites. Models trained on these features are highly accurate and revealed various mechanisms of cooperative TF binding.
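As a rough illustration of what sequence-derived features for a pair of sites in a cluster might look like, here is a minimal Python sketch. The project's actual feature set is not listed here, so this assumes a few commonly used pairwise features (distance between site centers, spacer length, relative orientation, spacer GC content) and omits DNA structure (shape) features, which would require an external shape-prediction table. The `Site` class, `pair_features` function, and orientation labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Site:
    start: int   # 0-based start of the binding site within the cluster sequence
    end: int     # exclusive end
    strand: str  # "+" (site points 5'->3' along seq) or "-"


# Relative orientation of (upstream site, downstream site); labels are illustrative.
ORIENTATION = {
    ("+", "+"): "same_strand",
    ("-", "-"): "same_strand",
    ("+", "-"): "convergent",   # sites point toward each other
    ("-", "+"): "divergent",    # sites point away from each other
}


def pair_features(seq, a, b):
    """Hypothetical sequence features for two sites in one cluster.

    The result does not depend on the order in which the sites are given.
    """
    first, second = sorted((a, b), key=lambda s: s.start)
    # Empty if the sites touch or overlap (slice start >= slice end).
    spacer = seq[first.end:second.start]
    gc = (spacer.count("G") + spacer.count("C")) / len(spacer) if spacer else 0.0
    return {
        "center_distance": (second.start + second.end) / 2 - (first.start + first.end) / 2,
        "spacer_length": max(0, second.start - first.end),
        "orientation": ORIENTATION[(first.strand, second.strand)],
        "spacer_gc": gc,
    }
```

Feature dictionaries like these, computed for many clusters and paired with measured binding, could then be fed to any standard classifier or regressor; which model the project used is not stated here.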

QBiC-Pred web server

QBiC-Pred is a web service that allows users to predict the impact of non-coding mutations on TF-DNA binding through a user-friendly web interface. It uses 6-mer-based linear regression models of TF-DNA binding specificity, i.e., all DNA sequences of length 6 (6-mers) serve as features, and ordinary least squares (OLS) regression estimates a coefficient for each feature. By combining multiprocessing and integer indexing, QBiC-Pred can rapidly predict the impact of hundreds of thousands of mutations for over 600 human TFs. The web server can be accessed at: qbic.genome.duke.edu.
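The core of such a 6-mer model can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not QBiC-Pred's code: the function names are hypothetical, the model has no intercept, and it does not merge each 6-mer with its reverse complement, which a real TF model would likely need to handle. Here "integer indexing" is interpreted as reading A, C, G, T as base-4 digits, so the fitted coefficients sit in a flat array of 4^6 = 4096 entries and each lookup is a single array index.

```python
import numpy as np

BASE_TO_INT = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_to_index(kmer):
    """Read a k-mer as a base-4 number, e.g. 'AAAAAA' -> 0, 'TTTTTT' -> 4**6 - 1."""
    idx = 0
    for base in kmer:
        idx = idx * 4 + BASE_TO_INT[base]
    return idx


def featurize(seq, k=6):
    """Vector of length 4**k holding the count of each k-mer in seq."""
    counts = np.zeros(4 ** k)
    for i in range(len(seq) - k + 1):
        counts[kmer_to_index(seq[i:i + k])] += 1
    return counts


def fit_ols(seqs, signals, k=6):
    """Ordinary least squares: one coefficient per k-mer (no intercept in this sketch)."""
    X = np.vstack([featurize(s, k) for s in seqs])
    y = np.asarray(signals, dtype=float)
    coef, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def predict(seq, coef, k=6):
    """Predicted binding score: sum of the coefficients of all k-mers in seq."""
    return float(featurize(seq, k) @ coef)


def mutation_effect(seq, pos, alt, coef, k=6):
    """Change in predicted score after substituting base `alt` at position `pos`.

    Only the k-mers overlapping `pos` change, so only those (at most k)
    windows are re-scored instead of the whole sequence.
    """
    mut = seq[:pos] + alt + seq[pos + 1:]
    delta = 0.0
    for start in range(max(0, pos - k + 1), min(pos, len(seq) - k) + 1):
        delta += coef[kmer_to_index(mut[start:start + k])]
        delta -= coef[kmer_to_index(seq[start:start + k])]
    return float(delta)
```

Because a single-base substitution changes at most six overlapping 6-mers, `mutation_effect` scores a mutation with at most twelve array lookups regardless of sequence length, which is one way integer indexing pays off when many mutations must be scored against hundreds of TF models.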