Vincentius Martin

PhD Candidate, Computer Science Department - Duke University
D229 LSRC, 308 Research Drive, Duke Box 90129, NC 27708

vmartin@cs.duke.edu

I am a Computer Science PhD candidate at Duke University. I currently work with Dr. Raluca Gordân on research in computational genomics.

The rapid growth of biological data is leading to a pressing need for time- and space-efficient data processing. My research aims to tackle this problem, focusing on biological data that describes interactions between DNA molecules and gene regulatory proteins called transcription factors (TFs). Specifically, my research combines efficient database techniques for storing and accessing the data with machine learning methods for learning accurate models. The database techniques allow us to handle the large amounts of protein-DNA binding data and genetic mutation data we need to store and process, while the machine learning methods allow us to learn accurate representations of protein-DNA binding preferences. My research also draws on a broad range of fields, such as statistical modeling, artificial intelligence, and algorithms.

Link to CV.

Education

Duke University

PhD Candidate, Computer Science

Advisor: Dr. Raluca Gordân

Dissertation title: Deciphering the quantitative effect of mutations and cooperativity on transcription factor binding

2016-present

Institut Teknologi Bandung

Bachelor of Engineering, Informatics Engineering / Computer Science

2010-2014

Publications

2021

High-throughput data and modeling reveal insights into the mechanisms of cooperative DNA-binding by transcription factor proteins
Vincentius Martin, Farica Zhuang, Yuning Zhang, Raluca Gordân
To be submitted December 2021.

Transcription factor-centric approach to identify non-recurring putative regulatory drivers in cancer
Jingkang Zhao, Vincentius Martin, Raluca Gordân
Submitted to RECOMB 2022
- Github: https://github.com/jz132/cancer-mutations

2019

QBiC-Pred: quantitative predictions of transcription factor binding changes due to sequence variants.
Vincentius Martin, Jingkang Zhao, Ariel Afek, Zachery Mielko, Raluca Gordân
Nucleic Acids Research (NAR)
- QBiC-Pred web server: qbic.genome.duke.edu
- Github: https://github.com/vincentiusmartin/QBiC-Pred

2017

Fending off IoT-hunting attacks at home networks.
Vincentius Martin, Qiang Cao, Theophilus Benson
Cloud-Assisted Networking Workshop (CAN@CoNEXT ’17)

ePrivateeye: to the edge and beyond!
Christopher Streiffer, Animesh Srivastava, Victor Orlikowski, Yesenia Velasco, Vincentius Martin, Nisarg Raval, Ashwin Machanavajjhala, Landon P. Cox
The Second ACM/IEEE Symposium on Edge Computing (SEC ’17)

PBSE: a robust path-based speculative execution for degraded-network tail tolerance in data-parallel framework
Riza O. Suminto, Cesar A. Stuardo, Alexandra Clark, Huan Ke, Tanakorn Leesatapornwongsa, Bo Fu, Daniar H. Kurniawan, Vincentius Martin, Maheswara Rao G. Uma, Haryadi S. Gunawi
2017 ACM Symposium on Cloud Computing (SoCC ’17)

2016

Manylogs: Improved CMR/SMR Disk Bandwidth and Faster Durability with Scattered Logs
Tiratat Patana-anake, Vincentius Martin, Nora Sandler, Cheng Wu, Haryadi S. Gunawi
32nd International Conference on Massive Storage Systems and Technology (MSST ’16)

2014

What Bugs Live in the Cloud? A Study of 3000+ Issues in Cloud Systems
Haryadi S. Gunawi, Mingzhe Hao, Tanakorn Leesatapornwongsa, Tiratat Patana-anake, Thanh Do, Jeffry Adityatama, Kurnia J. Eliazar, Agung Laksono, Jeffrey F. Lukman, Vincentius Martin, Anang D. Satria (authors are listed in institutional and alphabetical order)
2014 ACM Symposium on Cloud Computing (SoCC ’14)

Research projects

Transcription factor-centric approach to identify non-recurring putative regulatory drivers in cancer

Identification of cancer driver mutations is generally based on patterns of recurrence among tumor samples. However, mutations do not have to be highly recurrent in order to be true drivers. In this project, we developed a new method for analyzing cancer mutations in non-coding genomic regions based on the magnitude of their effects on transcription factor-DNA binding using predictions from QBiC-Pred. By combining the effects of mutations across all regulatory regions of each gene, we identified dozens of genes whose regulation in tumor cells is likely to be significantly perturbed by non-coding mutations.

Modeling cooperative binding of transcription factors to clusters of DNA binding sites

Clusters of transcription factor (TF) binding sites, i.e., multiple sites located in close proximity to each other, are prevalent in the human genome. However, it is unclear whether the binding sites in a cluster are bound independently or cooperatively. We modeled these interactions using features derived from the DNA sequences and structures of the cluster sites. Models trained on these features are highly accurate and reveal various mechanisms of cooperative TF binding.

QBiC-Pred web server

QBiC-Pred is a web service that allows users to predict the impact of non-coding mutations on TF-DNA binding through a user-friendly web interface. It uses 6-mer-based linear regression models of TF-DNA binding specificity, i.e., we take all DNA sequences of length 6 as features and use ordinary least squares (OLS) regression to estimate the coefficients for all features. By combining multiprocessing and integer indexing, QBiC-Pred is able to rapidly make predictions on the impact of hundreds of thousands of mutations for over 600 human TFs. The web server can be accessed at: qbic.genome.duke.edu.
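
As a rough illustration of the idea (not the actual QBiC-Pred code), the sketch below encodes each 6-mer as a base-4 integer so that it can index directly into a coefficient vector, fits the coefficients with ordinary least squares, and scores a mutation as the difference between the mutant and wild-type predictions. The function names, the use of NumPy, and the count-based feature encoding are assumptions made for this example.

```python
import numpy as np

K = 6  # k-mer length described above
BASE_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_to_int(kmer):
    """Encode a k-mer as a base-4 integer (hypothetical helper)."""
    value = 0
    for base in kmer:
        value = value * 4 + BASE_INDEX[base]
    return value


def count_kmers(seq, k=K):
    """Return a vector of length 4**k holding the k-mer counts of seq."""
    counts = np.zeros(4 ** k)
    for i in range(len(seq) - k + 1):
        counts[kmer_to_int(seq[i:i + k])] += 1
    return counts


def fit_ols(sequences, intensities, k=K):
    """Estimate one coefficient per k-mer with ordinary least squares."""
    X = np.vstack([count_kmers(s, k) for s in sequences])
    y = np.asarray(intensities, dtype=float)
    # lstsq returns the minimum-norm least-squares solution.
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def predict_change(coef, wild_type, mutant, k=K):
    """Predicted binding difference: mutant minus wild type."""
    diff = count_kmers(mutant, k) - count_kmers(wild_type, k)
    return float(diff @ coef)
```

Because each k-mer maps to an integer, looking up its coefficient is a plain array access rather than a string-keyed dictionary lookup, which is one plausible reason integer indexing helps when scoring many mutations.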