Nucleosome occupancy information improves de novo motif discovery
Leelavati Narlikar, Raluca Gordân, Alexander J. Hartemink

Abstract

Identification of transcription factor binding sites on a genome-wide scale is an important part of understanding transcriptional regulatory processes in the cell. Unfortunately, these binding sites are short and often degenerate, which poses a significant statistical challenge: many sequence matches to known transcription factor binding motifs occur in the genome by chance and are not functional binding sites. Chromatin structure is known to play an important role in guiding transcription factors to their functional binding sites. In particular, it has been shown that active regulatory regions are usually depleted of nucleosomes, which enables transcription factors to bind DNA in those regions [1]. In this paper, we describe a novel algorithm that uses information about nucleosome occupancy in yeast as an informative prior to discover motifs more accurately in sequence sets identified by ChIP-chip. In the algorithm, we incorporate a prior over locations within DNA sequences, based on predictions of nucleosome occupancy from a recently published computational model [2]. We show that when the nucleosome prior is used in a discriminative setting, the algorithm performs substantially better than with the commonly used uniform, non-informative prior: it identifies the correct motif in 50% more cases.
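To make the idea of a prior over locations concrete, the sketch below contrasts a uniform prior over motif start positions with one derived from nucleosome occupancy. It is a minimal illustration under stated assumptions, not the algorithm's actual prior: we assume a hypothetical per-base occupancy vector (each value a probability in [0, 1] that the base is nucleosome-bound, as a model such as [2] might predict), score each candidate window by its average nucleosome-free probability, and normalise. The function names and the `likelihood_ratios` input (hypothetical per-start motif-versus-background ratios) are illustrative only.

```python
from typing import List


def uniform_position_prior(seq_len: int, motif_width: int) -> List[float]:
    """Non-informative prior: every motif start position is equally likely."""
    n_starts = seq_len - motif_width + 1
    if n_starts <= 0:
        raise ValueError("sequence shorter than motif")
    return [1.0 / n_starts] * n_starts


def nucleosome_position_prior(occupancy: List[float], motif_width: int) -> List[float]:
    """Hypothetical informative prior over motif start positions.

    Assumption: occupancy[i] is the predicted probability that base i is
    covered by a nucleosome. Windows that are more likely to be
    nucleosome-free receive proportionally more prior mass.
    """
    n_starts = len(occupancy) - motif_width + 1
    if n_starts <= 0:
        raise ValueError("sequence shorter than motif")
    # Average nucleosome-free probability over each candidate window.
    free = [
        sum(1.0 - occupancy[i + j] for j in range(motif_width)) / motif_width
        for i in range(n_starts)
    ]
    total = sum(free)
    if total == 0.0:
        # Every window is fully occupied: fall back to the uniform prior.
        return [1.0 / n_starts] * n_starts
    # Normalise so the prior sums to one over start positions.
    return [f / total for f in free]


def posterior_over_starts(likelihood_ratios: List[float], prior: List[float]) -> List[float]:
    """Combine hypothetical per-start likelihood ratios with a positional prior."""
    weighted = [lr * p for lr, p in zip(likelihood_ratios, prior)]
    z = sum(weighted)
    if z == 0.0:
        raise ValueError("all start positions have zero weight")
    return [w / z for w in weighted]
```

For example, with `occupancy = [0.9, 0.9, 0.1, 0.1, 0.1, 0.9]` and a motif width of 3, the window starting at position 2 (the nucleosome-free stretch) receives the largest prior mass, whereas `uniform_position_prior(6, 3)` assigns 0.25 to every start. When the sequence evidence is equally strong at every position, the posterior simply reproduces the prior, so the occupancy information breaks ties in favour of nucleosome-depleted regions.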


Supplementary files

Top-scoring motifs learned by PRI-U, PRI-N, PRI-D, PRI-DN, AlignACE, MEME, MDscan, MEME_c, CONVERGE, and Kellis.

Comparison of the above motifs with the literature consensus. When searching for matches, we consider both the literature consensus and its reverse complement.
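Because a motif can be reported on either DNA strand, a learned motif should count as a match if it agrees with the literature consensus or with its reverse complement. The sketch below shows one way to perform such a check on IUPAC consensus strings. The exact matching criterion used for the supplementary comparison is not stated here, so the rule below is an assumption: the literature consensus must fit entirely inside the learned consensus at some offset, and two IUPAC symbols are treated as compatible when the sets of bases they denote overlap. All function names are hypothetical.

```python
# IUPAC nucleotide codes and the bases each one denotes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Complement of each IUPAC code (e.g. R=AG pairs with Y=CT, B=CGT with V=ACG).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(consensus: str) -> str:
    """Reverse complement of an IUPAC consensus string."""
    return consensus.upper().translate(COMPLEMENT)[::-1]


def compatible(a: str, b: str) -> bool:
    """Assumed rule: two IUPAC symbols match if they share at least one base."""
    return bool(set(IUPAC[a]) & set(IUPAC[b]))


def matches_on_strand(learned: str, literature: str) -> bool:
    """True if `literature` fits inside `learned` at some offset (one strand)."""
    for offset in range(len(learned) - len(literature) + 1):
        if all(compatible(learned[offset + i], literature[i])
               for i in range(len(literature))):
            return True
    return False


def matches_either_strand(learned: str, literature: str) -> bool:
    """Check the literature consensus and its reverse complement."""
    learned, literature = learned.upper(), literature.upper()
    return (matches_on_strand(learned, literature)
            or matches_on_strand(learned, reverse_complement(literature)))
```

For instance, with the illustrative strings `learned = "ACTTATCG"` and `literature = "GATAAG"`, the forward comparison fails, but the reverse complement `CTTATC` occurs in the learned consensus, so `matches_either_strand` reports a match. Checking only one strand would have missed it.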