Computer Science Research Profile:
Computational Exploration of Cells and Genomes

From the Fall 2011 issue of Threads

Professor Alex Hartemink

Alex Hartemink remembers being turned off by biology in high school.

These days, however, Hartemink, the Alexander F. Hehmeyer Associate Professor, devotes his research to exploring the wonders of biology through computational and statistical methods. He works in the area of computational systems biology, trying to better understand how various molecular parts work together to execute the basic processes of cells.

“The impression you get in high school is that everything in biology is known and just needs to be memorized; there’s no wonder left. But the reality is that we know only a tiny fraction of how living systems really work. It’s still a huge mystery,” he said. “Everything in the living world is comprised of cells, and yet we know so little about how these wonderfully intricate systems operate in precise detail.”

When Hartemink joined the Duke faculty in 2001, returning to the university where he had earned his undergraduate degrees, he spent a good portion of his time working with data of clinical relevance. Today he focuses most of his research effort on basic science.

“At this point, I’m motivated primarily by scientific curiosity,” he said. “How does the cell work? How does the genome copy itself faithfully, serving not only as the set of blueprints for making proteins but also as the substrate for how they’re all regulated? That’s an incredibly fascinating question I’d like to contribute to answering.”

Hartemink’s group is interested in understanding exactly this sort of genome regulation more precisely. Using Bayesian statistical methods, they write software to sift through the large data sets now available and learn statistical models consistent with all the data.

“When it comes to modeling, I try to encourage students to understand the application domain well enough that the models we fit to the data will reveal something insightful about the domain, which in our case is biology,” Hartemink said. “I think that’s important. I don’t want to develop elegant models that turn out to be irrelevant. I hope the models we develop will advance our collective understanding of biology.”

One of the more significant results of their work was evidence that required the field to rethink how cells regulate gene expression as they undertake the cell division cycle. Working closely with Steve Haase, associate professor of biology, they showed that a particular set of molecules, called cyclin/CDKs and long believed to be centrally responsible for controlling how cells express genes over the course of the cell division cycle, aren’t the only control mechanisms in this process.

“Essentially, the experiment was designed to take away these key molecular players and see what happened; everything was supposed to stop, but it didn’t,” Hartemink said. “Much of it kept working.”

Collaborating with Randy Jirtle, professor of radiation oncology, Hartemink’s group also discovered new imprinted genes, which are critical in normal embryonic development and in the development of certain diseases. Using machine learning, they developed a computational technique to predict which of the 20,000 genes in a mammalian genome had a high probability of being imprinted. With lists of 40 to 50 genes known to be imprinted and 100 to 120 genes strongly believed not to be imprinted, they learned a function to distinguish the two groups. Applying the function to all the genes in the mouse and human genomes yielded lists prioritizing the genes most likely to be imprinted. The Jirtle lab has validated a number of genes from the lists, and other labs have used the lists as well.
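For readers curious about what "learning a function to distinguish the two groups" can look like in practice, here is a minimal sketch in Python. It is an illustration only, not the group's actual method: the article does not say which classifier or which gene features were used, so the logistic regression model, the two made-up numeric features per gene, the synthetic data, and every name in the code (`fake_gene`, `train_logistic`, `genome`, and so on) are assumptions chosen for simplicity. The sketch trains on a small set of labeled genes, scores a larger unlabeled set, and sorts the genes by predicted probability.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights, bias, x):
    """Predicted probability that a gene with features x is imprinted."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train_logistic(features, labels, epochs=300, lr=0.5):
    """Fit logistic regression by batch gradient descent (illustrative choice)."""
    n = len(features)
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            err = predict(weights, bias, x) - y
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Synthetic stand-in data: two hypothetical numeric features per gene.
rng = random.Random(0)


def fake_gene(center):
    return [rng.gauss(center, 1.0), rng.gauss(center, 1.0)]


# Training sets sized like those in the article (40-50 and 100-120 genes).
known_imprinted = [fake_gene(1.0) for _ in range(45)]
known_not_imprinted = [fake_gene(-1.0) for _ in range(110)]

features = known_imprinted + known_not_imprinted
labels = [1] * len(known_imprinted) + [0] * len(known_not_imprinted)
weights, bias = train_logistic(features, labels)

# Score a (much smaller than real) set of unlabeled genes and rank them.
genome = {f"gene_{i}": fake_gene(rng.choice([-1.0, 1.0])) for i in range(1000)}
ranked = sorted(
    ((predict(weights, bias, x), name) for name, x in genome.items()),
    reverse=True,
)

# The top of the list holds the candidates most worth testing in the lab.
for score, name in ranked[:5]:
    print(f"{name}\t{score:.3f}")
```

The point of ranking rather than making a hard yes-or-no call is that experimental validation is expensive: a prioritized list lets a lab start with the most promising candidates and work downward.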

“Recently, we’ve been doing less research with these kinds of direct medical applications. Now we’re focusing more on more fundamental questions in molecular and cellular systems biology,” Hartemink said. “In particular, we’re trying to understand how the cell goes through its division cycle correctly, how the replication and transcription of the genome is regulated, and how particular proteins such as transcription factors enact this regulation. All these things — to the extent that we come to understand them better — will eventually lead to improvements in how we diagnose and treat various kinds of diseases.

“Those applications are a few steps beyond the questions that we’re immediately addressing, but they’ll certainly emerge as we increase our understanding of the basic biology.”
