RNA Secondary Structure Prediction with few Homolog
Non coding RNAs (ncRNA), unlike messenger RNAs (mRNA), are directly functional in the cell where they are believed to play a role usually assigned to proteins. Since two decades, a series of new ncRNA families are being discovered, usually coming as a set of putative, sometimes poorly conserved, homologs. The function of these ncRNA in the cell, like for proteins, is related to their three-dimensional shape and an ordinary problem is the computational prediction of this shape (ie. the exact atomic positions) given just the nucleic acid sequence. This hard problem is addressed stepwise by using intermediate levels of description of the molecule. Secondary structure, which is more or less the graph of the intra-molecular base-pairing interactions, provides such an intermediate, topological description of the molecule that happens to be well suited for an algorithmic approach: given a sequence, the set of possible secondary structures that minimize a free energy function (within a realistic, yet simplified, thermodynamical model) can be computed in reasonable time by dynamic programming. But as there is usually no clue to guess which structure in that set is the correct one, it makes sense to use several homologs, when available, and improve the accuracy of the results by seeking the common structure.
The extension of the previous framework to several sequences turns out to be computationally expensive, although still polynomial in space and time, and calls for heuristic adaptations. I will present our algorithmic results, that prove to apply successfully to that arduous context, usually unaffordable for other methods.