Cover Article

Systematic solution to homoöligomeric structures determined by NMR

by J. Martin, P. Zhou, and B. R. Donald*.
Proteins 2015; 83(4): 651-661.
doi: 10.1002/prot.24768
*Corresponding author.

Cover Figure.
Caption: Systematic Search for Diacylglycerol Kinase Structure. We present a general method for structure determination of protein homoöligomers and demonstrate the method on Diacylglycerol kinase (DAGK). We conclude that the differences between the published NMR and crystal structures are due to limitations of current NMR structure determination methodology. We overcame these limitations by using a new Fold-Operator Theory to systematically search the space of folds and predict distinct fold topologies. The image illustrates the concept of a large number of different folds of DAGK, which all plausibly satisfy the NMR data. The toroidal configuration space illustrates how Fold-Operator Theory obtains a systematic search algorithm over the different possible folds. We report 7 new folds that are topologically distinct, and differ by up to 12 Å (transmembrane helix backbone RMSD) from either the previously published NMR structure or the crystal structure. Image created for this paper by Lei Chen and Yan Liang (L2Molecule.com).

Abstract. Protein structure determination by NMR has predominantly relied on simulated annealing-based conformational search for a converged fold using primarily distance constraints, including constraints derived from nuclear Overhauser effects (NOEs), paramagnetic relaxation enhancement (PRE), and cysteine crosslinking. Although there is no guarantee that the converged fold represents the global minimum of the conformational space, it is generally accepted that good convergence is synonymous with the global minimum. Here, we show that this criterion breaks down in the presence of large numbers of ambiguous constraints from NMR experiments on homoöligomeric protein complexes. A systematic evaluation of the conformational solutions that satisfy the NMR constraints of a trimeric membrane protein, Diacylglycerol kinase (DAGK), reveals 9 distinct folds, including the reported NMR and crystal structures. This result highlights the fundamental limitation of global fold determination for homoöligomeric proteins using ambiguous distance constraints and provides a systematic solution for exhaustively enumerating all satisfying solutions.

Summary.
Simulated annealing is a primary method for structure determination of proteins by nuclear magnetic resonance (NMR) spectroscopy. NMR restraints and biophysical principles are encoded into an energy function whose minimization yields models of the protein structure that satisfy the restraints. If the method consistently returns similar structures that adequately satisfy the restraints, the structural ensemble is considered well converged and the structure determination successful, although low restraint violation and good convergence do not necessarily mean that the structure is accurate. The main strength of simulated annealing is its ability to transform a coarse structural model into a more refined structure with improved restraint satisfaction. Where the method falls short is in exhaustively sampling topologically distinct structural models. Consequently, it can become trapped in local minima of the energy landscape and miss the genuine fold(s) with similar or lower energies. Further complicating the situation, even if the global minimum of the energy function could be found, small inaccuracies in the energy function (e.g., from approximating complex physical phenomena or misinterpreting even a few experimental distance constraints) could cause a genuine fold to be incorrectly assigned a higher energy than erroneous folds. Although such a situation is considered rare when all distance constraints are uniquely assigned, the odds increase significantly when ambiguous distance restraints are used for structure determination of homoöligomeric protein complexes.
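The local-minimum argument above can be illustrated with a toy sketch. It is neither simulated annealing nor Xplor-NIH: it assumes a hypothetical one-dimensional chain of discrete conformations with made-up energies, uses greedy downhill moves between neighboring conformations as a zero-temperature stand-in for a local minimizer, and contrasts the result with exhaustive enumeration of every conformation.

```python
# Toy illustration only: the energies below are hypothetical, and the
# "conformations" are indices on a 1D chain whose neighbors are adjacent indices.
ENERGIES = [5.0, 3.0, 1.0, 4.0, 6.0, 2.0, 0.5, 3.5]


def local_descent(energies, start):
    """Greedy zero-temperature descent (a stand-in for a local minimizer):
    move to the lowest-energy neighbor while that lowers the energy."""
    i = start
    while True:
        neighbors = [j for j in (i - 1, i + 1) if 0 <= j < len(energies)]
        best = min(neighbors, key=lambda j: energies[j])
        if energies[best] >= energies[i]:
            return i  # no downhill neighbor: a local minimum
        i = best


def exhaustive_minimum(energies):
    """Enumerate every conformation and return the index of the global minimum."""
    return min(range(len(energies)), key=lambda i: energies[i])


print(local_descent(ENERGIES, start=0))  # 2 (trapped at energy 1.0)
print(exhaustive_minimum(ENERGIES))      # 6 (global minimum, energy 0.5)
```

Starting from index 5 instead reaches index 6: the local answer depends on where the search starts, whereas enumeration does not. Real conformational spaces cannot be enumerated point by point, which is why the approach described here enumerates topologically distinct folds rather than individual conformations.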

Ambiguous distance restraints (ADRs) are distance information (such as NOEs) that cannot be uniquely attributed to a single pair of atoms. Because equivalent atoms in all subunits of a homoöligomeric complex have identical, and thus indistinguishable, chemical shifts, ADRs are unavoidable for distance measurements in trimers and higher-order homoöligomers. We refer to this phenomenon as subunit ambiguity. For dimers, separating intra- from inter-subunit NOEs using X-filtered NOESY is sufficient to resolve subunit ambiguity. For trimers and higher-order oligomers, even after a distance restraint has been classified as inter-subunit, it still has at least two possible assignments and therefore remains ambiguous. ADRs account for the degenerate atom pairs by using an averaging function derived from a mean-field approximation. Although it has been demonstrated that genuine interactions can be extracted from ADRs, these methods are prone to becoming trapped in local minima because they rely heavily on the initial fold to remove assignment ambiguity. The energy landscapes of homoöligomers contain many minima of similarly low energy, so when simulated annealing methods using ADRs become trapped in one of them, they can fail to report satisfying folds from the others.
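Subunit ambiguity and the averaging of ambiguous restraints can be sketched as follows. This is a minimal illustration under stated assumptions: it counts the candidate subunit assignments of one inter-subunit restraint in an n-mer, and it uses r^-6 sum averaging, a common choice for ambiguous NOE restraints, as the averaging function. Whether this matches the mean-field function used for DAGK is an assumption, and all helper names are hypothetical.

```python
from typing import List, Tuple


def partner_assignments(n_subunits: int, chain: int = 0) -> List[Tuple[int, int]]:
    """Hypothetical helper: candidate (chain_i, chain_j) assignments of an
    inter-subunit restraint whose first atom is placed in `chain`. The second
    atom may sit in any other subunit, giving n_subunits - 1 candidates."""
    return [(chain, other) for other in range(n_subunits) if other != chain]


def effective_distance(distances: List[float]) -> float:
    """Assumed averaging scheme: r^-6 sum over all candidate atom pairs,
    d_eff = (sum d^-6)^(-1/6)."""
    return sum(d ** -6 for d in distances) ** (-1.0 / 6.0)


def upper_bound_violation(distances: List[float], upper: float) -> float:
    """Flat-bottom violation of an ambiguous upper-bound restraint:
    zero when the effective distance is at or below the bound."""
    return max(0.0, effective_distance(distances) - upper)


print(len(partner_assignments(2)))  # 1: a dimer restraint is unambiguous once inter-subunit
print(len(partner_assignments(3)))  # 2: a trimer restraint still has two candidates
# Hypothetical distances (Å) for the two trimer assignments: one short, one long.
print(upper_bound_violation([4.0, 15.0], upper=5.0))  # 0.0
```

Because the r^-6 sum is dominated by its shortest term, the effective distance never exceeds the shortest candidate distance, so an ambiguous restraint is satisfied as soon as any one assignment is short. A single restraint set can therefore be consistent with several folds that satisfy it through different subunit assignments.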

This situation is further exacerbated for homoöligomeric membrane proteins, for which collecting dense restraints is often impractical. For Diacylglycerol Kinase from Escherichia coli (henceforth simply DAGK), a membrane-associated homotrimer, two different structures have been published. The solution NMR structure of DAGK, determined using ambiguously assigned distance restraints, has a domain-swapped subunit interface, whereas in the crystal structure each subunit adopts a more compact conformation without domain swapping.

Here we show that the difference between the two structures is due to the local-minimum limitations of current methodology for NMR structure determination. We demonstrate that this limitation can be mitigated by searching over topologically distinct folds using a systematic approach called Fold-Operator Theory. Once an initial satisfying fold is found, mathematical operators transform it into alternative folds; the operators define a group action on the configuration space of protein folds. These alternative folds can then be refined using traditional simulated annealing methods and evaluated for restraint satisfaction. Using this systematic approach, we found 48 distinct folds of DAGK, of which 9, including the published NMR and crystal folds, satisfied the experimental restraints after energy minimization.
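The idea of operators acting on a configuration space of folds can be sketched as an orbit computation. The representation below is a hypothetical simplification, not the paper's Fold-Operator Theory: a fold is encoded as a tuple of per-helix subunit offsets modulo 3 (which subunit, relative to a reference, supplies the helix at each position), so the configuration space is a discrete torus Z_3^k, and each generator cyclically shifts one coordinate. A breadth-first closure enumerates every fold reachable from a seed, and a caller-supplied predicate stands in for refinement plus restraint checking.

```python
from collections import deque
from typing import Callable, List, Set, Tuple

# Hypothetical encoding: per-helix subunit offsets modulo n_subunits.
Fold = Tuple[int, ...]


def shift_operator(helix: int, n_subunits: int = 3) -> Callable[[Fold], Fold]:
    """Hypothetical generator: cyclically shift the subunit offset of one helix."""
    def op(fold: Fold) -> Fold:
        f = list(fold)
        f[helix] = (f[helix] + 1) % n_subunits
        return tuple(f)
    return op


def orbit(seed: Fold, generators: List[Callable[[Fold], Fold]]) -> Set[Fold]:
    """All folds reachable from `seed` by repeatedly applying the generators (BFS)."""
    seen = {seed}
    queue = deque([seed])
    while queue:
        fold = queue.popleft()
        for g in generators:
            nxt = g(fold)
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def satisfying_folds(seed: Fold,
                     generators: List[Callable[[Fold], Fold]],
                     is_satisfying: Callable[[Fold], bool]) -> List[Fold]:
    """Enumerate the orbit, then keep the folds accepted by the placeholder check."""
    return sorted(f for f in orbit(seed, generators) if is_satisfying(f))


generators = [shift_operator(h) for h in range(3)]
print(len(orbit((0, 0, 0), generators)))  # 27 = 3**3 folds on this toy torus
# Placeholder acceptance test standing in for refinement + restraint checking.
print(len(satisfying_folds((0, 0, 0), generators, lambda f: sum(f) % 3 == 0)))  # 9
```

Because each generator here is invertible and the group is finite, the orbit of any seed is the whole toy torus; these counts are properties of the toy encoding only. With a real fold representation, which folds are reachable depends on which operators are defined.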

Significance.
We have demonstrated our method on DAGK, showing how to find a remarkable variety of satisfying folds, but the method can also be applied to other homoöligomeric proteins where ambiguous restraints necessarily hinder structure determination with simulated annealing.

In some cases, structures designed from one fold switched to another fold during refinement; our paper shows all eight such switches. Viewed as a dynamical system, the network of fold switches has two prominent attractors: fold O (the NMR fold) and fold B, which is not related to any published structure. Fold names correspond to the blue letters in our paper. Six of the top seven structures by Xplor total energy, and six of the top seven structures by RMS violation index, were either seeded from, or switched to, one of these two attractor folds.
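Treating refinement as a map from seed fold to refined fold gives a small discrete dynamical system. The sketch below uses hypothetical fold labels and a hypothetical switch table (not the eight switches reported for DAGK): it follows each seed to the fold where refinement stops changing it and groups seeds into basins, so that folds with the largest basins play the role of attractors.

```python
from collections import defaultdict
from typing import Dict, List


def terminal_fold(fold: str, refine: Dict[str, str]) -> str:
    """Follow seed -> refined fold until a fold maps to itself (a fixed point).
    Folds missing from `refine` are treated as fixed points."""
    visited = set()
    while refine.get(fold, fold) != fold:
        if fold in visited:
            raise ValueError(f"cycle detected at {fold}")
        visited.add(fold)
        fold = refine[fold]
    return fold


def basins(refine: Dict[str, str]) -> Dict[str, List[str]]:
    """Group every seed fold by the fixed point it ends up at."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for seed in refine:
        groups[terminal_fold(seed, refine)].append(seed)
    return dict(groups)


# Hypothetical switch table: seed fold -> fold observed after refinement.
REFINE = {"F1": "F1", "F2": "F1", "F3": "F2", "F4": "F4", "F5": "F4", "F6": "F6"}

b = basins(REFINE)
print(b)  # {'F1': ['F1', 'F2', 'F3'], 'F4': ['F4', 'F5'], 'F6': ['F6']}
print(max(b, key=lambda k: len(b[k])))  # F1, the largest basin in this toy table
```

The cycle check matters because a refinement map need not have fixed points; if two folds kept switching into each other, the toy model would report that rather than loop forever.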

Surprisingly, the best fold by Xplor total energy was neither the fold of the NMR structure nor the fold of the crystal structure. Fold B has the lowest Xplor total energy and the second-lowest RMS violation index. It is topologically distinct from both the NMR and crystal folds, and its three refined structures differ from the published NMR structure by 12.31-12.87 Å and from the published crystal structure by 12.77-12.83 Å (transmembrane helix backbone RMSD). Fold B also satisfies different subunit assignments of the intermolecular distance restraints than either published structure, showing that fold-operator theory found previously unknown solutions to the restraint satisfaction problem for DAGK.
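The RMSD values quoted above compare corresponding backbone atoms after optimal superposition. A minimal sketch of that comparison is below, assuming the Kabsch algorithm (a standard choice; the exact superposition and atom-selection procedure used in the paper is not described here) and NumPy. Selecting the transmembrane-helix backbone atoms and matching them between models is left to the caller.

```python
import numpy as np


def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal rigid superposition.

    Rows must be corresponding atoms (e.g. the same TM-helix backbone atoms
    in both models); atom selection is assumed to happen elsewhere."""
    P = P - P.mean(axis=0)  # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q  # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so the result is a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T  # rotation taking centered P onto centered Q
    diff = (R @ P.T).T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))


# Toy check: a rotated and translated copy superimposes exactly.
rng = np.random.default_rng(0)
P = rng.uniform(-5.0, 5.0, size=(10, 3))
c, s = np.cos(0.5), np.sin(0.5)
Rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
Q = P @ Rz.T + np.array([1.0, -2.0, 3.0])
print(round(kabsch_rmsd(P, Q), 6))  # 0.0
```

The determinant check matters for this use case: without it, the SVD could return a reflection, letting a mirror-image arrangement of helices appear artificially close to the reference.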

It is not yet known whether this new putative fold B has biological significance for DAGK. However, it must be emphasized that, based on all NMR measurements to date, (1) fold B is vastly different from the published structures, (2) it cannot be excluded as a possible structure, and (3) it fits the NMR restraints as well as or better than the two published folds.

Conclusions.
We have presented a general method for structure determination of protein homoöligomers and demonstrated the method on DAGK. We conclude that the differences between the published NMR and crystal structures are due to limitations of current NMR structure determination methodology. We overcame these limitations by using a new fold-operator theory to explicitly search the space of folds and predict distinct fold topologies for further investigation. These folds were used to reduce (and in some cases eliminate) ambiguity in restraint assignments, which eased the subsequent refinement of seed structures in Xplor-NIH. By explicitly searching over topologically distinct folds, we avoided the implicit fold search performed by local minimization methods, which can become trapped in local minima and therefore fail to report satisfying solutions. Using explicit fold-space search to address the limitations of local minimization techniques such as simulated annealing enables robust structure determination for difficult homoöligomeric systems, particularly membrane-associated systems for which only sparse and ambiguous restraints are available.
