Tessellated Molecules: Computational Geometry of Chemical and Biological Structure

Iosif I. Vaisman, Alexander Tropsha, and Weifan Zheng

School of Pharmacy
University of North Carolina at Chapel Hill
Chapel Hill, NC 27599
vaisman@gibbs.oit.unc.edu

Abstract

Methods of computational geometry provide a robust and effective approach to studying the topology and architecture of molecular systems, including biological macromolecules and molecular assemblies. In this approach a molecule or molecular system is represented by a set of points in three-dimensional space, where each point designates an atom or a site (a group of atoms). The Delaunay tessellation of such a set of points generates an aggregate of space-filling irregular tetrahedra, or Delaunay simplices. The vertices of each simplex objectively define four nearest-neighbor atoms, and the collection of all simplices describes the topology of the molecular system. Statistical analysis of the geometrical and compositional properties of the Delaunay simplices is used to characterize the structure and connectivity of the molecular system and to correlate chemical composition with three-dimensional molecular architecture. Applications of computational geometry methods to liquid water and proteins are discussed.
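As an illustration of the tessellation step, the following minimal sketch computes Delaunay simplices for a set of three-dimensional points. It assumes the SciPy library (scipy.spatial.Delaunay) as the tessellation engine, which is not necessarily the software used in this work; the function name tessellate and the sample coordinates are hypothetical.

```python
import numpy as np
from scipy.spatial import Delaunay


def tessellate(points):
    """Return an (n_simplices, 4) array of point indices.

    Each row lists the four points (atoms or sites) that form one
    Delaunay simplex. `points` is any (n, 3) array-like of coordinates.
    """
    coords = np.asarray(points, dtype=float)
    if coords.ndim != 2 or coords.shape[1] != 3:
        raise ValueError("expected an (n, 3) array of coordinates")
    if coords.shape[0] < 5:
        # Four points give at most one tetrahedron; that case is left
        # out of this sketch for simplicity.
        raise ValueError("need at least five points")
    return Delaunay(coords).simplices


# Hypothetical example: four corners of a regular tetrahedron plus
# its centroid (index 4).
sample = [
    [1.0, 1.0, 1.0],
    [1.0, -1.0, -1.0],
    [-1.0, 1.0, -1.0],
    [-1.0, -1.0, 1.0],
    [0.0, 0.0, 0.0],
]
simplices = tessellate(sample)
```

Each row of the result can then be mapped back to atom or residue labels for the geometrical and compositional statistics described above.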

The distribution of tetrahedrality of the Delaunay simplices is used as a descriptor of structural order in pure water and in aqueous solutions. The distribution of tetrahedrality in water is a monotonically decreasing function of temperature. It is sensitive to changes in thermodynamic parameters and can be measured precisely in separate regions of a model system, e.g. near various functional groups of the solute molecules. The Delaunay tessellation naturally reflects the local tetrahedral arrangement of water molecules and provides a reliable quantitative measure of structural order.
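The abstract does not state how tetrahedrality is computed. The sketch below assumes one commonly used definition based on the six edge lengths l_i of a simplex, T = sum over pairs i<j of (l_i - l_j)^2 / (15 * mean(l)^2), which equals zero for a regular tetrahedron and grows as the simplex becomes more distorted. The function name and sample coordinates are hypothetical.

```python
from itertools import combinations
from math import dist


def tetrahedrality(vertices):
    """Edge-length-based distortion measure of one tetrahedron.

    `vertices` is a sequence of four (x, y, z) coordinates.
    Under the assumed definition, 0 means a regular tetrahedron and
    larger values mean a more distorted simplex.
    """
    if len(vertices) != 4:
        raise ValueError("a simplex has exactly four vertices")
    # The six edges are all pairs of the four vertices.
    edges = [dist(a, b) for a, b in combinations(vertices, 2)]
    mean_edge = sum(edges) / len(edges)
    if mean_edge == 0.0:
        raise ValueError("degenerate simplex: all vertices coincide")
    # Sum of squared differences over the 15 distinct pairs of edges.
    spread = sum((p - q) ** 2 for p, q in combinations(edges, 2))
    return spread / (15.0 * mean_edge ** 2)


# Hypothetical examples: a regular tetrahedron and a stretched one.
regular = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]
stretched = [(3, 3, 3), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]
t_regular = tetrahedrality(regular)
t_stretched = tetrahedrality(stretched)
```

Evaluating this quantity for every simplex in a configuration and collecting the values into a histogram gives a distribution of the kind used here as a descriptor of structural order.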

In protein structure analysis the Delaunay tessellation facilitates the objective identification of neighboring residues for a quantitative description of nonlocal contacts in three-dimensional protein structures. Analysis of the patterns of spatial proximity of residues in known protein structures, based on the Delaunay tessellation, reveals highly nonrandom clustering of amino acids. The relative abundance or deficiency of residue quadruplets with certain compositions reflects the propensities of different residue types to be associated or dissociated in folded proteins. The likelihood of occurrence of four residues in one simplex also displays a strong nonrandom signal when a reduced amino acid alphabet is used. We used several different reduced alphabets based on the chemical properties of the residues and on the complementarity of the corresponding codons. In all cases the clustering of residues correlates with their properties or genetic origin. The results of this analysis are being implemented in algorithms for protein structure classification and prediction.
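The abstract gives no formula for the quadruplet likelihood. One plausible way to quantify abundance or deficiency, assumed here for illustration only, is the log ratio of the observed frequency of each quadruplet composition to the frequency expected if residues were mixed at random: q = log10(f_observed / (C * a_i * a_j * a_k * a_l)), where a_x is the overall frequency of residue type x and C is the number of distinct orderings of the four types. All names below are hypothetical.

```python
from collections import Counter
from math import factorial, log10


def quadruplet_log_likelihood(simplices, residue_types):
    """Score each observed quadruplet composition.

    `simplices` is an iterable of 4-tuples of residue indices (one per
    Delaunay simplex); `residue_types` maps each index to a type label,
    which may come from a full or a reduced alphabet.
    Returns {sorted composition tuple: log10(observed / expected)}.
    """
    # Order within a simplex is irrelevant, so sort the labels.
    observed = Counter(
        tuple(sorted(residue_types[i] for i in simplex))
        for simplex in simplices
    )
    total = sum(observed.values())
    type_counts = Counter(residue_types)
    n_residues = len(residue_types)

    scores = {}
    for composition, count in observed.items():
        observed_freq = count / total
        orderings = factorial(4)  # becomes 4! / prod(t_n!)
        random_freq = 1.0
        for label, repeats in Counter(composition).items():
            orderings //= factorial(repeats)
            random_freq *= (type_counts[label] / n_residues) ** repeats
        scores[composition] = log10(observed_freq / (orderings * random_freq))
    return scores


# Hypothetical toy input: four residues of two types in one simplex.
toy_scores = quadruplet_log_likelihood([(0, 1, 2, 3)], ["A", "A", "B", "B"])
```

Positive scores indicate compositions found more often than random mixing would predict, and negative scores indicate compositions found less often. Applying a reduced alphabet amounts to replacing the labels in residue_types before scoring.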