Thomas R. Ioerger, Thomas Holton,
Jon A. Christopher, James C. Sacchettini
Abstract
X-ray crystallography is the most widely used method for determining
the three-dimensional structures of proteins and other macromolecules.
One of the most difficult steps in crystallography is interpreting
the electron density map to build the final model. This step is often
done manually by crystallographers and is very time-consuming
and error-prone. In this paper, we introduce a new automated system
called TEXTAL for interpreting electron density maps using pattern
recognition. Given a map to be modeled, TEXTAL divides the map
into small regions and then finds regions with a similar pattern
of density in a database of maps for proteins whose structures
have already been solved. When a match is found, the coordinates
of atoms in the region are inferred by analogy. The key to making
the database lookup efficient is to extract numeric features that
represent the patterns in each region and to compare feature values
using a weighted Euclidean distance metric. It is crucial that
the features be rotation-invariant, since regions with similar
patterns of density can occur in arbitrary orientations. This
pattern-recognition approach can take advantage of data accumulated
in large crystallographic databases to effectively learn the association
between electron density and molecular structure by example.
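
The lookup described above can be illustrated with a minimal sketch.
It is not the TEXTAL implementation: the two toy features, the weights,
the sample regions, and every function name below are assumptions
chosen only to show how rotation-invariant features and a weighted
Euclidean distance fit together.

```python
import math

def features(region):
    """Toy rotation-invariant features of a region centered at the origin.

    `region` is a list of (x, y, z, density) samples. Both features depend
    only on each sample's distance from the center, so rotating the region
    leaves them unchanged. (Hypothetical features, not TEXTAL's.)
    """
    total = sum(d for _, _, _, d in region)
    mean_radius = sum(
        d * math.sqrt(x * x + y * y + z * z) for x, y, z, d in region
    ) / total
    return [total, mean_radius]

def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two feature vectors."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))

def best_match(query, database, weights):
    """Return the database entry whose features are closest to the query."""
    return min(
        database,
        key=lambda entry: weighted_distance(query, entry["features"], weights),
    )

def rotate_z(region, angle):
    """Rotate every sample of a region about the z axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z, d) for x, y, z, d in region]

# Hypothetical regions standing in for pieces of solved maps.
region_a = [(1.0, 0.0, 0.0, 2.0), (0.0, 1.5, 0.0, 1.0), (0.0, 0.0, 0.5, 3.0)]
region_b = [(2.0, 0.0, 0.0, 1.0), (0.0, 2.0, 0.0, 1.0), (0.0, 0.0, 2.0, 1.0)]
database = [
    {"name": "region_a", "features": features(region_a)},
    {"name": "region_b", "features": features(region_b)},
]
weights = [1.0, 2.0]  # assumed feature weights, chosen arbitrarily

# The query is region_a in a different orientation; it should still match.
query = features(rotate_z(region_a, 0.7))
match = best_match(query, database, weights)
print(match["name"])  # prints "region_a"
```

Because the query features are computed from distances to the region
center only, the rotated copy of region_a lands at essentially zero
distance from its database entry, which is the property the abstract
identifies as crucial. In a real system the atomic coordinates stored
with the matched entry would then be transferred to the query region.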