Flexible Description and Optimization of Biological Constraints for the Discovery of Sequence-Structure Relationships of Proteins

(Flexible Beschreibung und Optimierung biologischer Randbedingungen zur Aufdeckung von Sequenz-Struktur-Beziehungen von Proteinen)

 

DFG Project within the DFG Research Program

Informatics Methods for the Analysis and Interpretation of Large Genomic Data Sets

(DFG-Projekt im DFG-Schwerpunktprogramm Informatikmethoden zur Analyse und Interpretation großer genomischer Datenmengen)

Objective

Development of a system for protein structure prediction that uses biological knowledge, formulated in a description language, to optimize the alignment computation.


People and Institutes
Fraunhofer Institute for Algorithms and Scientific Computing
Schloss Birlinghoven
D-53754 Sankt Augustin

Mario Albrecht
Dr. Jan Freudenberg
Daniel Hanisch
Prof. Dr. Ralf Zimmer
Prof. Dr. Thomas Lengauer, PhD


Time Scale

Start of project: October 1, 1999
Duration of first project period: 2 years
Expected total project length: 4 years


Project Description

The project addresses the two main problems of protein structure prediction: the design of efficient algorithms and the design of good scoring functions.

Both problems are tackled by developing a system that allows biological knowledge to be formulated explicitly in a description language. This additional knowledge is then used to steer, from a high level, efficient optimization procedures during the alignment and score computation.

With the help of editors and automatic tools, users specify biological facts and constraints for the computation in the description language ProML, which is based on the XML standard. In this way, data from mass spectrometry and NMR experiments, such as cross-link distances and NOE restraints, can also be included in the alignment computation.
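As a rough illustration of how XML-encoded distance constraints could be read into a program, the following Java sketch parses a small constraint document with the standard JDK DOM parser. The element name distanceConstraint, its attributes, the class ConstraintReader and all numeric values are assumptions made for this example only; they are not taken from the ProML specification.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical reader for an XML constraint document; NOT the real ProML schema.
final class ConstraintReader {

    /** One upper-bound distance constraint between two residues of the query sequence. */
    static final class DistanceConstraint {
        final int residueA;
        final int residueB;
        final double maxDistance; // upper bound in Angstrom
        final String source;      // e.g. "cross-link" or "NOE"

        DistanceConstraint(int residueA, int residueB, double maxDistance, String source) {
            this.residueA = residueA;
            this.residueB = residueB;
            this.maxDistance = maxDistance;
            this.source = source;
        }
    }

    /** Parses all distanceConstraint elements (an invented element name) from the XML text. */
    static List<DistanceConstraint> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("distanceConstraint");
            List<DistanceConstraint> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                result.add(new DistanceConstraint(
                        Integer.parseInt(e.getAttribute("residueA")),
                        Integer.parseInt(e.getAttribute("residueB")),
                        Double.parseDouble(e.getAttribute("maxDistance")),
                        e.getAttribute("source")));
            }
            return result;
        } catch (Exception ex) {
            // Wrap parser and number-format errors in one unchecked exception.
            throw new IllegalArgumentException("Could not parse constraint XML", ex);
        }
    }

    public static void main(String[] args) {
        // Illustrative values only; not measured data.
        String xml = "<constraints>"
                + "<distanceConstraint residueA=\"12\" residueB=\"47\" maxDistance=\"12.0\" source=\"cross-link\"/>"
                + "<distanceConstraint residueA=\"3\" residueB=\"9\" maxDistance=\"5.0\" source=\"NOE\"/>"
                + "</constraints>";
        for (DistanceConstraint c : parse(xml)) {
            System.out.println(c.source + ": residues " + c.residueA + " and "
                    + c.residueB + " within " + c.maxDistance + " A");
        }
    }
}
```

Keeping each constraint as a small immutable object makes it straightforward to hand the same list to any later scoring or filtering step.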

The structure prediction algorithm is based on the threading approach and the recursive dynamic programming method, and is implemented in the Java programming language. To support knowledge discovery, we develop new methods for clustering protein sequences and structures in different ways in order to uncover additional properties and sequence-structure relationships.
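To make the interplay of alignment and experimental distance constraints more concrete, here is a minimal Java sketch. It deliberately swaps in a plain global alignment by standard dynamic programming with a toy identity score, not the project's recursive dynamic programming method or a real threading potential, and then counts how many distance constraints the template coordinates violate under the resulting alignment. The class ThreadingSketch, the scoring values and the constraint layout {queryPosA, queryPosB, maxDistance} are assumptions made for illustration.

```java
import java.util.Arrays;

// Simplified stand-in: standard global alignment plus a constraint check.
final class ThreadingSketch {

    static final int GAP = -2; // toy linear gap penalty (assumption)

    /** Toy score: +2 for identical residues, -1 otherwise. */
    static int score(char a, char b) {
        return a == b ? 2 : -1;
    }

    /**
     * Global alignment of query against template by dynamic programming.
     * Returns map[i] = template position aligned to query position i, or -1 for a gap.
     */
    static int[] align(String query, String template) {
        int n = query.length(), m = template.length();
        int[][] d = new int[n + 1][m + 1];
        for (int i = 1; i <= n; i++) d[i][0] = i * GAP;
        for (int j = 1; j <= m; j++) d[0][j] = j * GAP;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int diag = d[i - 1][j - 1] + score(query.charAt(i - 1), template.charAt(j - 1));
                d[i][j] = Math.max(diag, Math.max(d[i - 1][j] + GAP, d[i][j - 1] + GAP));
            }
        }
        // Trace back from the bottom-right corner to recover the residue mapping.
        int[] map = new int[n];
        Arrays.fill(map, -1);
        int i = n, j = m;
        while (i > 0 && j > 0) {
            if (d[i][j] == d[i - 1][j - 1] + score(query.charAt(i - 1), template.charAt(j - 1))) {
                map[i - 1] = j - 1;
                i--;
                j--;
            } else if (d[i][j] == d[i - 1][j] + GAP) {
                i--; // query residue aligned to a gap
            } else {
                j--; // template residue aligned to a gap
            }
        }
        return map;
    }

    /** Counts constraints {queryPosA, queryPosB, maxDistance} violated by the template coordinates. */
    static int countViolations(int[] map, double[][] coords, double[][] constraints) {
        int violations = 0;
        for (double[] c : constraints) {
            int a = map[(int) c[0]];
            int b = map[(int) c[1]];
            if (a < 0 || b < 0) continue; // unaligned residues cannot be checked
            double dx = coords[a][0] - coords[b][0];
            double dy = coords[a][1] - coords[b][1];
            double dz = coords[a][2] - coords[b][2];
            if (Math.sqrt(dx * dx + dy * dy + dz * dz) > c[2]) violations++;
        }
        return violations;
    }

    public static void main(String[] args) {
        int[] map = align("ACE", "ACDE");
        // Made-up template coordinates: four positions on a straight line.
        double[][] coords = {{0, 0, 0}, {3.8, 0, 0}, {7.6, 0, 0}, {11.4, 0, 0}};
        double[][] constraints = {{0, 2, 10.0}, {0, 1, 5.0}};
        System.out.println("mapping: " + Arrays.toString(map));
        System.out.println("violated constraints: " + countViolations(map, coords, constraints));
    }
}
```

In a fold recognition setting, a count like this could be used to re-rank or filter candidate templates; whether the constraints act as a post-filter or enter the alignment score directly is a design choice this sketch does not settle.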


Publications

Albrecht, M.; Hanisch, D.; Zimmer, R.; Lengauer, T. 2002.
Improving fold recognition of protein threading by experimental distance constraints.
In Silico Biology 2(3):325-337.

Albrecht, M.; Hanisch, D.; Zimmer, R.; Lengauer, T. 2001.
Improving fold recognition of protein threading by experimental distance constraints.
In Proceedings of the German Conference on Bioinformatics GCB 2001, Braunschweig, pp. 68-77.

Hoffmann, D.; Schnaible, V.; Wefing, S.; Albrecht, M.; Hanisch, D.; Zimmer, R. 2002.
A new method for the fast solution of protein-3D-structures, combining experiments and bioinformatics.
In Coupling of biological and electronic systems: Proceedings of the 2nd caesarium, Bonn, November 1-3, 2000. Springer-Verlag, Berlin, pp. 59-78.

Freudenberg, J.; Zimmer, R.; Hanisch, D.; Lengauer, T. 2002.
A hypergraph-based method for unification of existing protein structure- and sequence-families.
In Silico Biology 2(3):339-349.

Freudenberg, J.; Zimmer, R.; Hanisch, D.; Lengauer, T. 2001.
A new method for unification of existing protein structure- and sequence-families.
In Proceedings of the German Conference on Bioinformatics GCB 2001, Braunschweig, pp. 78-84.

Hanisch, D.; Zimmer, R.; Lengauer, T. 2002.
ProML - the Protein Markup Language for specification of protein sequences, structures and families.
In Silico Biology 2(3):313-324.

Hanisch, D.; Zimmer, R.; Lengauer, T. 2001.
ProML - the Protein Markup Language for specification of protein sequences, structures and families.
In Proceedings of the German Conference on Bioinformatics GCB 2001, Braunschweig, pp. 58-67.