Thursday March 10 2005

TULIP: a theoretical model for protein sequences, unifying several domains in bioinformatics and evolutionary biology

BMC Bioinformatics (2005) 6:49; Bioinformatics (2004) 20:534-537
Biologists and biomathematicians at the Department of Plant Cell Physiology (DRDC/PCV, CEA Grenoble), jointly with the Gene-IT company, the Department of Biology, Information Technology and Mathematics (DRDC/BIM, CEA Grenoble) and the Department of Plant Ecophysiology and Microbiology (DEVM, CEA Cadarache), have developed TULIP, a unifying theoretical model for the analysis of protein sequences. This method makes it possible (i) to resolve incongruent molecular phylogenies, and (ii) to reconstruct the evolutionary relationships linking the sequences of an entire database without disturbing the overall topology each time the database is updated.


Protein sequences are treated as objects positioned in a configuration space inspired by particle physics. The geometry of this space respects the probabilistic framework of the TULIP theorem (Theorem of the Upper LImit of a score Probability), proved at the PCV laboratory. The method is information-conserving, in the same way that physical theories are energy-conserving. A topology is assigned to this space, making it possible to deduce the probable pathways connecting two related sequences, in accordance with Darwin's theory of evolution.
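
To make the idea of an upper limit on a score probability concrete, here is a minimal sketch in Python. It assumes the bound takes a Bienaymé-Chebyshev form, in which 1/Z² caps the probability of an alignment score whose Z-value is Z. That reading of the theorem, the toy scoring scheme and the names `local_score`, `z_value` and `p_upper_bound` are illustrative assumptions, not the published TULIP implementation.

```python
import random
import statistics


def local_score(a, b, match=2, mismatch=-1, gap=-2):
    """Toy Smith-Waterman local alignment score with a linear gap penalty.

    The scoring values are arbitrary placeholders, not a real
    substitution matrix.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            cur.append(max(0, diag, prev[j] + gap, cur[j - 1] + gap))
        best = max(best, max(cur))
        prev = cur
    return best


def z_value(a, b, n_shuffles=200, seed=0):
    """Monte Carlo Z-value: observed score against scores of shuffled b."""
    rng = random.Random(seed)
    observed = local_score(a, b)
    letters = list(b)
    shuffled_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        shuffled_scores.append(local_score(a, "".join(letters)))
    mu = statistics.mean(shuffled_scores)
    sigma = statistics.pstdev(shuffled_scores)
    if sigma == 0:
        raise ValueError("shuffled scores have zero variance")
    return (observed - mu) / sigma


def p_upper_bound(z):
    """Chebyshev-style bound on P(score >= observed), capped at 1."""
    if z <= 1:
        return 1.0  # the bound carries no information in this range
    return 1.0 / (z * z)
```

Calling `p_upper_bound(z_value(seq_a, seq_b))` on two sequences returns a distribution-free ceiling on the probability of their alignment score: the bound needs only the mean and standard deviation of the shuffled scores, not an assumption about the shape of the score distribution.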

From this topology one can derive evolutionary trees, called TULIP trees, that link a group of homologous sequences.

 

The phylogenetic tree constructed by the TULIP method (B) resolves the incongruence observed in the enolase phylogeny reconstructed by the classical method (A).

 

Unlike earlier approaches, TULIP trees are therefore not built from multiple sequence alignments, which require the overall topology of the tree to be reworked whenever proteins are added to or removed from the alignment. TULIP trees conserve all the mutual information between each pair of sequences and rely on a topology between sequences that is unaffected by the addition or removal of sequences. The theory has many applications. First, the method allows incongruent molecular phylogenies to be resolved. A website is being developed to enable the scientific community to use this method to construct TULIP phylogenetic trees from sequence sets. It is also possible to reconstruct the evolutionary relationships linking the sequences of a whole database without having to rebuild the overall topology every time the database is updated.
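
The stability under database updates can be illustrated with a short Python sketch. It assumes only that each relationship is computed from the two sequences concerned, so that adding a sequence adds new pairs without changing existing ones. The function `toy_distance`, built on the standard-library `difflib.SequenceMatcher`, is a hypothetical stand-in and not the TULIP measure; the sequences are made-up examples.

```python
from difflib import SequenceMatcher
from itertools import combinations


def toy_distance(a, b):
    """Placeholder pairwise dissimilarity in [0, 1]; NOT the TULIP measure."""
    return 1.0 - SequenceMatcher(None, a, b, autojunk=False).ratio()


def pairwise_distances(seqs):
    """Map each pair of sequence names to a distance computed from that pair only."""
    return {
        (i, j): toy_distance(seqs[i], seqs[j])
        for i, j in combinations(sorted(seqs), 2)
    }


# Made-up sequences standing in for a small database.
database = {"a": "MKTAYIAKQR", "b": "MKTAYLAKQR", "c": "MSTAYIGKQR"}
before = pairwise_distances(database)

# Updating the database adds new pairs but leaves existing ones untouched.
updated = {**database, "d": "MKKAYIAKQW"}
after = pairwise_distances(updated)
unchanged = all(after[pair] == dist for pair, dist in before.items())
print("existing pairs unchanged:", unchanged)
```

By contrast, distances read off a multiple alignment can shift when a sequence is added, because the alignment itself is recomputed over the whole set.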