TULIP: a theoretical model for protein sequences, unifying several domains in bioinformatics and evolutionary biology
Protein sequences were treated as objects positioned in a configuration space inspired by particle physics. The geometry of this space respects the probabilistic framework of the TULIP theorem (Theorem of the Upper LImit of a score Probability), proved at the PCV laboratory. The method conserves information in the same way that physical theories conserve energy. Assigning a topology to this space makes it possible to deduce the probable pathways connecting two related sequences, in accordance with Darwin's theory of evolution.
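The text does not give the statement of the TULIP theorem. As a generic illustration of what an "upper limit of a score probability" can look like, the sketch below makes two assumptions. First, it computes a Monte Carlo Z-score by comparing an observed score with scores against composition-preserving shuffles of one sequence. Second, it bounds the chance probability with Chebyshev's inequality, P(S ≥ μ + zσ) ≤ 1/z², which holds whatever the score distribution is. The scoring function `identity_score` and the helper names are hypothetical. This is not the published TULIP theorem or the PCV laboratory's implementation.

```python
import random
import statistics


def identity_score(a: str, b: str) -> int:
    """Hypothetical toy score: count of identical residues at the same
    position (no gaps). Stands in for a real alignment score."""
    return sum(1 for x, y in zip(a, b) if x == y)


def shuffled_z_score(a: str, b: str, n_shuffles: int = 1000, seed: int = 0) -> float:
    """Monte Carlo Z-score: compare the observed score of (a, b) with the
    scores of a against composition-preserving shuffles of b."""
    rng = random.Random(seed)
    observed = identity_score(a, b)
    residues = list(b)
    null_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)  # permutes b in place, keeping its residue composition
        null_scores.append(identity_score(a, "".join(residues)))
    mu = statistics.mean(null_scores)
    sigma = statistics.pstdev(null_scores)
    if sigma == 0:
        raise ValueError("shuffled scores have zero variance; Z is undefined")
    return (observed - mu) / sigma


def chebyshev_upper_limit(z: float) -> float:
    """Distribution-free upper limit on P(S >= mu + z*sigma) from
    Chebyshev's inequality: 1/z**2 for z > 0, capped at 1."""
    if z <= 0:
        return 1.0
    return min(1.0, 1.0 / z ** 2)


seq = "MKTAYIAKQRQISFVKSHFSRQ"
z = shuffled_z_score(seq, seq)
print(f"Z = {z:.1f}, upper limit on chance probability = {chebyshev_upper_limit(z):.4f}")
```

Chebyshev's bound applies to the true mean and standard deviation of the shuffled-score distribution, so the Monte Carlo estimates used here make the limit approximate.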
The phylogenetic tree constructed by the TULIP method (B) resolves the incongruence observed in the enolase phylogeny reconstructed by the classical method (A).
TULIP trees are therefore not built, as has been the practice until now, from multiple sequence alignments. With that approach, the general topology of the tree must be reshaped whenever proteins are added to or removed from the alignment. TULIP trees conserve all the mutual information between pairs of sequences and rest on a topology between sequences that is unaffected by adding or removing sequences.
The theory has many applications. First, the method makes it possible to resolve incongruent molecular phylogenies. A website is being developed so that the scientific community can use the method to build TULIP phylogenetic trees from sets of sequences. It is also possible to reconstruct the evolution linking all the sequences of a database without recomputing the general topology every time the database is updated.
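The text states that TULIP trees rest on relations between sequences that are unaffected when sequences are added or removed. This is what lets a database be updated without recomputing everything. The minimal sketch below illustrates only that structural property, under an assumption: every stored quantity depends on just two sequences. The `mismatch_distance` function and the `PairwiseStore` class are hypothetical placeholders, not the TULIP distance or the forthcoming website's code. Adding a sequence computes only its distances to the existing sequences, and removing one drops only its own entries. Every other stored value stays exactly as it was.

```python
def mismatch_distance(a: str, b: str) -> float:
    """Hypothetical placeholder distance: fraction of mismatched positions
    over the shorter sequence (not the TULIP distance)."""
    n = min(len(a), len(b))
    if n == 0:
        return 1.0
    return sum(1 for x, y in zip(a, b) if x != y) / n


class PairwiseStore:
    """Keeps one distance per pair of sequence IDs. Each value depends only
    on its own two sequences, so adding or removing a sequence leaves all
    other stored distances untouched."""

    def __init__(self) -> None:
        self.seqs: dict[str, str] = {}
        self.dist: dict[frozenset, float] = {}

    def add(self, seq_id: str, seq: str) -> int:
        """Add a sequence and return how many new distances were computed."""
        if seq_id in self.seqs:
            raise ValueError(f"duplicate sequence id: {seq_id}")
        for other_id, other_seq in self.seqs.items():
            self.dist[frozenset((seq_id, other_id))] = mismatch_distance(seq, other_seq)
        self.seqs[seq_id] = seq
        return len(self.seqs) - 1

    def remove(self, seq_id: str) -> None:
        """Drop a sequence and only the distances that involve it."""
        del self.seqs[seq_id]
        self.dist = {k: v for k, v in self.dist.items() if seq_id not in k}

    def distance(self, a: str, b: str) -> float:
        return self.dist[frozenset((a, b))]


store = PairwiseStore()
for sid, s in [("s1", "MKTAYIAK"), ("s2", "MKTAHIAK"), ("s3", "MRTAYLAK")]:
    store.add(sid, s)
before = dict(store.dist)
new_count = store.add("s4", "MKSAYIAR")  # database update: only 3 new pairs
unchanged = all(store.dist[k] == v for k, v in before.items())
print(new_count, unchanged)
```

A multiple-alignment-based pipeline lacks this property, because inserting one sequence can shift alignment columns and therefore change every pairwise comparison derived from them.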
