Conference papers

Accounting for exposition and secondary structure in protein evolution: models and gains

Quang Le Si 1 Olivier Gascuel 2, *
* Corresponding author
2 MAB - Méthodes et Algorithmes pour la Bioinformatique
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier
Abstract : It has been recognized for a long time that substitution processes vary depending on structural configurations. However, this information is not (or rarely) used in phylogenetic studies, even though the structures of tens of thousands of proteins have been elucidated. Here we reinvestigate the question in order to fill this gap. We used a very large dataset comprising 4,389 protein alignments with structural annotations to estimate new amino-acid substitution matrices for various structural configurations. Moreover, we used an independent sample of 500 alignments to evaluate the gain in tree likelihood brought by these new matrices. Various ways to combine these models (matrices) were envisaged, namely, separate analysis based on available annotations, mixtures (assuming no structural information), and a combination of both based on an estimated parameter that reflects the reliability of structural annotations. Our results show that separate analysis and mixtures are nearly equivalent on average, while our confidence-based approach is best thanks to its ability to detect poorly annotated proteins. The highest likelihood values are obtained with six structural categories combining the exposed/buried and alpha/beta/other status of the sites; the average gain is as high as 1.16 AIC points per site compared to the standard WAG model. This six-category model is closely followed by the two-category exposed/buried model, while the secondary structure-based three-category model is worse, but still better than WAG. All these likelihood gains induce significant topological changes in the trees being inferred, indicating that our models should be used routinely by phylogeneticists.
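The three ways of combining the structure-specific matrices, and the "AIC points per site" measure, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the likelihood of each alignment site under each category's matrix has already been computed on a fixed tree, and that the confidence-based scheme blends the annotated-category likelihood with the mixture likelihood through a single reliability parameter (called `rho` here). That functional form, along with all function names, the category labels, and the weights, is an assumption made for illustration; the abstract does not spell it out. The AIC formula itself (2k minus twice the log-likelihood) is standard.

```python
import math

def aic(log_likelihood, n_free_params):
    """Akaike information criterion: 2k - 2 lnL (lower is better)."""
    return 2.0 * n_free_params - 2.0 * log_likelihood

def aic_gain_per_site(lnl_ref, k_ref, lnl_new, k_new, n_sites):
    """AIC improvement of the new model over the reference, per alignment site."""
    return (aic(lnl_ref, k_ref) - aic(lnl_new, k_new)) / n_sites

def site_lnl_separate(site_liks, annotated):
    """Separate analysis: trust the annotation, use only that category's matrix."""
    return math.log(site_liks[annotated])

def mixture_likelihood(site_liks, weights):
    """Weighted sum of the per-category site likelihoods (annotation ignored)."""
    return sum(weights[c] * site_liks[c] for c in site_liks)

def site_lnl_mixture(site_liks, weights):
    """Mixture: assume no structural information for the site."""
    return math.log(mixture_likelihood(site_liks, weights))

def site_lnl_confidence(site_liks, annotated, weights, rho):
    """Assumed confidence-based blend: rho weights the annotated category,
    (1 - rho) weights the annotation-free mixture."""
    blended = rho * site_liks[annotated] + (1.0 - rho) * mixture_likelihood(site_liks, weights)
    return math.log(blended)

# Hypothetical site likelihoods for a two-category (exposed/buried) model.
site_liks = {"exposed": 0.012, "buried": 0.003}
weights = {"exposed": 0.5, "buried": 0.5}  # hypothetical mixture weights

print(site_lnl_separate(site_liks, "buried"))
print(site_lnl_mixture(site_liks, weights))
print(site_lnl_confidence(site_liks, "buried", weights, rho=0.7))
```

Under this assumed form, `rho = 1` reduces to separate analysis and `rho = 0` reduces to the mixture, so a low estimated `rho` would flag a protein whose annotations are unreliable.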
Contributor : Olivier Gascuel
Submitted on : Tuesday, December 11, 2007 - 6:46:31 PM
Last modification on : Friday, October 22, 2021 - 3:07:27 PM


  • HAL Id : lirmm-00195966, version 1



Quang Le Si, Olivier Gascuel. Accounting for exposition and secondary structure in protein evolution: models and gains. The Annual New Zealand Phylogenetics Meeting, Feb 2007, Mount Ruapehu, New Zealand. ⟨lirmm-00195966⟩


