Accounting for exposition and secondary structure in protein evolution: models and gains

Quang Le Si 1 Olivier Gascuel 2, *
* Corresponding author
2 MAB - Méthodes et Algorithmes pour la Bioinformatique
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier
Abstract: It has long been recognized that substitution processes vary with structural configuration. However, this information is rarely, if ever, used in phylogenetic studies, even though the structures of tens of thousands of proteins have been elucidated. Here we reinvestigate the question in order to fill this gap. We used a very large dataset comprising 4,389 protein alignments with structural annotations to estimate new amino-acid substitution matrices for various structural configurations. Moreover, we used an independent sample of 500 alignments to evaluate the gain in tree likelihood brought by these new matrices. We considered several ways to combine these models (matrices): separate analysis based on available annotations, mixtures (assuming no structural information), and a combination of both based on an estimated parameter that reflects the reliability of structural annotations. Our results show that separate analysis and mixtures are nearly equivalent on average, while our confidence-based approach is best thanks to its ability to detect poorly annotated proteins. The highest likelihood values are obtained with six structural categories combining the exposed/buried and alpha/beta/other status of the sites; the average gain is as high as 1.16 AIC points per site compared to the standard WAG model. This six-category model is closely followed by the two-category exposed/buried model, while the three-category model based on secondary structure performs worse but is still better than WAG. All these likelihood gains induce significant topological changes in the inferred trees, indicating that our models should be used routinely by phylogeneticists.
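The quantities named in the abstract can be illustrated with a minimal sketch. The AIC formula (AIC = 2k - 2 ln L) is standard; everything else here is an assumption made for illustration, not the authors' implementation: the function names, the convex-combination form of the confidence-weighted likelihood, the symbol rho for the estimated reliability parameter, and all numeric values are hypothetical.

```python
def aic(log_likelihood, n_free_params):
    """Akaike information criterion: AIC = 2k - 2 ln L."""
    return 2.0 * n_free_params - 2.0 * log_likelihood


def aic_gain_per_site(lnl_ref, k_ref, lnl_new, k_new, n_sites):
    """AIC improvement of a new model over a reference model, per alignment site.

    Positive values mean the new model is preferred.
    """
    return (aic(lnl_ref, k_ref) - aic(lnl_new, k_new)) / n_sites


def site_likelihood(cat_likelihoods, annotated_cat, weights, rho):
    """Hypothetical confidence-weighted likelihood of one site.

    cat_likelihoods: likelihood of the site under each structural category's matrix
    annotated_cat:   index of the category given by the structural annotation
    weights:         mixture weights of the categories (assumed to sum to 1)
    rho:             assumed reliability of the annotation, in [0, 1]

    rho = 1 reduces to separate analysis (trust the annotation);
    rho = 0 reduces to a pure mixture (ignore the annotation).
    """
    separate = cat_likelihoods[annotated_cat]
    mixture = sum(w * l for w, l in zip(weights, cat_likelihoods))
    return rho * separate + (1.0 - rho) * mixture


# Made-up numbers, for illustration only.
gain = aic_gain_per_site(lnl_ref=-10000.0, k_ref=10,
                         lnl_new=-9900.0, k_new=11, n_sites=200)
half_trusted = site_likelihood([0.02, 0.01], annotated_cat=0,
                               weights=[0.5, 0.5], rho=0.5)
```

Under this assumed form, a protein whose annotation fits its sequences poorly would be given a low rho, so its likelihood would fall back toward the mixture.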
Document type:
Conference paper
The Annual New Zealand Phylogenetics Meeting, Feb 2007, Mount Ruapehu, New Zealand. 2007, 〈http://www.math.canterbury.ac.nz/bio/doom07/〉

https://hal-lirmm.ccsd.cnrs.fr/lirmm-00195966
Contributor: Olivier Gascuel
Submitted on: Tuesday, 11 December 2007 - 18:46:31
Last modified on: Thursday, 24 May 2018 - 15:59:22

Identifiers

  • HAL Id : lirmm-00195966, version 1

Citation

Quang Le Si, Olivier Gascuel. Accounting for exposition and secondary structure in protein evolution: models and gains. The Annual New Zealand Phylogenetics Meeting, Feb 2007, Mount Ruapehu, New Zealand. 2007, 〈http://www.math.canterbury.ac.nz/bio/doom07/〉. 〈lirmm-00195966〉
