Accounting for exposition and secondary structure in protein evolution: models and gains

Quang Le Si; Olivier Gascuel

Conference paper, Year: 2007


Quang Le Si

Role: Author
PersonId : 845246
ORCID : 0000-0002-3715-210X

Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier

Olivier Gascuel

Role: Corresponding author
PersonId : 938491
IdHAL : olivier-gascuel
ORCID : 0000-0002-9412-9723

Méthodes et Algorithmes pour la Bioinformatique

Abstract

It has long been recognized that substitution processes vary with structural configuration. However, this information is rarely, if ever, used in phylogenetic studies, even though the structures of tens of thousands of proteins have been elucidated. Here we revisit the question to fill this gap. We used a very large dataset of 4,389 protein alignments with structural annotations to estimate new amino-acid substitution matrices for various structural configurations. We then used an independent sample of 500 alignments to evaluate the gain in tree likelihood brought by these new matrices. We considered several ways to combine these models (matrices): separate analysis based on the available annotations, mixtures (assuming no structural information), and a combination of both based on an estimated parameter that reflects the reliability of the structural annotations. Our results show that separate analysis and mixtures are nearly equivalent on average, while our confidence-based approach performs best thanks to its ability to detect poorly annotated proteins. The highest likelihood values are obtained with six structural categories that combine the exposed/buried and alpha/beta/other status of the sites; the average gain is as high as 1.16 AIC points per site compared to the standard WAG model. This six-category model is closely followed by the two-category exposed/buried model, while the three-category model based on secondary structure performs worse but is still better than WAG. All these likelihood gains induce significant topological changes in the inferred trees, indicating that our models should be used routinely by phylogeneticists.
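The three ways of combining category-specific matrices described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so everything here is an assumption made for illustration: the function names, the toy per-site likelihood values, the mixture weights, and in particular the form of the confidence-based combination, which is written as a weighted sum of the annotated-category likelihood and the mixture likelihood with a hypothetical reliability parameter `rho`. The AIC gain per site uses the standard definition AIC = 2k - 2 ln L.

```python
import math

# site_liks[i][c]: likelihood of site i under the matrix of category c
# (toy values; in practice they would come from a phylogenetic likelihood computation).

def separate_loglik(site_liks, annotations):
    """Each site uses only the matrix of its annotated structural category."""
    return sum(math.log(liks[a]) for liks, a in zip(site_liks, annotations))

def mixture_loglik(site_liks, weights):
    """Annotations are ignored; each site averages over all category matrices."""
    return sum(
        math.log(sum(w * l for w, l in zip(weights, liks)))
        for liks in site_liks
    )

def confidence_loglik(site_liks, annotations, weights, rho):
    """Assumed form: trust the annotation with weight rho, else fall back to the mixture."""
    total = 0.0
    for liks, a in zip(site_liks, annotations):
        mix = sum(w * l for w, l in zip(weights, liks))
        total += math.log(rho * liks[a] + (1.0 - rho) * mix)
    return total

def aic(loglik, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2.0 * n_params - 2.0 * loglik

def aic_gain_per_site(loglik_ref, k_ref, loglik_new, k_new, n_sites):
    """Positive values mean the new model improves on the reference model."""
    return (aic(loglik_ref, k_ref) - aic(loglik_new, k_new)) / n_sites

# Toy example: three sites, two categories (0 = exposed, 1 = buried).
site_liks = [[0.02, 0.01], [0.005, 0.03], [0.04, 0.01]]
annotations = [0, 1, 0]
weights = [0.5, 0.5]

if __name__ == "__main__":
    print("separate  :", separate_loglik(site_liks, annotations))
    print("mixture   :", mixture_loglik(site_liks, weights))
    print("confidence:", confidence_loglik(site_liks, annotations, weights, 0.8))
```

Under this assumed form, `rho = 1` reduces to the separate analysis and `rho = 0` to the mixture, so an estimated `rho` close to 0 for a given protein would indicate that its annotations carry little information.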

Domains

Bioinformatics [q-bio.QM]; Evolution [q-bio.PE]

https://hal-lirmm.ccsd.cnrs.fr/lirmm-00195966

Submitted on: Tuesday, 11 December 2007, 18:46:31

Last modified on: Friday, 24 March 2023, 14:52:49

Dates and versions

lirmm-00195966 , version 1 (11-12-2007)

Identifiers

HAL Id : lirmm-00195966 , version 1

Cite

Quang Le Si, Olivier Gascuel. Accounting for exposition and secondary structure in protein evolution: models and gains. The Annual New Zealand Phylogenetics Meeting, Feb 2007, Mount Ruapehu, New Zealand. ⟨lirmm-00195966⟩

Collections

CNRS MAB LIRMM MIPS UNIV-MONTPELLIER
