Accounting for Solvent Accessibility and Secondary Structure in Protein Phylogenetics is Clearly Beneficial

Quang Le Si 1 Olivier Gascuel 2, *
* Corresponding author
2 MAB - Méthodes et Algorithmes pour la Bioinformatique
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier
Abstract: Amino-acid substitution models are essential to most methods for inferring phylogenies from protein data. These models represent the ways in which proteins evolve and substitutions accumulate over time. It is widely accepted that substitution processes vary depending on the structural configuration of the protein residues. However, this information is very rarely used in phylogenetic studies, even though the three-dimensional structure of tens of thousands of proteins has been elucidated. Here we reinvestigate the question in order to fill this gap. We use an improved estimation methodology and a very large database comprising 1,471 non-redundant globular protein alignments with structural annotations to estimate new amino-acid substitution models accounting for the secondary structure and solvent accessibility of the residues. These models incorporate a confidence coefficient which is estimated from the data and reflects the reliability and usefulness of structural annotations in the analyzed sequences. Our results with 300 independent test alignments show an impressive likelihood gain compared to standard models such as JTT or WAG. Moreover, the use of these models induces significant topological changes in the inferred trees, which should be of primary interest to phylogeneticists. Our data, models and software are available for download from http://www.atgc-montpellier.fr/phyml-structure/.
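The role of a confidence coefficient can be illustrated with a toy sketch. The abstract does not give the formula, so everything below is an assumption made for illustration and not the authors' implementation: the site likelihood is taken to be a blend of the likelihood under the model of the annotated structural category and a frequency-weighted mixture over all categories, and the coefficient is chosen by a simple grid search. The function names, the category labels "buried" and "exposed", and the numeric values are all hypothetical.

```python
import math

def site_likelihood(lik_by_category, annotated, confidence, category_freqs):
    """Assumed blend: trust the annotated category with weight `confidence`,
    otherwise fall back to a frequency-weighted mixture over all categories."""
    mixture = sum(category_freqs[c] * lik_by_category[c] for c in category_freqs)
    return confidence * lik_by_category[annotated] + (1.0 - confidence) * mixture

def alignment_log_likelihood(sites, confidence, category_freqs):
    """Sum of per-site log-likelihoods; `sites` is a list of
    (likelihood-per-category dict, annotated category) pairs."""
    return sum(
        math.log(site_likelihood(lik, annotated, confidence, category_freqs))
        for lik, annotated in sites
    )

def estimate_confidence(sites, category_freqs, steps=100):
    """Grid search over [0, 1] for the confidence value that maximizes
    the alignment log-likelihood (a stand-in for a real optimizer)."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda r: alignment_log_likelihood(sites, r, category_freqs))

# Hypothetical per-site likelihoods under two category-specific models.
freqs = {"buried": 0.5, "exposed": 0.5}
good_sites = [
    ({"buried": 0.02, "exposed": 0.01}, "buried"),
    ({"buried": 0.01, "exposed": 0.03}, "exposed"),
]
# Same likelihoods, but with annotations that point at the worse-fitting model.
bad_sites = [
    ({"buried": 0.02, "exposed": 0.01}, "exposed"),
    ({"buried": 0.01, "exposed": 0.03}, "buried"),
]
```

Under these assumptions, calling `estimate_confidence(good_sites, freqs)` pushes the coefficient toward 1 when annotations agree with the data, while `estimate_confidence(bad_sites, freqs)` pushes it toward 0, which matches the abstract's description of a coefficient reflecting the reliability of the structural annotations.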

https://hal-lirmm.ccsd.cnrs.fr/lirmm-00511776
Contributor: Isabelle Gouat
Submitted on: Wednesday, September 5, 2012 - 10:58:39
Last modified on: Thursday, May 24, 2018 - 15:59:22
Document(s) archived on: Thursday, December 6, 2012 - 04:05:11

File

LeGascuel_SystBiol2010.pdf
Publisher files allowed on an open archive

Citation

Quang Le Si, Olivier Gascuel. Accounting for Solvent Accessibility and Secondary Structure in Protein Phylogenetics is Clearly Beneficial. Systematic Biology, Oxford University Press (OUP), 2010, 59 (3), pp.277-287. 〈www.lirmm.fr〉. 〈10.1093/sysbio/syq002〉. 〈lirmm-00511776v2〉
