An improved general amino acid replacement matrix

Quang Le Si; Olivier Gascuel

doi:10.1093/molbev/msn067

Journal article, Molecular Biology and Evolution, Year: 2008

Quang Le Si

Role: Author
PersonId: 845246
ORCID: 0000-0002-3715-210X

Méthodes et Algorithmes pour la Bioinformatique

Olivier Gascuel

Role: Corresponding author
PersonId: 938491
IdHAL: olivier-gascuel
ORCID: 0000-0002-9412-9723

Méthodes et Algorithmes pour la Bioinformatique

Abstract

Amino acid replacement matrices are an essential basis of protein phylogenetics. They are used to compute substitution probabilities along phylogeny branches and thus the likelihood of the data. They are also essential in protein alignment. A number of replacement matrices and methods to estimate these matrices from protein alignments have been proposed since the seminal work of Dayhoff et al. (1972). An important advance was achieved by Whelan and Goldman (2001) with their WAG matrix, thanks to an efficient maximum likelihood estimation approach that accounts for the phylogenies of sequences within each training alignment. We further refine this method by incorporating the variability of evolutionary rates across sites in the matrix estimation and by using a much larger and more diverse database than BRKALN, which was used to estimate WAG. To estimate our new matrix (called LG after the authors), we use an adaptation of the XRATE software and 3,912 alignments from Pfam, comprising approximately 50,000 sequences and approximately 6.5 million residues overall. To evaluate the performance of LG, we use an independent sample consisting of 59 alignments from TreeBase and randomly divide the Pfam alignments into 3,412 training and 500 test alignments. The comparison with WAG and JTT shows a clear likelihood improvement. With TreeBase, we find that 1) the average Akaike information criterion gain per site is 0.25 and 0.42 when compared with WAG and JTT, respectively; 2) LG is significantly better than WAG for 38 alignments (among 59) and significantly worse for only 2 alignments; and 3) tree topologies inferred with LG, WAG, and JTT frequently differ, indicating that using LG impacts not only the likelihood value but also the output tree. Results with the test alignments from Pfam are analogous. LG and a PHYML implementation can be downloaded from http://atgc.lirmm.fr/LG
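
As an illustration of how a replacement matrix yields substitution probabilities along a branch, here is a minimal sketch. It assumes the standard time-reversible Markov model, in which the rate matrix is built as Q_ij = s_ij * pi_j from symmetric exchangeabilities s and equilibrium frequencies pi, and the probabilities for a branch of length t are P(t) = exp(Qt). The three-state alphabet, all numerical values, and the function names are hypothetical and chosen for readability; they are not taken from the paper, and LG itself is a 20 x 20 matrix.

```python
# Toy values (assumed for illustration only, NOT the LG matrix).
EXCH = [[0.0, 1.0, 2.0],
        [1.0, 0.0, 0.5],
        [2.0, 0.5, 0.0]]          # symmetric exchangeabilities s_ij
FREQS = [0.5, 0.3, 0.2]           # equilibrium frequencies pi_j


def build_rate_matrix(exch, freqs):
    """Return Q with Q_ij = s_ij * pi_j, rows summing to zero,
    scaled so that one time unit equals one expected substitution per site."""
    n = len(freqs)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                q[i][j] = exch[i][j] * freqs[j]
        q[i][i] = -sum(q[i])      # diagonal is still 0.0 when summed
    rate = -sum(freqs[i] * q[i][i] for i in range(n))
    return [[x / rate for x in row] for row in q]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transition_probabilities(q, t, squarings=10, terms=12):
    """Approximate P(t) = exp(Q t) by a truncated Taylor series
    combined with scaling and squaring."""
    n = len(q)
    scaled = [[x * t / 2 ** squarings for x in row] for row in q]
    result = identity(n)
    term = identity(n)
    for k in range(1, terms + 1):
        # term becomes scaled^k / k!
        term = [[x / k for x in row] for row in matmul(term, scaled)]
        result = [[r + x for r, x in zip(res_row, term_row)]
                  for res_row, term_row in zip(result, term)]
    for _ in range(squarings):
        result = matmul(result, result)   # undo the scaling
    return result


Q = build_rate_matrix(EXCH, FREQS)
P = transition_probabilities(Q, 0.5)      # branch length 0.5
```

Here `P[i][j]` is the probability that state `i` is replaced by state `j` along a branch of length 0.5 expected substitutions per site. In a likelihood computation these probabilities are combined over the branches of the tree and over the sites of the alignment; production software typically uses an eigendecomposition of Q rather than the series used in this sketch.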

Keywords

amino-acid substitutions; replacement matrices; JTT; WAG; maximum-likelihood estimations; phylogenetic inference

Domains

Bioinformatics [q-bio.QM]; Bioinformatics, Systems Biology [q-bio.QM]

Main file

LeGascuel_MBE08.pdf (291.45 KB)

Origin: Publisher files allowed on an open archive

Contributor: Isabelle Gouat

https://hal-lirmm.ccsd.cnrs.fr/lirmm-00324106

Submitted on: Wednesday, 5 September 2012, 11:37:36

Last modified on: Friday, 24 March 2023, 14:52:56

Long-term archiving on: Thursday, 6 December 2012, 15:55:42

Dates and versions

lirmm-00324106 , version 1 (04-03-2009)

lirmm-00324106 , version 2 (05-09-2012)

Identifiers

HAL Id: lirmm-00324106, version 2
DOI: 10.1093/molbev/msn067

Cite

Quang Le Si, Olivier Gascuel. An improved general amino acid replacement matrix. Molecular Biology and Evolution, 2008, 25 (7), pp.1307-1320. ⟨10.1093/molbev/msn067⟩. ⟨lirmm-00324106v2⟩

Collections

CNRS MAB LIRMM MIPS UNIV-MONTPELLIER

1056 views

1801 downloads
