Phylogenetic Mixture Models for Proteins

Quang S. Le; Nicolas Lartillot; Olivier Gascuel

doi:10.1098/rstb.2008.0180

Journal article. Philosophical Transactions of the Royal Society B: Biological Sciences. Year: 2008

Quang S. Le

Role: Author
PersonId: 845246
ORCID: 0000-0002-3715-210X

Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier

Nicolas Lartillot

Role: Author
PersonId: 169655
IdHAL: nicolas-lartillot
ORCID: 0000-0002-9973-7760
IdRef: 122641833

Méthodes et Algorithmes pour la Bioinformatique

Olivier Gascuel

Role: Author
PersonId: 938491
IdHAL: olivier-gascuel
ORCID: 0000-0002-9412-9723

Méthodes et Algorithmes pour la Bioinformatique

Abstract

Standard protein substitution models use a single amino-acid replacement rate matrix that summarizes the biological, chemical and physical properties of amino acids. However, site evolution is highly heterogeneous and depends on many factors: genetic code, solvent exposure, secondary and tertiary structure, protein function, etc. These factors affect the substitution pattern, and in most cases a single replacement matrix is not enough to represent the full complexity of the evolutionary processes. This paper explores, in a maximum-likelihood framework, phylogenetic mixture models that combine several amino-acid replacement matrices to better fit protein evolution. We learn these mixture models from a large alignment database extracted from HSSP and test their performance using independent alignments from TreeBase. We compare unsupervised learning approaches, where the site categories are unknown, to supervised ones, where estimation uses the known category of each site, based on its exposure or its secondary structure. All our models are combined with gamma-distributed rates across sites. Results show that highly significant likelihood gains are obtained when using mixture models, compared to the best available single replacement matrices. Mixtures of matrices also improve over mixtures of profiles in the manner of the CAT model. The unsupervised approach tends to be better than the supervised one, but it appears difficult to implement and highly sensitive to the starting values of the parameters, meaning that the supervised approach is still of interest for initialization and model comparison. Using an unsupervised model involving 3 matrices, the average AIC gain per site with TreeBase test alignments is 0.31, 0.49 and 0.61, compared to LG, WAG and JTT, respectively. This 3-matrix model is significantly better than LG for 34 alignments (among 57), and significantly worse for 1 alignment only. Moreover, tree topologies inferred with our mixture models frequently differ from those obtained with single matrices, indicating that using these mixtures impacts not only the likelihood value but also the output tree. All our models and a PhyML implementation are available from http://atgc.lirmm.fr/mixtures.
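
As a rough illustration of two quantities mentioned in the abstract, the Python sketch below shows how a mixture likelihood over several replacement matrices and an AIC gain per site might be computed. It assumes that the likelihood of each site under each matrix has already been produced by some phylogenetic likelihood routine (not shown), that a site's mixture likelihood is the weighted sum of its component likelihoods, and that AIC takes the standard form 2k - 2 ln L. The function names, the input layout and the normalization by the number of sites are illustrative assumptions, not the authors' PhyML implementation, and the paper's exact AIC convention may differ.

```python
import math


def mixture_site_log_likelihoods(component_site_liks, weights):
    """Per-site log-likelihoods under a mixture of replacement matrices.

    component_site_liks[m][i] is the likelihood of site i under matrix m
    (hypothetical input, assumed precomputed on a fixed tree).
    weights[m] is the mixture weight of matrix m; weights must sum to 1.
    """
    if len(component_site_liks) != len(weights):
        raise ValueError("one weight is required per mixture component")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    n_sites = len(component_site_liks[0])
    log_liks = []
    for i in range(n_sites):
        # Weighted sum over components, then log: the site category is
        # treated as unknown and integrated out.
        site_lik = sum(w * comp[i] for w, comp in zip(weights, component_site_liks))
        log_liks.append(math.log(site_lik))
    return log_liks


def aic(log_lik, n_params):
    """Standard Akaike information criterion (lower is better)."""
    return 2.0 * n_params - 2.0 * log_lik


def aic_gain_per_site(log_lik_model, k_model, log_lik_base, k_base, n_sites):
    """AIC improvement of a model over a baseline, divided by alignment length.

    With the convention assumed here, a positive value favors the model.
    """
    return (aic(log_lik_base, k_base) - aic(log_lik_model, k_model)) / n_sites
```

Under these assumptions, the total log-likelihood of an alignment is the sum of the values returned by `mixture_site_log_likelihoods`, and that total can be passed to `aic_gain_per_site` together with the log-likelihood of a single-matrix baseline such as LG.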

Keywords

amino acid replacement matrices; JTT, WAG and LG; CAT profile model; maximum-likelihood estimations; phylogenetic inference

Domains

Bioinformatics [q-bio.QM]

Main file

LeLartillotGascuel_PhiTrsRoySocB_2008.pdf (252.13 KB)

Origin: Files produced by the author(s)

Contributor: Olivier Gascuel

https://hal-lirmm.ccsd.cnrs.fr/lirmm-00365645

Submitted on: Wednesday, March 4, 2009, 08:18:06

Last modified on: Friday, March 24, 2023, 14:52:51

Long-term archiving on: Tuesday, June 8, 2010, 23:06:31

Dates and versions

lirmm-00365645 , version 1 (04-03-2009)

lirmm-00365645 , version 2 (05-09-2012)

Identifiers

HAL Id : lirmm-00365645 , version 1
DOI : 10.1098/rstb.2008.0180

Cite

Quang S. Le, Nicolas Lartillot, Olivier Gascuel. Phylogenetic Mixture Models for Proteins. Philosophical Transactions of the Royal Society B: Biological Sciences, 2008, 363, pp.3965-3976. ⟨10.1098/rstb.2008.0180⟩. ⟨lirmm-00365645v1⟩

356 views

820 downloads
