CAT: A Bayesian Phylogenetic Model Accounting for Across-Site Heterogeneity of Substitution Processes in Protein Alignments
Abstract
We propose a Bayesian mixture model, accounting for across-site heterogeneities of the substitutional processes in protein sequences. Our model, CAT, is based on the formalism of the Dirichlet processes, in which the total number of classes of the underlying mixture is not specified a priori, but rather, is considered an unknown of the problem, and is directly inferred from the available data. In this paper, we describe the model, and show its connections with the Bayesian non-parametric approach for modeling heterogeneity. We apply it to a series of alignments of real proteins, and uncover a significant level of heterogeneity across sites. Finally, by the evaluation of the Bayes factor, we show that the CAT model yields a significant improvement of the statistical fit over the standard models, based on one single substitution process describing all the sites of the alignment.
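To illustrate how a Dirichlet process leaves the number of mixture classes open, the following minimal sketch draws site-to-class assignments from the prior through its Chinese restaurant process representation. It is not the CAT implementation: the function name `crp_assignments`, the concentration parameter `alpha`, and the site count are assumptions chosen for illustration, and the sketch samples from the prior only, whereas the model infers the assignments from the alignment.

```python
import random

def crp_assignments(n_sites, alpha, seed=0):
    """Draw a class label for each site from a Chinese restaurant process prior.

    Hypothetical helper for illustration; alpha must be > 0.
    """
    rng = random.Random(seed)
    labels = []   # class index assigned to each site
    counts = []   # number of sites currently in each class
    for _ in range(n_sites):
        # An existing class k has weight counts[k]; a new class has weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)   # open a new class
        else:
            counts[k] += 1     # join an existing class
        labels.append(k)
    return labels, counts

labels, counts = crp_assignments(n_sites=200, alpha=1.0)
print(len(counts), "classes for", len(labels), "sites")
```

The number of classes is an outcome of the draw rather than an input. In the model described above, each class would stand for a distinct substitution process shared by the sites assigned to it.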
Domains
Other [cs.OH]