Improving Amino-Acid Substitution Modelling
Abstract
Amino-acid replacement models form an essential basis of protein phylogenetics. They are used to compute the substitution probabilities along the branches of phylogenies, and thus the likelihood of the data. They are also essential in protein alignment. A number of replacement models, and methods to estimate these models from protein alignments, have been proposed since the seminal work of Dayhoff et al. (1978). I'll describe the main trends and the advances that our group recently achieved in this field. I'll explain how we inferred a new replacement matrix, called LG (after its authors), which clearly improves on the standard JTT (Jones et al. 1992) and WAG (Whelan and Goldman 2001) matrices. I'll also describe other modelling approaches based on mixtures and/or site partitioning (e.g. based on solvent exposure, along the lines of Goldman et al. 1998) and show that these further improve on LG. The gain in likelihood between these compound models and standard JTT or WAG is of the same order as the gain brought by the use of a gamma distribution to account for rate variation among sites. Moreover, maximum-likelihood tree topologies tend to change when these new models are used, which argues for their routine use in phylogenetics.
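
As a rough illustration of how a replacement model yields substitution probabilities along a branch, the sketch below builds a reversible rate matrix Q from symmetric exchangeabilities and equilibrium frequencies, then computes P(t) = exp(Qt). It is not the estimation procedure used for LG, JTT or WAG. The 4-state alphabet, the values in EXCH and FREQS, and all function names are hypothetical, chosen only to keep the example small; a real amino-acid model has 20 states.

```python
# Toy sketch: substitution probabilities P(t) = exp(Q t) for a reversible model.
# EXCH and FREQS are made-up illustrative values, not taken from any published matrix.

EXCH = [  # symmetric exchangeabilities (diagonal is ignored)
    [0.0, 1.0, 2.0, 0.5],
    [1.0, 0.0, 0.8, 1.5],
    [2.0, 0.8, 0.0, 1.2],
    [0.5, 1.5, 1.2, 0.0],
]
FREQS = [0.1, 0.2, 0.3, 0.4]  # equilibrium frequencies, summing to 1


def build_rate_matrix(exch, freqs):
    """Return Q with Q[i][j] = exch[i][j] * freqs[j] for i != j and zero row sums."""
    n = len(freqs)
    q = [[exch[i][j] * freqs[j] if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    for i in range(n):
        q[i][i] = -sum(q[i])  # diagonal is still 0.0 here, so this sums the off-diagonals
    # Rescale so that a branch length of 1 means one expected substitution per site.
    rate = -sum(freqs[i] * q[i][i] for i in range(n))
    return [[x / rate for x in row] for row in q]


def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transition_probs(q, t, squarings=10, terms=12):
    """Approximate exp(Q t) by a truncated Taylor series with scaling and squaring."""
    n = len(q)
    scale = t / (2 ** squarings)
    a = [[x * scale for x in row] for row in q]
    p = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in p]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in matmul(term, a)]
        p = [[p[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        p = matmul(p, p)  # undo the scaling: exp(A)^(2^s) = exp(Q t)
    return p


Q = build_rate_matrix(EXCH, FREQS)
P = transition_probs(Q, t=0.5)
```

Each row of P gives the probabilities of observing every state at the end of a branch of length 0.5, given the state at its start. The likelihood of an alignment is then obtained by combining such matrices over all branches of the tree, which this sketch does not attempt.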