Distance-based methods in phylogenomics

Abstract: With the continuous growth of genome sequencing capabilities, phylogenetic tree inference is increasingly based on large collections of genomic alignments, each derived from the comparison of orthologous genomic regions across different species. We have entered the era of ‘phylogenomics’, the study of evolution at the genomic scale. Classical methods in phylogenomics are commonly classified into two categories: super-tree and super-alignment methods. Super-tree approaches infer one phylogenetic tree per input alignment and then combine the resulting trees into a larger phylogeny, while super-alignment methods first concatenate the input alignments and then infer a tree from the resulting super-alignment using standard methodology. Super-tree methods are usually considered more efficient, while super-alignment methods are usually considered more accurate. In this talk I will introduce a third category, distance-based phylogenomic methods, in which tree inference is based on a collection of distance matrices: one matrix of pairwise distances is estimated per input alignment. This approach inherits the efficiency advantage of super-tree methods while avoiding one of their main pitfalls, namely that they consider only the topology of the source trees and thus ignore branch lengths (both in input and in output). As an example of this approach, I will present a novel distance-based phylogenomic method, named ERaBLE (Evolutionary Rates and Branch Length Estimation), which estimates the branch lengths of a given reference topology and the relative evolutionary rates of the genes used in the analysis. Our experiments show that ERaBLE is very fast and fairly accurate; for example, it can process the OrthoMaM database, which comprises 6,953 exons from up to 40 mammals.
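To make the distance-based setting concrete, here is a minimal sketch of fitting branch lengths on a fixed reference topology from several per-gene distance matrices, together with one relative rate per gene. It is not the ERaBLE algorithm, whose formulation is not given in this abstract. It is a naive two-step least-squares heuristic, and the four-taxon topology, the names `PATHS` and `fit_lengths_and_rates`, and the choice to average the gene distances are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical reference topology ((A,B),(C,D)) with five branches:
# external branches a, b, c, d and one internal branch i.
# Rows are taxon pairs, columns are branches (a, b, c, d, i);
# an entry is 1 when the branch lies on the path joining the pair.
PATHS = np.array([
    [1, 1, 0, 0, 0],  # A-B
    [1, 0, 1, 0, 1],  # A-C
    [1, 0, 0, 1, 1],  # A-D
    [0, 1, 1, 0, 1],  # B-C
    [0, 1, 0, 1, 1],  # B-D
    [0, 0, 1, 1, 0],  # C-D
], dtype=float)


def fit_lengths_and_rates(gene_distances):
    """Naive illustration (an assumption, not ERaBLE).

    gene_distances: one vector of six pairwise distances per gene,
    ordered like the rows of PATHS.
    Returns (branch_lengths, relative_rates).
    """
    d = np.asarray(gene_distances, dtype=float)

    # Step 1: ordinary least-squares branch lengths fitted to the
    # average distance vector over all genes.
    mean_d = d.mean(axis=0)
    lengths, *_ = np.linalg.lstsq(PATHS, mean_d, rcond=None)

    # Step 2: one scale factor per gene, chosen by least squares so that
    # rate * (tree distances) best matches that gene's distances.
    tree_d = PATHS @ lengths
    rates = d @ tree_d / (tree_d @ tree_d)
    return lengths, rates
```

With noise-free input, for example two genes whose distances are the same tree metric scaled by 1 and by 2, the fitted lengths are the average-scale tree and the returned rates are 2/3 and 4/3, so the rates average to 1.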
