Reconstructing evolutionary trees from distances
Abstract
Several popular methods for inferring evolutionary trees (or for hierarchical clustering) start from a matrix of pairwise distances between species (or any other kind of objects): the objective is to construct a tree with edge lengths such that the pairwise distances between its leaves are as close as possible to the input distances. In evolutionary biology, each distance is typically an estimate of the amount of evolutionary change separating two species, computed from molecular sequences using a probabilistic model of sequence evolution. The fundamental step of distance-based tree reconstruction is to fit the edge lengths of a tree of fixed topology to the given distance estimates. This step depends, often implicitly, on the variances assumed for these estimates. In this talk, I will discuss a number of tree reconstruction methods, present my work on them, and show in particular how their properties (such as their robustness to noisy data) are affected by the variance model they assume. I am currently investigating variance assumptions that yield objective functions which can be optimized very rapidly; this could lead to very fast and accurate tree reconstruction algorithms.
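The edge-length fitting step described above can be sketched as a weighted least-squares problem, where each leaf pair's weight plays the role of 1/variance of its distance estimate. This is a minimal illustration under stated assumptions, not the method presented in the talk: the tree encoding, the hypothetical helper names (leaf_paths, solve, fit_edge_lengths), the example quartet, and the two weighting choices are chosen for illustration. Uniform weights give ordinary least squares (constant variance); weights of 1/d^2 correspond to the Fitch-Margoliash criterion (variance proportional to the squared distance).

```python
from itertools import combinations


def leaf_paths(edges, leaves):
    """Map each ordered leaf pair (u, v) to the set of edge indices on its path.

    `edges` is a list of (node, node) pairs describing an unrooted tree
    (an illustrative encoding, not a standard format).
    """
    leaf_set = set(leaves)
    adj = {}
    for k, (x, y) in enumerate(edges):
        adj.setdefault(x, []).append((y, k))
        adj.setdefault(y, []).append((x, k))
    paths = {}
    for u in leaves:
        # Depth-first search from u, carrying the edges used to reach each node.
        stack = [(u, None, frozenset())]
        while stack:
            node, parent, used = stack.pop()
            if node in leaf_set and node != u:
                paths[(u, node)] = used
            for nbr, k in adj[node]:
                if nbr != parent:
                    stack.append((nbr, node, used | {k}))
    return paths


def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    A = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (A[i][n] - sum(A[i][j] * x[j] for j in range(i + 1, n))) / A[i][i]
    return x


def fit_edge_lengths(edges, leaves, dist, weight):
    """Weighted least-squares edge lengths for a fixed tree topology.

    Minimizes sum over leaf pairs of weight(d_ij) * (d_ij - path_length_ij)^2
    by solving the normal equations (A^T W A) l = A^T W d, where A is the
    pair-by-edge path incidence matrix and W holds the weights (1 / variance).
    """
    paths = leaf_paths(edges, leaves)
    m = len(edges)
    M = [[0.0] * m for _ in range(m)]
    b = [0.0] * m
    for u, v in combinations(leaves, 2):
        d = dist[frozenset((u, v))]
        w = weight(d)
        on_path = paths[(u, v)]
        for e in on_path:
            b[e] += w * d
            for f in on_path:
                M[e][f] += w
    return solve(M, b)


# Example quartet ((a,b),(c,d)) with internal nodes x and y.
edges = [("a", "x"), ("b", "x"), ("x", "y"), ("c", "y"), ("d", "y")]
leaves = ["a", "b", "c", "d"]
true_lengths = [1.0, 2.0, 5.0, 3.0, 4.0]

# Tree-additive distances: each is the sum of the true lengths along the path.
paths = leaf_paths(edges, leaves)
dist = {frozenset(p): sum(true_lengths[e] for e in paths[p])
        for p in combinations(leaves, 2)}

ols = fit_edge_lengths(edges, leaves, dist, lambda d: 1.0)        # constant variance
fm = fit_edge_lengths(edges, leaves, dist, lambda d: 1.0 / d**2)  # variance ~ d^2

# Perturb one estimate (hypothetical noise) so the distances are no longer additive.
noisy = dict(dist)
noisy[frozenset(("a", "c"))] += 0.6
ols_noisy = fit_edge_lengths(edges, leaves, noisy, lambda d: 1.0)
fm_noisy = fit_edge_lengths(edges, leaves, noisy, lambda d: 1.0 / d**2)
```

On exactly additive distances, both weightings recover the true edge lengths; once a single estimate is perturbed, the two fits generally disagree. That disagreement is a small-scale instance of the point above: the fitted tree depends on the variance model behind the weights.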