Distance-based Phylogeny Reconstruction (Optimal Radius): 1999, Atteson; 2005, Elias, Lagergren

Richard Desper 1 Olivier Gascuel 2, *
* Auteur correspondant
2 MAB - Méthodes et Algorithmes pour la Bioinformatique
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier
Abstract : A phylogeny is an evolutionary tree tracing the shared history, including common ancestors, of a set of extant taxa. Phylogenies have been historically reconstructed using character-based (parsimony) methods, but in recent years the advent of DNA sequencing, along with the development of large databases of molecular data, has led to more involved methods. Sophisticated techniques such as likelihood and Bayesian methods are used to estimate phylogenies with sound statistical justifications. However, these statistical techniques suffer from the discrete nature of tree topology space. Since the number of tree topologies increases exponentially as a function of the number of taxa, and each topology requires separate likelihood calculation, it is important to restrict the search space and to design efficient heuristics. Distance methods for phylogeny reconstruction serve this purpose by inferring trees in a fraction of the time required for the more statistically rigorous methods. They allow dealing with thousands of taxa, while the current implementations of statistical approaches are limited to a few hundreds, and distance methods also provide fairly accurate starting trees to be further refined by more sophisticated methods. Moreover, the input of distance methods is the matrix of pairwise evolutionary distances among taxa, which are estimated by maximum likelihood, so that distance methods also have sound statistical justifications. Mathematically, a phylogenetic tree is a triple T =(V,E,l) where V is the set of nodes representing extant taxa and ancestral species, E is the set of edges (branches), and l is a function that assigns positive lengths to each edge in E. Evolution proceeds through the tree structure as a stochastic process with a finite state space corresponding to the DNA bases or amino acids present in the DNA or protein sequences, respectively. Any phylogenetic tree T defines a metric DT on its leaf set L(T) : let PT (u,v) define the unique path through T from u to v, then the distance from u to v is set to DT(u,v)= ∑l(e). e∈P (u,v) Distance methods for phylogeny reconstruction rely on the observation [13] that the map T → DT is reversible; i.e., a tree T can be reconstructed from its tree metric. While in practice DT is not known, by using models of evolution (e.g. [10], reviewed in [5]) one can use molecular sequence data to estimate a distance matrix D that approximates DT . As the amount of sequence data increases, the consistency of the various models of sequence evolution implies that D should converge to DT . Thus for a distance method to be consistent, it is necessary that for any tree T, and for distance matrices D “close enough” to DT , the algorithm will output T. The present chapter deals with the question of when any distance algorithm for phylogeny reconstruction can be guaranteed to output the correct phylogeny as a function of the divergence between the metric underlying the true phylogeny and the metric estimated from the data. Atteson [1] demonstrated that this consistency can be shown for Neighbor Joining (NJ) [11], the most popular distance method, and a number of NJ’s variants.
Type de document :
Chapitre d'ouvrage
Ming-Yang Kao. Encyclopedia of Algorithms, Springer, pp.1-99, 2008, 978-0-387-30162-4. 〈10.1007/978-0-387-30162-4_115〉. 〈www.lirmm.fr/mab〉
Liste complète des métadonnées

Littérature citée [11 références]  Voir  Masquer  Télécharger

https://hal-lirmm.ccsd.cnrs.fr/lirmm-00324131
Contributeur : Olivier Gascuel <>
Soumis le : mercredi 24 septembre 2008 - 10:31:44
Dernière modification le : jeudi 11 janvier 2018 - 06:26:12
Document(s) archivé(s) le : jeudi 3 juin 2010 - 21:58:04

Fichier

DesperGascuel_EncycAlg_2008.pd...
Fichiers produits par l'(les) auteur(s)

Identifiants

Collections

Citation

Richard Desper, Olivier Gascuel. Distance-based Phylogeny Reconstruction (Optimal Radius): 1999, Atteson; 2005, Elias, Lagergren. Ming-Yang Kao. Encyclopedia of Algorithms, Springer, pp.1-99, 2008, 978-0-387-30162-4. 〈10.1007/978-0-387-30162-4_115〉. 〈www.lirmm.fr/mab〉. 〈lirmm-00324131〉

Partager

Métriques

Consultations de la notice

150

Téléchargements de fichiers

140