Distance-based Phylogeny Reconstruction (Optimal Radius)

Richard Desper; Olivier Gascuel

doi:10.1007/978-0-387-30162-4_115

Chapitre D'ouvrage Année : 2008

Distance-based Phylogeny Reconstruction (Optimal Radius)

(1) , (2)

1
2

Richard Desper

Fonction : Auteur

Department of Biology

Olivier Gascuel

Fonction : Auteur correspondant
PersonId : 938491
IdHAL : olivier-gascuel
ORCID : 0000-0002-9412-9723

Connectez-vous pour contacter l'auteur

Méthodes et Algorithmes pour la Bioinformatique

Résumé

A phylogeny is an evolutionary tree tracing the shared history, including common ancestors, of a set of extant taxa. Phylogenies have been historically reconstructed using character-based (parsimony) methods, but in recent years the advent of DNA sequencing, along with the development of large databases of molecular data, has led to more involved methods. Sophisticated techniques such as likelihood and Bayesian methods are used to estimate phylogenies with sound statistical justifications. However, these statistical techniques suffer from the discrete nature of tree topology space. Since the number of tree topologies increases exponentially as a function of the number of taxa, and each topology requires separate likelihood calculation, it is important to restrict the search space and to design efficient heuristics. Distance methods for phylogeny reconstruction serve this purpose by inferring trees in a fraction of the time required for the more statistically rigorous methods. They allow dealing with thousands of taxa, while the current implementations of statistical approaches are limited to a few hundreds, and distance methods also provide fairly accurate starting trees to be further refined by more sophisticated methods. Moreover, the input of distance methods is the matrix of pairwise evolutionary distances among taxa, which are estimated by maximum likelihood, so that distance methods also have sound statistical justifications. Mathematically, a phylogenetic tree is a triple T =(V,E,l) where V is the set of nodes representing extant taxa and ancestral species, E is the set of edges (branches), and l is a function that assigns positive lengths to each edge in E. Evolution proceeds through the tree structure as a stochastic process with a finite state space corresponding to the DNA bases or amino acids present in the DNA or protein sequences, respectively. Any phylogenetic tree T defines a metric DT on its leaf set L(T) : let PT (u,v) define the unique path through T from u to v, then the distance from u to v is set to DT(u,v)= ∑l(e). e∈P (u,v) Distance methods for phylogeny reconstruction rely on the observation [13] that the map T → DT is reversible; i.e., a tree T can be reconstructed from its tree metric. While in practice DT is not known, by using models of evolution (e.g. [10], reviewed in [5]) one can use molecular sequence data to estimate a distance matrix D that approximates DT . As the amount of sequence data increases, the consistency of the various models of sequence evolution implies that D should converge to DT . Thus for a distance method to be consistent, it is necessary that for any tree T, and for distance matrices D “close enough” to DT , the algorithm will output T. The present chapter deals with the question of when any distance algorithm for phylogeny reconstruction can be guaranteed to output the correct phylogeny as a function of the divergence between the metric underlying the true phylogeny and the metric estimated from the data. Atteson [1] demonstrated that this consistency can be shown for Neighbor Joining (NJ) [11], the most popular distance method, and a number of NJ’s variants.

Mots clés

phylogeny reconstruction distance methods performance analysis robustness safety radius approach optimal radius

Domaines

Bio-informatique [q-bio.QM] Bio-Informatique, Biologie Systémique [q-bio.QM]

Fichier principal

DesperGascuel_EncycAlg_2008.pdf (46.76 Ko)

Origine	Fichiers produits par l'(les) auteur(s)

Olivier Gascuel : Connectez-vous pour contacter le contributeur

https://hal-lirmm.ccsd.cnrs.fr/lirmm-00324131

Soumis le : mercredi 24 septembre 2008-10:31:44

Dernière modification le : vendredi 24 mars 2023-14:52:51

Archivage à long terme le : jeudi 3 juin 2010-21:58:04

Dates et versions

lirmm-00324131 , version 1 (24-09-2008)

Identifiants

HAL Id : lirmm-00324131 , version 1
DOI : 10.1007/978-0-387-30162-4_115

Citer

Richard Desper, Olivier Gascuel. Distance-based Phylogeny Reconstruction (Optimal Radius): 1999, Atteson; 2005, Elias, Lagergren. Ming-Yang Kao. Encyclopedia of Algorithms, Springer, pp.1-99, 2008, 978-0-387-30162-4. ⟨10.1007/978-0-387-30162-4_115⟩. ⟨lirmm-00324131⟩

Exporter

BibTeX XML-TEI Dublin Core DC Terms EndNote DataCite

Collections

CNRS MAB LIRMM MIPS UNIV-MONTPELLIER

129 Consultations

262 Téléchargements

Distance-based Phylogeny Reconstruction (Optimal Radius)

Résumé

Mots clés

Domaines

Dates et versions

Identifiants

Citer

Exporter

Collections

Altmetric

Partager