Fast NJ-Like Algorithms to Deal with Incomplete Distance Matrices

Alexis Criscuolo; Olivier Gascuel

doi:10.1186/1471-2105-9-166

Journal article, BMC Bioinformatics, Year: 2008

Alexis Criscuolo

Role: Author
PersonId : 184478
IdHAL : alexis-criscuolo
ORCID : 0000-0002-8212-5215

Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier

Institut des Sciences de l'Evolution de Montpellier

Laboratoire de Sciences de l'Image, de l'Informatique et de la Télédétection, équipe ICPS

Olivier Gascuel

Role: Corresponding author
PersonId : 938491
IdHAL : olivier-gascuel
ORCID : 0000-0002-9412-9723

Méthodes et Algorithmes pour la Bioinformatique

Abstract

BACKGROUND: Distance-based phylogeny inference methods first estimate evolutionary distances between every pair of taxa, then build a tree from the resulting distance matrix. These methods are fast and fairly accurate. However, they cope poorly with incomplete distance matrices. Such matrices are frequent in recent multi-gene studies, where a distance is missing whenever two species share no gene in the analyzed data. The few existing algorithms that infer trees with satisfactory accuracy from incomplete distance matrices have time complexity in O(n^4) or more, where n is the number of taxa, which precludes large-scale studies. Agglomerative distance algorithms (e.g. NJ [1,2]) are much faster, with time complexity in O(n^3), which makes it possible to handle huge datasets and heavy bootstrap analyses. These algorithms proceed in three steps: (a) search for the taxon pair to be agglomerated, (b) estimate the lengths of the two newly created branches, (c) reduce the distance matrix and return to (a) until the tree is fully resolved. But available agglomerative algorithms cannot deal with incomplete matrices.

RESULTS: We propose an adaptation to incomplete matrices of three agglomerative algorithms, namely NJ, BIONJ [3] and MVR [4]. Our adaptation generalizes to incomplete matrices the taxon pair selection criterion of NJ (also used by BIONJ and MVR), and combines this generalized criterion with that of ADDTREE [5]. Steps (b) and (c) are also modified, but the O(n^3) time complexity is kept. The performance of these new algorithms is studied with large-scale simulations, which mimic multi-gene phylogenomic datasets. Our new algorithms, named NJ*, BIONJ* and MVR*, infer phylogenetic trees that are at least as accurate as those inferred by other available methods, but with much faster running times. MVR* presents the best overall performance. This algorithm accounts for the variance of the pairwise evolutionary distance estimates, and is well suited for multi-gene studies where some distances are accurately estimated using numerous genes, whereas others are poorly estimated (or not estimated) due to the low number (or absence) of sequenced genes shared by both species.

CONCLUSION: Our distance-based agglomerative algorithms NJ*, BIONJ* and MVR* are fast and accurate, and should be quite useful for large-scale phylogenomic studies. When combined with the SDM method [6] to estimate a distance matrix from multiple genes, they offer a relevant alternative to usual supertree techniques [7]. Binaries and all simulated data are downloadable from [8].
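The three agglomerative steps (a) to (c) named in the abstract can be illustrated with classic NJ on a complete distance matrix. The following is a minimal sketch of textbook NJ only, not of the NJ*, BIONJ* or MVR* algorithms, whose modified steps for missing entries are not specified in the abstract. The function name `neighbor_joining`, the dict-of-dicts input format, and the `node0`, `node1`, ... labels are assumptions made for illustration.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Classic NJ on a COMPLETE symmetric distance matrix (illustrative sketch).

    names: list of taxon labels.
    dist:  dict of dicts with dist[a][b] == dist[b][a] for every pair a != b.
    Returns a list of (node, node, branch_length) edges of the unrooted tree.
    """
    d = {a: dict(dist[a]) for a in names}  # working copy; input is left untouched
    active = list(names)
    edges = []
    next_id = 0

    while len(active) > 2:
        n = len(active)
        # Row sums over the currently active nodes.
        r = {a: sum(d[a][b] for b in active if b != a) for a in active}

        # (a) Select the pair minimising Q(i, j) = (n - 2) * d_ij - r_i - r_j.
        i, j = min(
            combinations(active, 2),
            key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]],
        )

        # (b) Estimate the lengths of the two branches joining i and j to u.
        len_i = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        len_j = d[i][j] - len_i
        u = f"node{next_id}"
        next_id += 1
        edges.append((u, i, len_i))
        edges.append((u, j, len_j))

        # (c) Reduce the matrix: replace i and j by the new internal node u.
        d[u] = {}
        for k in active:
            if k not in (i, j):
                d[u][k] = d[k][u] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        active = [k for k in active if k not in (i, j)] + [u]

    # Two nodes remain: connect them with the last branch.
    a, b = active
    edges.append((a, b, d[a][b]))
    return edges


if __name__ == "__main__":
    # Additive four-taxon example (hypothetical data for illustration).
    taxa = ["A", "B", "C", "D"]
    matrix = {
        "A": {"B": 5, "C": 7, "D": 11},
        "B": {"A": 5, "C": 8, "D": 12},
        "C": {"A": 7, "B": 8, "D": 6},
        "D": {"A": 11, "B": 12, "C": 6},
    }
    for edge in neighbor_joining(taxa, matrix):
        print(edge)
```

Each iteration scans all O(n^2) pairs and there are about n iterations, which gives the O(n^3) cost quoted for agglomerative algorithms. On the additive four-taxon matrix in the usage example, the sketch pairs A with B and C with D and recovers the branch lengths exactly. Every step here reads `d[i][j]` for all pairs, so the sketch fails as soon as an entry is missing; that is the situation the paper's adapted algorithms address.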

Domains

Bioinformatics [q-bio.QM]; Bioinformatics, Systems Biology [q-bio.QM]

Main file

CriscuoloGascuel_BMCBioinfo2008.pdf (386.96 KB)

Origin: Publisher files allowed on an open archive

Contributor: Isabelle Gouat

https://hal-lirmm.ccsd.cnrs.fr/lirmm-00324110

Submitted on: Wednesday, September 5, 2012, 14:56:33

Last modified on: Tuesday, November 12, 2024, 11:22:03

Dates and versions

lirmm-00324110 , version 1 (24-09-2008)

lirmm-00324110 , version 2 (05-09-2012)

Identifiers

HAL Id : lirmm-00324110 , version 2
DOI : 10.1186/1471-2105-9-166

Cite

Alexis Criscuolo, Olivier Gascuel. Fast NJ-Like Algorithms to Deal with Incomplete Distance Matrices. BMC Bioinformatics, 2008, 9, 166. ⟨10.1186/1471-2105-9-166⟩. ⟨lirmm-00324110v2⟩

Collections

IRD CIRAD EPHE CNRS ISEM MAB LIRMM AGROPOLIS PSL MIPS B3ESTE UNIV-MONTPELLIER SITE-ALSACE
