Building species trees from larger parts of phylogenomic databases

Celine Scornavacca 1, *, Vincent Berry 2, Vincent Ranwez 3
* Corresponding author
2 MAB - Méthodes et Algorithmes pour la Bioinformatique
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier
Abstract: Gene trees are leaf-labeled trees inferred from molecular sequences. Because of gene duplication events arising in genomes, some species host several copies of the same gene, hence individual gene trees usually have several leaves labeled with identical species names. Dealing with such multi-labeled gene trees (MUL trees) is a substantial problem in phylogenomics, e.g., current supertree methods do not handle MUL trees, which restricts studies for building the Tree of Life to a very small core of mono-copy genes. We propose here to tackle this problem by mainly transforming a collection of MUL trees into a collection of trees, each containing single copies of labels. To that aim, we provide several fast algorithmic building stones and describe how they fit within a general framework to build a species tree. First of all, we propose to separately preprocess each MUL tree in order to remove its redundant parts with respect to speciation events. For this purpose, we present a tree isomorphism algorithm for MUL trees to reduce redundant parts of these trees. Then, we show how the speciation signal contained in a MUL tree can be represented by a linear set of triplets. When this set is topologically coherent (compatible), we show that it can be used to produce a single-copy gene tree to replace the MUL tree while preserving the information it contains on speciation events. Alternatively, we propose to extract from each MUL tree a maximum sized subtree that is free of duplication events. The algorithms are then applied in a supertree analysis of hogenom, a database of homologous genes from fully sequenced genomes.
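The preprocessing idea mentioned in the abstract (removing parts of a MUL tree that are redundant because they are isomorphic copies) can be illustrated with a minimal sketch. This is not the algorithm from the paper: it is a simple canonical-form comparison, not tuned for speed, and it assumes a hypothetical encoding in which a leaf is a species name (a string) and an internal node is a tuple of child subtrees. The function names `canonical` and `prune_isomorphic` are illustrative only.

```python
from typing import Union

# Assumed encoding (not from the paper): a leaf is a species name,
# an internal node is a tuple of child subtrees.
Tree = Union[str, tuple]


def canonical(tree: Tree) -> Tree:
    """Return a form that is identical for isomorphic unordered leaf-labeled trees."""
    if isinstance(tree, str):
        return tree
    # Sort children by their repr so that child order no longer matters.
    return tuple(sorted((canonical(child) for child in tree), key=repr))


def prune_isomorphic(tree: Tree) -> Tree:
    """Keep one copy of each group of isomorphic sibling subtrees."""
    if isinstance(tree, str):
        return tree
    kept, seen = [], set()
    for child in tree:
        pruned = prune_isomorphic(child)
        key = canonical(pruned)
        if key not in seen:  # first time this subtree shape is met here
            seen.add(key)
            kept.append(pruned)
    # A node left with a single child is contracted into that child.
    return kept[0] if len(kept) == 1 else tuple(kept)


# Two copies of the same three-species subtree under one node.
mul_tree = ((("a", "b"), "c"), ("c", ("b", "a")))
single_copy = prune_isomorphic(mul_tree)
```

In this toy input the two child subtrees carry the same topology on the species a, b and c, so only one copy survives. Sibling subtrees that share labels but differ in topology are left untouched by this sketch.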
Document type:
Journal article
Information and Computation, Elsevier, 2011, 209 (3), pp.590-605. 〈http://www.sciencedirect.com/science/article/pii/S0890540110002087〉. 〈10.1016/j.ic.2010.11.022〉

https://hal-lirmm.ccsd.cnrs.fr/lirmm-00825050
Contributor: Vincent Berry
Submitted on: Wednesday, May 22, 2013 - 18:11:52
Last modified on: Thursday, May 24, 2018 - 15:59:22
Document(s) archived on: Friday, August 23, 2013 - 04:15:32

File

article_Duplications_extended....
Files produced by the author(s)

Citation

Celine Scornavacca, Vincent Berry, Vincent Ranwez. Building species trees from larger parts of phylogenomic databases. Information and Computation, Elsevier, 2011, 209 (3), pp.590-605. 〈http://www.sciencedirect.com/science/article/pii/S0890540110002087〉. 〈10.1016/j.ic.2010.11.022〉. 〈lirmm-00825050〉
