The Combinatorics of Tandem Duplication Trees

Olivier Gascuel 1, Michael Hendy 2,3, Alain Jean-Marie 4, Robert Mclachlan 3
1 MAB - Méthodes et Algorithmes pour la Bioinformatique
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier
4 APR - Algorithmes et Performance des Réseaux
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier
Abstract: We developed a recurrence relation that counts the number of tandem duplication trees (either rooted or unrooted) that are consistent with a set of n tandemly repeated sequences generated under the standard unequal recombination (or crossover) model of tandem duplications. The number of rooted duplication trees is exactly twice the number of unrooted trees, which means that on average only two positions for a root on a duplication tree are possible. Using the recurrence, we tabulated these numbers for small values of n. We also developed an asymptotic formula that provides estimates of these numbers for large n. These numbers give a priori probabilities for phylogenies of the repeated sequences to be duplication trees. This work extends earlier studies in which exhaustive counts were obtained for small n. One application showed the significance of finding that most maximum-parsimony trees constructed from repeat sequences from human immunoglobulins and T-cell receptors were tandem duplication trees. Those findings provided strong support for the proposed mechanisms of tandem gene duplication. The recurrence relation also suggests efficient algorithms to recognize duplication trees and to generate random duplication trees for simulation. We present a linear-time recognition algorithm. [Asymptotic enumeration; random generation; recognition; recursion; tandem duplication trees.]
Document type:
Journal article
Systematic Biology, Oxford University Press (OUP), 2003, 52, pp.110-118. 〈10.1080/10635150390132821〉

https://hal-lirmm.ccsd.cnrs.fr/lirmm-00192006
Contributor: Christine Carvalho de Matos
Submitted on: Monday, 26 November 2007 - 13:44:46
Last modified on: Thursday, 11 January 2018 - 06:26:13
Document(s) archived on: Monday, 12 April 2010 - 05:09:00

File

D109.PDF
Files produced by the author(s)

Citation

Olivier Gascuel, Michael Hendy, Alain Jean-Marie, Robert Mclachlan. The Combinatorics of Tandem Duplication Trees. Systematic Biology, Oxford University Press (OUP), 2003, 52, pp.110-118. 〈10.1080/10635150390132821〉. 〈lirmm-00192006〉
