Improved Sensitivity And Reliability Of Anchor Based Genome Alignment

Whole genome alignment is a challenging problem in computational comparative genomics. It is essential for the functional annotation of genomes, for understanding their evolution, and for phylogenomics. Many global alignment programs are heuristic variations on the anchor based strategy, which first detects similarities between the genomes and then selects a subset of them that forms an ordered chain. Since existing alignment tools fail to align some pairs of bacterial strains, we investigate whether this failure is intrinsic to the strategy or stems from a lack of sensitivity in the similarity detection method. To this end, we implement six programs based on three different detection methods (ranging from exact matches to local alignments) and compare them on a large benchmark set. Our results suggest that the sensitivity of well-known methods, like MGA or Mauve, can be greatly improved for divergent genomes if spaced seeds are used in the detection phase. In other cases, such methods yield alignments that cover nearly the whole genome. We then focus on the global reliability of alignments: should an aligned pair of segments be included in the global genome alignment? We assess this reliability according to both the "alignability" of the segments and the inclusion of orthologs. Here again, we provide evidence that, for both close and divergent genomes, one of our programs, YH, produces alignments that sometimes have lower coverage but include more orthologs. This opens the way to the first reliable alignments for some highly divergent species like Buchnera aphidicola or Prochlorococcus marinus.
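
The two phases of the anchor based strategy named in the abstract (similarity detection, then selection of an ordered chain) can be illustrated with a minimal Python sketch. It is not the implementation of YH, MGA or Mauve: the function names, the seed mask "1101" and the toy sequences are assumptions chosen for illustration, and the chaining step is a simple quadratic dynamic program that maximizes the number of anchors, whereas real tools typically weight anchors by length or score.

```python
from collections import defaultdict


def spaced_seed_hits(a, b, mask="1101"):
    """Return sorted (i, j) pairs where a[i:] and b[j:] agree at every
    '1' position of the mask; '0' positions are allowed to mismatch.
    The mask value is a hypothetical example, not one from the paper."""
    care = [k for k, c in enumerate(mask) if c == "1"]
    span = len(mask)
    index = defaultdict(list)
    # Index every window of the first sequence by its masked content.
    for i in range(len(a) - span + 1):
        index[tuple(a[i + k] for k in care)].append(i)
    hits = []
    # Look up every window of the second sequence in that index.
    for j in range(len(b) - span + 1):
        key = tuple(b[j + k] for k in care)
        hits.extend((i, j) for i in index.get(key, []))
    return sorted(hits)


def chain(hits, span):
    """Select a longest chain of hits that are ordered and
    non-overlapping in both sequences (O(n^2) dynamic program)."""
    hits = sorted(hits)
    n = len(hits)
    if n == 0:
        return []
    best = [1] * n   # best[t]: size of the best chain ending at hit t
    prev = [-1] * n  # prev[t]: predecessor of hit t in that chain
    for t in range(n):
        for s in range(t):
            colinear = (hits[s][0] + span <= hits[t][0]
                        and hits[s][1] + span <= hits[t][1])
            if colinear and best[s] + 1 > best[t]:
                best[t] = best[s] + 1
                prev[t] = s
    # Backtrack from the end of the best chain.
    t = max(range(n), key=best.__getitem__)
    result = []
    while t != -1:
        result.append(hits[t])
        t = prev[t]
    return result[::-1]


a = "ACGTACGT"
b = "ACCTACGT"  # differs from a at position 2
spaced = spaced_seed_hits(a, b, "1101")
exact = spaced_seed_hits(a, b, "1111")
anchors = chain(spaced, span=4)
```

In this toy case the spaced mask reports a hit at (0, 0) even though the two windows differ at their third position, a hit that the all-ones mask (exact 4-mer matching) misses. That extra hit lets the chain start at the beginning of both sequences.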

Keywords

Spaced seeds; Anchor based strategy; Global genome alignment

Domains

Bioinformatics [q-bio.QM]

Main file

jobim26.pdf (94.62 KB)

Origin: Files produced by the author(s)

Contributor: Raluca Uricaru

https://hal-lirmm.ccsd.cnrs.fr/lirmm-00407215

Submitted on: Friday, July 24, 2009, 11:10:16

Last modified on: Tuesday, July 9, 2024, 10:34:00

Long-term archiving on: Tuesday, June 15, 2010, 19:26:34

Dates and versions

lirmm-00407215, version 1 (24-07-2009)

Identifiers

HAL Id: lirmm-00407215, version 1

Cite

Raluca Uricaru, Célia Michotey, Laurent Noé, Hélène Chiapello, Eric Rivals. Improved Sensitivity And Reliability Of Anchor Based Genome Alignment. JOBIM 2009 - 10es Journées Ouvertes en Biologie, Informatique et Mathématiques, Jun 2009, Nantes, France. pp.31-36. ⟨lirmm-00407215⟩

Collections

UNIV-LILLE3 CNRS INRIA INRA LIFL MAB LIRMM CRISTAL INRIA2 CRISTAL-BONSAI MIPS UNIV-MONTPELLIER INRAE MATHNUM DPT_ECODIV

484 views

246 downloads