Building the Assembly De Bruijn Graph from an Implicit Suffix Tree

Eric Rivals 1, 2, *
* Auteur correspondant
2 MAB - Méthodes et Algorithmes pour la Bioinformatique
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier
Abstract : Suffix trees belong to the most studied indexing data structures for strings. Generalised suffix trees can index the substrings of a set of input words. In its construction algorithm Ukkonen used implicit suffix trees, which relax the assumption that a suffix is not a prefix of another suffix, and thus represent some suffixes by internal nodes rather than by leaves. In computational biology, for the sake of genome assembly, one wishes to assemble a target sequence from a multitude of input strings, usually called the reads. Numerous algorithms build assembly related graphs that represent the overlaps of the input reads: for instance, the overlap graph, or a special version of the De Bruijn graph, in which only k-mers occurring within the reads are represented by nodes. We address the question of algorithms for building the overlap graph and the De Bruijn graph directly from indexing data structures of the read set, and investigate it when the index in an implicit generalised suffix tree. We will present the algorithms for the De Bruijn graph and for its contracted version, and show they run in a time that is linear in the size of the index. Generally, this work establishes algorithmic paths to convert classical indexing data structures in graph representations useful for assembly. Work in collaboration with Bastien Cazaux et Thierry Lecroq.
Type de document :
Communication dans un congrès
Jens Stoye and Roland Wittler. Mini-Workshop on the Storage, Search and Annotation of Multiple Similar Genomes, Dec 2013, Bielefeld, Germany. 2013, 〈http://wiki.techfak.uni-bielefeld.de/didy/workshopPangenomeStorageSearch〉
Liste complète des métadonnées

https://hal-lirmm.ccsd.cnrs.fr/lirmm-00938959
Contributeur : Eric Rivals <>
Soumis le : mercredi 29 janvier 2014 - 17:29:32
Dernière modification le : mercredi 10 octobre 2018 - 14:28:12
Document(s) archivé(s) le : dimanche 9 avril 2017 - 02:43:01

Fichier

pan-genome-Rivals-res.pdf
Fichiers produits par l'(les) auteur(s)

Identifiants

  • HAL Id : lirmm-00938959, version 1

Collections

Citation

Eric Rivals. Building the Assembly De Bruijn Graph from an Implicit Suffix Tree. Jens Stoye and Roland Wittler. Mini-Workshop on the Storage, Search and Annotation of Multiple Similar Genomes, Dec 2013, Bielefeld, Germany. 2013, 〈http://wiki.techfak.uni-bielefeld.de/didy/workshopPangenomeStorageSearch〉. 〈lirmm-00938959〉

Partager

Métriques

Consultations de la notice

814

Téléchargements de fichiers

453