Linking indexing data structures to de Bruijn graphs: Construction and update

Abstract : DNA sequencing technologies have tremendously increased their throughput, and hence complicated DNA assembly. Numerous assembly programs use de Bruijn graphs (dBG) built from short reads to merge these into contigs, which represent putative DNA segments. In a dBG of order k, nodes are substrings of length k of reads (or k-mers), while arcs are their k + 1-mers. As analysing reads often require to index all their substrings, it is interesting to exhibit algorithms that directly build a dBG from a pre-existing index, and especially a contracted dBG, where non-branching paths are condensed into single nodes. Here, we exhibit linear time algorithms for constructing the full or contracted dBGs from suffix trees, suffix arrays, and truncated suffix trees. With the latter the construction uses a space that is linear in the size of the dBG. Finally, we also provide algorithms to dynamically update the order of the graph without reconstructing it.
Type de document :
Article dans une revue
Journal of Computer and System Sciences, Elsevier, 2016, in press. 〈10.1016/j.jcss.2016.06.008〉
Liste complète des métadonnées

Littérature citée [25 références]  Voir  Masquer  Télécharger

https://hal-lirmm.ccsd.cnrs.fr/lirmm-01617207
Contributeur : Eric Rivals <>
Soumis le : lundi 16 octobre 2017 - 13:31:08
Dernière modification le : mercredi 18 avril 2018 - 11:05:41
Document(s) archivé(s) le : mercredi 17 janvier 2018 - 12:52:18

Fichier

Cazaux-etal-JCSS-16-temp.pdf
Fichiers produits par l'(les) auteur(s)

Identifiants

Citation

Bastien Cazaux, Thierry Lecroq, Eric Rivals. Linking indexing data structures to de Bruijn graphs: Construction and update. Journal of Computer and System Sciences, Elsevier, 2016, in press. 〈10.1016/j.jcss.2016.06.008〉. 〈lirmm-01617207〉

Partager

Métriques

Consultations de la notice

92

Téléchargements de fichiers

50