# From Indexing Data Structures to de Bruijn Graphs

* Auteur correspondant
1 MAB - Méthodes et Algorithmes pour la Bioinformatique
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier
Abstract : New technologies have tremendously increased sequencing throughput compared to traditional techniques, thereby complicating DNA assembly. Hence, assembly programs resort to de Bruijn graphs (dBG) of $k$-mers of short reads to compute a set of long contigs, each being a putative segment of the sequenced molecule. Other types of DNA sequence analysis, as well as preprocessing of the reads for assembly, use classical data structures to index all substrings of the reads. It is thus interesting to exhibit algorithms that directly build a de Bruijn graph of order $k$ from a pre-existing index, and especially a contracted version of the de Bruijn graph, where non branching paths are condensed into single nodes. Here, we formalise the relationship between suffix trees/arrays and dBGs, and exhibit linear time algorithms for constructing the full or contracted de Bruijn graphs. Finally, we provide hints explaining why this bridge between indexes and dBGs enables to dynamically update the order $k$ of the graph.
Keywords :
Type de document :
Rapport
[Research Report] RR-14004, LIRMM. 2014
Domaine :

https://hal-lirmm.ccsd.cnrs.fr/lirmm-00950983
Contributeur : Eric Rivals <>
Soumis le : mercredi 5 mars 2014 - 19:31:18
Dernière modification le : vendredi 9 juin 2017 - 10:40:54
Document(s) archivé(s) le : jeudi 5 juin 2014 - 12:00:27

### Fichier

STSA-DBG-RR.pdf
Fichiers produits par l'(les) auteur(s)

### Identifiants

• HAL Id : lirmm-00950983, version 2

### Citation

Bastien Cazaux, Thierry Lecroq, Eric Rivals. From Indexing Data Structures to de Bruijn Graphs. [Research Report] RR-14004, LIRMM. 2014. <lirmm-00950983v2>

Consultations de
la notice

## 736

Téléchargements du document