From Indexing Data Structures to de Bruijn Graphs

Bastien Cazaux; Thierry Lecroq; Eric Rivals

Report (Research Report), Year: 2014


Bastien Cazaux

Role: Author
PersonId : 3129
IdHAL : cazaux-bastien
IdRef : 226877523

Méthodes et Algorithmes pour la Bioinformatique

Thierry Lecroq

Role: Author
PersonId : 21817
IdHAL : thierry-lecroq
ORCID : 0000-0002-1900-3397
IdRef : 059058463

Laboratoire d'Informatique, de Traitement de l'Information et des Systèmes

Eric Rivals

Role: Corresponding author
PersonId : 2002
IdHAL : eric-rivals
ORCID : 0000-0003-3791-3973
IdRef : 118021850

Institut de Biologie Computationnelle

Méthodes et Algorithmes pour la Bioinformatique

Abstract

New technologies have tremendously increased sequencing throughput compared to traditional techniques, thereby complicating DNA assembly. Hence, assembly programs resort to de Bruijn graphs (dBG) of $k$-mers of short reads to compute a set of long contigs, each being a putative segment of the sequenced molecule. Other types of DNA sequence analysis, as well as preprocessing of the reads for assembly, use classical data structures to index all substrings of the reads. It is thus interesting to exhibit algorithms that directly build a de Bruijn graph of order $k$ from a pre-existing index, and especially a contracted version of the de Bruijn graph, where non-branching paths are condensed into single nodes. Here, we formalise the relationship between suffix trees/arrays and dBGs, and exhibit linear-time algorithms for constructing the full or contracted de Bruijn graphs. Finally, we provide hints explaining why this bridge between indexes and dBGs makes it possible to dynamically update the order $k$ of the graph.
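
To make the two objects named in the abstract concrete, here is a minimal Python sketch of a de Bruijn graph of order $k$ and of its contracted version. It is not the index-based, linear-time construction of the report: it is a naive dictionary-based illustration, and it assumes the common convention that nodes are the $k$-mers occurring in the reads and that an arc goes from u to v when some read contains u immediately followed by v (that is, the corresponding $(k+1)$-mer). The function names `de_bruijn_graph` and `contracted_labels` are hypothetical.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Return {k-mer: set of successor k-mers} (naive construction, assumed convention)."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            successors = graph[read[i:i + k]]  # registers the node even without out-arcs
            if i + k < len(read):
                successors.add(read[i + 1:i + k + 1])
    return dict(graph)


def contracted_labels(graph):
    """Return the strings spelled by the maximal non-branching paths of the graph."""
    preds = defaultdict(set)
    for u, succs in graph.items():
        for v in succs:
            preds[v].add(u)

    def mergeable(u, v):
        # The arc u -> v can be condensed when u has one successor and v one predecessor.
        return len(graph[u]) == 1 and len(preds[v]) == 1

    def is_start(v):
        if len(preds[v]) != 1:
            return True
        (u,) = preds[v]
        return not mergeable(u, v)

    visited, labels = set(), []

    def walk(start):
        label, node = start, start
        visited.add(start)
        while len(graph[node]) == 1:
            (nxt,) = graph[node]
            if nxt in visited or not mergeable(node, nxt):
                break
            label += nxt[-1]  # each step extends the label by one character
            visited.add(nxt)
            node = nxt
        return label

    for v in sorted(graph):
        if is_start(v):
            labels.append(walk(v))
    for v in sorted(graph):  # nodes left over lie on isolated cycles
        if v not in visited:
            labels.append(walk(v))
    return labels
```

For the reads `["ACGTAC", "CGTT"]` and `k = 3`, the node `CGT` has two successors (`GTA` and `GTT`), so the contracted graph has three nodes, labelled `ACGT`, `GTAC` and `GTT`.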

Keywords

suffix tree suffix array assembly index data structures dynamic update k-mer overlap contracted de Bruijn graph construction algorithms

Domains

Bioinformatics [q-bio.QM]; Data Structures and Algorithms [cs.DS]

Main file

STSA-DBG-RR.pdf (155.56 KB)

Origin: Files produced by the author(s)

Contributor: Eric Rivals

https://hal-lirmm.ccsd.cnrs.fr/lirmm-00950983

Submitted on: Wednesday, March 5, 2014, 19:31:18

Last modified on: Friday, December 22, 2023, 15:16:05

Long-term archiving on: Thursday, June 5, 2014, 12:00:27

Dates and versions

lirmm-00950983, version 1 (25-02-2014)

lirmm-00950983, version 2 (05-03-2014)

Identifiers

HAL Id: lirmm-00950983, version 2

Cite

Bastien Cazaux, Thierry Lecroq, Eric Rivals. From Indexing Data Structures to de Bruijn Graphs. [Research Report] RR-14004, LIRMM. 2014. ⟨lirmm-00950983v2⟩

Collections

CNRS INRIA INSA-ROUEN INRA LITIS MAB LIRMM COMUE-NORMANDIE LMI-ROUEN LARA MIPS UNIV-MONTPELLIER UNIROUEN UNILEHAVRE INSA-GROUPE INRAE ANR

983 views

736 downloads