From Indexing Data Structures to de Bruijn Graphs

Bastien Cazaux; Thierry Lecroq; Eric Rivals

doi:10.1007/978-3-319-07566-2_10

Conference paper, Year: 2014

Bastien Cazaux

Role: Author
PersonId: 3129
IdHAL: cazaux-bastien
IdRef: 226877523

Méthodes et Algorithmes pour la Bioinformatique

Institut de Biologie Computationnelle

Thierry Lecroq

Role: Author
PersonId: 21817
IdHAL: thierry-lecroq
ORCID: 0000-0002-1900-3397
IdRef: 059058463

Laboratoire d'Informatique, de Traitement de l'Information et des Systèmes

Eric Rivals

Role: Author
PersonId: 2002
IdHAL: eric-rivals
ORCID: 0000-0003-3791-3973
IdRef: 118021850

Institut de Biologie Computationnelle

Méthodes et Algorithmes pour la Bioinformatique

Abstract

New technologies have tremendously increased sequencing throughput compared to traditional techniques, thereby complicating DNA assembly. Hence, assembly programs resort to de Bruijn graphs (dBG) of the k-mers of short reads to compute a set of long contigs, each being a putative segment of the sequenced molecule. Other types of DNA sequence analysis, as well as the preprocessing of reads for assembly, use classical data structures to index all substrings of the reads. It is thus interesting to exhibit algorithms that directly build a dBG of order k from a pre-existing index, and especially a contracted version of the dBG, in which non-branching paths are condensed into single nodes. Here, we formalise the relationship between suffix trees/arrays and dBGs, and exhibit linear-time algorithms for constructing the full or contracted dBGs. Finally, we provide hints explaining why this bridge between indexes and dBGs makes it possible to dynamically update the order k of the graph.
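The pipeline described in the abstract (index all substrings of the reads, read the order-k dBG off that index, then condense non-branching paths) can be illustrated with a small Python sketch. It is not the authors' linear-time algorithm: the suffix array is built naively from materialised suffix strings, and the sketch assumes the node-centric definition in which an arc joins two k-mers that overlap by k-1 characters (the paper's exact definition may differ, for instance by keeping only arcs supported by a (k+1)-mer of a read). All function names (`sorted_suffixes`, `sa_interval`, `de_bruijn_from_sa`, `contract`) are hypothetical.

```python
from bisect import bisect_left

# Sentinel larger than any character in the reads (assumption: reads never
# contain U+10FFFF, which holds for DNA alphabets).
_MAX_CHAR = "\U0010ffff"


def sorted_suffixes(reads):
    """Naive generalized suffix array of a set of reads.

    For readability the suffix strings themselves are stored; a real suffix
    array keeps only (read, position) pairs and can be built in linear time.
    This version is O(n^2 log n) in the worst case.
    """
    return sorted(r[i:] for r in reads for i in range(len(r)))


def sa_interval(sa, w):
    """Half-open range [lo, hi) of the suffixes in sa that start with w."""
    return bisect_left(sa, w), bisect_left(sa, w + _MAX_CHAR)


def de_bruijn_from_sa(sa, k):
    """Node-centric dBG of order k read off the sorted suffixes.

    Nodes are the distinct k-mers; there is an arc u -> v whenever
    u[1:] == v[:-1] (assumed definition, see lead-in).
    """
    nodes = []
    for s in sa:
        # Suffixes sharing a k-mer prefix are contiguous in sorted order,
        # so comparing with the last node found removes duplicates.
        if len(s) >= k and (not nodes or nodes[-1] != s[:k]):
            nodes.append(s[:k])
    succ = {}
    for u in nodes:
        # Successors of u are the k-mers starting with u's (k-1)-suffix;
        # they all lie in one contiguous interval of the suffix array.
        lo, hi = sa_interval(sa, u[1:])
        succ[u] = []
        for s in sa[lo:hi]:
            if len(s) >= k and (not succ[u] or succ[u][-1] != s[:k]):
                succ[u].append(s[:k])
    return succ


def contract(succ):
    """Merge maximal non-branching paths into single nodes (unitigs)."""
    pred = {u: [] for u in succ}
    for u, vs in succ.items():
        for v in vs:
            pred[v].append(u)

    def next_in_path(u):
        # u -> v is merged only if u has a single successor v and v has a
        # single predecessor u; self-loops are never merged.
        if len(succ[u]) == 1:
            v = succ[u][0]
            if v != u and len(pred[v]) == 1:
                return v
        return None

    def starts_path(u):
        return not (len(pred[u]) == 1 and next_in_path(pred[u][0]) == u)

    seen, unitigs = set(), []
    # Walk from every path start first; nodes still unseen afterwards lie on
    # isolated cycles and are walked from an arbitrary member.
    for u in [v for v in succ if starts_path(v)] + list(succ):
        if u in seen:
            continue
        path, v = [u], next_in_path(u)
        seen.add(u)
        while v is not None and v not in seen:
            path.append(v)
            seen.add(v)
            v = next_in_path(v)
        # Spell the path: first k-mer, then the last letter of each next one.
        unitigs.append(path[0] + "".join(x[-1] for x in path[1:]))
    return unitigs


sa = sorted_suffixes(["ACGTC", "CGTA"])
print(de_bruijn_from_sa(sa, 3))
print(contract(de_bruijn_from_sa(sa, 3)))
```

On the two reads above, the order-3 graph is `{'ACG': ['CGT'], 'CGT': ['GTA', 'GTC'], 'GTA': [], 'GTC': []}` and its contraction is `['ACGT', 'GTA', 'GTC']`. Because the sorted suffixes do not depend on k, the same `sa` can be passed again with another order (e.g. `de_bruijn_from_sa(sa, 4)`, which contracts to `['ACGT', 'CGTA', 'CGTC']`). This only hints at the dynamic-update idea mentioned in the abstract and does not reproduce the paper's update procedure.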

Keywords

suffix tree, suffix array, assembly, index data structures, dynamic update, k-mer overlap, contracted de Bruijn graph, construction algorithms

Domains

Bioinformatics [q-bio.QM]; Computer Science [cs]; Data Structures and Algorithms [cs.DS]

Main file

cpm-dbg-auteurs.pdf (192.71 KB)

Origin: Files produced by the author(s)


https://hal-lirmm.ccsd.cnrs.fr/lirmm-01081429

Submitted on: Friday, 7 November 2014, 17:47:40

Last modified on: Friday, 22 December 2023, 15:16:05

Long-term archiving on: Sunday, 8 February 2015, 11:11:43

Dates and versions

lirmm-01081429, version 1 (7 November 2014)

Licence

Attribution - NonCommercial - NoDerivatives

Identifiers

HAL Id: lirmm-01081429, version 1
DOI: 10.1007/978-3-319-07566-2_10

Cite

Bastien Cazaux, Thierry Lecroq, Eric Rivals. From Indexing Data Structures to de Bruijn Graphs. CPM: Combinatorial Pattern Matching, Moscow State University, Jun 2014, Moscow, Russia. pp.89-99, ⟨10.1007/978-3-319-07566-2_10⟩. ⟨lirmm-01081429⟩

Collections

CNRS INRIA INSA-ROUEN INRA LITIS MAB LIRMM COMUE-NORMANDIE MIPS UNIV-MONTPELLIER UNIROUEN UNILEHAVRE INSA-GROUPE INRAE ANR

740 views

247 downloads