Building the Assembly De Bruijn Graph from an Implicit Suffix Tree

Eric Rivals 1, 2, *
* Corresponding author
2 MAB - Méthodes et Algorithmes pour la Bioinformatique
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier
Abstract : Suffix trees belong to the most studied indexing data structures for strings. Generalised suffix trees can index the substrings of a set of input words. In its construction algorithm Ukkonen used implicit suffix trees, which relax the assumption that a suffix is not a prefix of another suffix, and thus represent some suffixes by internal nodes rather than by leaves. In computational biology, for the sake of genome assembly, one wishes to assemble a target sequence from a multitude of input strings, usually called the reads. Numerous algorithms build assembly related graphs that represent the overlaps of the input reads: for instance, the overlap graph, or a special version of the De Bruijn graph, in which only k-mers occurring within the reads are represented by nodes. We address the question of algorithms for building the overlap graph and the De Bruijn graph directly from indexing data structures of the read set, and investigate it when the index in an implicit generalised suffix tree. We will present the algorithms for the De Bruijn graph and for its contracted version, and show they run in a time that is linear in the size of the index. Generally, this work establishes algorithmic paths to convert classical indexing data structures in graph representations useful for assembly. Work in collaboration with Bastien Cazaux et Thierry Lecroq.
Complete list of metadatas

https://hal-lirmm.ccsd.cnrs.fr/lirmm-00938959
Contributor : Eric Rivals <>
Submitted on : Wednesday, January 29, 2014 - 5:29:32 PM
Last modification on : Friday, March 15, 2019 - 1:15:01 AM
Long-term archiving on : Sunday, April 9, 2017 - 2:43:01 AM

File

pan-genome-Rivals-res.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : lirmm-00938959, version 1

Collections

Citation

Eric Rivals. Building the Assembly De Bruijn Graph from an Implicit Suffix Tree. Mini-Workshop on the Storage, Search and Annotation of Multiple Similar Genomes, Dec 2013, Bielefeld, Germany. ⟨lirmm-00938959⟩

Share

Metrics

Record views

1146

Files downloads

570