Building the Assembly De Bruijn Graph from an Implicit Suffix Tree - LIRMM - Laboratoire d’Informatique, de Robotique et de Microélectronique de Montpellier Access content directly
Conference Papers Year : 2013

Building the Assembly De Bruijn Graph from an Implicit Suffix Tree

Abstract

Suffix trees belong to the most studied indexing data structures for strings. Generalised suffix trees can index the substrings of a set of input words. In its construction algorithm Ukkonen used implicit suffix trees, which relax the assumption that a suffix is not a prefix of another suffix, and thus represent some suffixes by internal nodes rather than by leaves. In computational biology, for the sake of genome assembly, one wishes to assemble a target sequence from a multitude of input strings, usually called the reads. Numerous algorithms build assembly related graphs that represent the overlaps of the input reads: for instance, the overlap graph, or a special version of the De Bruijn graph, in which only k-mers occurring within the reads are represented by nodes. We address the question of algorithms for building the overlap graph and the De Bruijn graph directly from indexing data structures of the read set, and investigate it when the index in an implicit generalised suffix tree. We will present the algorithms for the De Bruijn graph and for its contracted version, and show they run in a time that is linear in the size of the index. Generally, this work establishes algorithmic paths to convert classical indexing data structures in graph representations useful for assembly. Work in collaboration with Bastien Cazaux et Thierry Lecroq.
Fichier principal
Vignette du fichier
pan-genome-Rivals-res.pdf (19.38 Ko) Télécharger le fichier
Origin Files produced by the author(s)

Dates and versions

lirmm-00938959 , version 1 (29-01-2014)

Identifiers

  • HAL Id : lirmm-00938959 , version 1

Cite

Eric Rivals. Building the Assembly De Bruijn Graph from an Implicit Suffix Tree. Mini-Workshop on the Storage, Search and Annotation of Multiple Similar Genomes, Dec 2013, Bielefeld, Germany. ⟨lirmm-00938959⟩
599 View
334 Download

Share

Gmail Mastodon Facebook X LinkedIn More