Superstring Graph: A New Approach for Genome Assembly

Abstract: With the increasing impact of genomics in the life sciences, the inference of high-quality, reliable, and complete genome sequences is becoming critical. Genome assembly remains a major bottleneck in bioinformatics: high-throughput sequencing instruments yield millions of short sequencing reads that need to be merged based on their overlaps. Overlap graph based algorithms were used with the first generation of sequencers, while de Bruijn graph (DBG) based methods were preferred for the second generation. Because the sequencing coverage varies locally along the molecule, state-of-the-art assembly programs now follow an iterative process that requires the construction of de Bruijn graphs of distinct orders (i.e., sizes of the overlaps). The resulting set of sequences, termed unitigs, provides an important improvement over single-DBG approaches. Here, we present a novel approach based on a digraph, the Superstring Graph, that captures all desired overlap sizes at once and allows unreliable overlaps to be discarded. With a simple algorithm, the Superstring Graph delivers sequences that include all the unitigs obtained from multiple DBGs as substrings. In linear time and space, it combines the efficiency of a greedy approach with the advantages of using a single graph. In summary, we present a first formal comparison of the outputs of state-of-the-art genome assemblers.
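The two notions the abstract relies on, the overlap between two reads and the de Bruijn graph of a given order, can be illustrated with a minimal Python sketch. It is not the authors' Superstring Graph algorithm. The function names `longest_overlap` and `de_bruijn_graph`, the toy reads, and the convention that nodes are k-mers linked when they overlap by k-1 characters are assumptions made for illustration only.

```python
from collections import defaultdict


def longest_overlap(a: str, b: str, min_len: int = 1) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`.

    Returns 0 when no overlap of at least `min_len` characters exists.
    Naive quadratic scan, for illustration only.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def de_bruijn_graph(reads, k):
    """Toy de Bruijn graph of order k (assumed convention, see lead-in).

    Nodes are the k-mers of the reads; an arc u -> v is added when u and v
    are consecutive k-mers of some read, so they overlap by k - 1 characters.
    Only nodes with at least one outgoing arc appear as keys.
    """
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k):
            graph[read[i:i + k]].add(read[i + 1:i + k + 1])
    return graph


# Hypothetical toy reads: the branching structure changes with the order k.
reads = ["ACGTAC", "GTACGA"]
graph_k3 = de_bruijn_graph(reads, 3)
graph_k4 = de_bruijn_graph(reads, 4)
```

On these toy reads the node `ACG` has two successors for k = 3, while for k = 4 the node `ACGT` has only one. This is the kind of order-dependent difference that motivates building several de Bruijn graphs, or one structure covering all overlap sizes.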
Document type:
Conference paper
Riccardo Dondi, Guillaume Fertin, Giancarlo Mauri. Eleventh International Conference on Algorithmic Aspects in Information and Management, Jul 2016, Bergamo, Italy. Springer International Publishing AG Switzerland, Lecture Notes in Computer Science, 9778, pp. 39-52, 2016, Eleventh International Conference on Algorithmic Aspects in Information and Management. 〈https://aaim2016.wordpress.com/〉. 〈10.1007/978-3-319-41168-2_4〉

https://hal-lirmm.ccsd.cnrs.fr/lirmm-01446428
Contributor: Eric Rivals
Submitted on: Wednesday, January 25, 2017 - 22:20:31
Last modified on: Thursday, June 28, 2018 - 14:36:02
Document(s) archived on: Wednesday, April 26, 2017 - 18:29:13

File

Cazaux-etal-assembly-superstri...
Publication funded by an institution

Citation

Bastien Cazaux, Gustavo Sacomoto, Eric Rivals. Superstring Graph: A New Approach for Genome Assembly. Riccardo Dondi, Guillaume Fertin, Giancarlo Mauri. Eleventh International Conference on Algorithmic Aspects in Information and Management, Jul 2016, Bergamo, Italy. Springer International Publishing AG Switzerland, Lecture Notes in Computer Science, 9778, pp. 39-52, 2016, Eleventh International Conference on Algorithmic Aspects in Information and Management. 〈https://aaim2016.wordpress.com/〉. 〈10.1007/978-3-319-41168-2_4〉. 〈lirmm-01446428〉
