Accurate self-correction of errors in long reads using de Bruijn graphs

Leena Salmela; Riku Walve; Eric Rivals; Esko Ukkonen

doi:10.1093/bioinformatics/btw321

Journal article, Bioinformatics, Year: 2017

Leena Salmela

Role: Author
PersonId : 1273076
ORCID : 0000-0002-0756-543X

Helsinki Institute for Information Technology

Riku Walve

Role: Author

Helsinki Institute for Information Technology

Eric Rivals

Role: Author
PersonId : 2002
IdHAL : eric-rivals
ORCID : 0000-0003-3791-3973
IdRef : 118021850

Méthodes et Algorithmes pour la Bioinformatique

Institut de Biologie Computationnelle

Esko Ukkonen

Role: Author

Helsinki Institute for Information Technology

Abstract

Motivation: New long read sequencing technologies, like PacBio SMRT and Oxford NanoPore, can produce sequencing reads up to 50 000 bp long but with an error rate of at least 15%. Reducing the error rate is necessary for subsequent utilization of the reads in, e.g. de novo genome assembly. The error correction problem has been tackled either by aligning the long reads against each other or by a hybrid approach that uses the more accurate short reads produced by second generation sequencing technologies to correct the long reads.

Results: We present an error correction method that uses long reads only. The method consists of two phases: first, we use an iterative alignment-free correction method based on de Bruijn graphs with increasing length of k-mers, and second, the corrected reads are further polished using long-distance dependencies that are found using multiple alignments. According to our experiments, the proposed method is the most accurate one relying on long reads only for read sets with high coverage. Furthermore, when the coverage of the read set is at least 75×, the throughput of the new method is at least 20% higher.

Availability and Implementation: LoRMA is freely available at http://www.cs.helsinki.fi/u/lmsalmel/LoRMA/.
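To illustrate the idea behind the first phase, here is a minimal Python sketch of a de Bruijn graph built from the reads themselves and used to flag read positions that look erroneous. It is not the LoRMA implementation. The function names (`count_kmers`, `build_de_bruijn`, `weak_positions`) and the abundance threshold `min_count` are assumptions made for this example, and the sketch only locates suspect positions; it does not perform the path-based correction or the multiple-alignment polishing described in the abstract.

```python
from collections import Counter, defaultdict


def count_kmers(reads, k):
    """Count every k-mer occurring in the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def build_de_bruijn(reads, k, min_count=2):
    """Build a de Bruijn graph over "solid" k-mers.

    A k-mer is treated as solid when it occurs at least min_count times
    (hypothetical threshold chosen for illustration). Nodes are (k-1)-mers;
    each solid k-mer contributes one edge from its prefix to its suffix.
    """
    counts = count_kmers(reads, k)
    solid = {kmer for kmer, c in counts.items() if c >= min_count}
    graph = defaultdict(set)
    for kmer in solid:
        graph[kmer[:-1]].add(kmer[1:])
    return graph, solid


def weak_positions(read, solid, k):
    """Return the positions of a read not covered by any solid k-mer."""
    covered = [False] * len(read)
    for i in range(len(read) - k + 1):
        if read[i:i + k] in solid:
            for j in range(i, i + k):
                covered[j] = True
    return [i for i, ok in enumerate(covered) if not ok]
```

In an iterative scheme of the kind the abstract describes, one would rebuild the graph with a larger `k` after each correction round, so that reads cleaned up at a small `k` supply longer solid k-mers in the next round. With the toy input `["ACGTACGT"] * 3 + ["ACGTTCGT"]` and `k = 3`, the sketch flags only the substituted base in the last read.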

Keywords

substitution Sequence analysis PacBio DNA NGS de Bruijn assembly LoRDEC non hybrid correction indel

Domains

Bioinformatics [q-bio.QM]

Main file

Bioinformatics-2016-Salmela-bioinformatics_btw321.pdf (285.64 KB)

Origin: Publication funded by an institution

https://hal-lirmm.ccsd.cnrs.fr/lirmm-01385006

Submitted on: Thursday, 20 October 2016, 16:09:46

Last modified on: Monday, 30 October 2023, 13:50:04

Dates and versions

lirmm-01385006 , version 1 (20-10-2016)

Identifiers

HAL Id : lirmm-01385006 , version 1
DOI : 10.1093/bioinformatics/btw321
PUBMED : 27273673

Cite

Leena Salmela, Riku Walve, Eric Rivals, Esko Ukkonen. Accurate self-correction of errors in long reads using de Bruijn graphs. Bioinformatics, 2017, 33 (6), pp.799-806. ⟨10.1093/bioinformatics/btw321⟩. ⟨lirmm-01385006⟩

Collections

CNRS INRIA INRA MAB LIRMM MIPS UNIV-MONTPELLIER INRAE ANR
