Journal articles

Efficient assembly consensus algorithms for divergent contig sets

Annie Chateau (1), Tom Davot (1), Manuel Lafond (2)
(1) MAB - Méthodes et Algorithmes pour la Bioinformatique, LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier
Abstract: Assembly is a fundamental task in genome sequencing, and many assemblers have been made available in the last decade. Because of the wide range of possible choices, it can be hard to determine which tool or parameter setting to use for a specific genome sequencing project. In this paper, we propose a consensus approach that takes the best parts of several contig datasets produced by different methods and combines them into a better assembly. This amounts to orienting and ordering sets of contigs, which can be viewed as an optimization problem where the aim is to find an alignment of two fragmented strings that maximizes an arbitrary scoring function between matched characters. In this work, we investigate the computational complexity of this problem. We first show that it is NP-hard, even over an alphabet with only two symbols and with all scores being either 0 or 1. On the positive side, we propose an efficient, quadratic-time algorithm that achieves an approximation factor of 3.
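To make the optimization problem in the abstract concrete, here is a minimal brute-force sketch for tiny instances. It is not the authors' approximation algorithm. It rests on assumptions that the abstract does not state: orienting a fragment is modeled as plain string reversal (for DNA one would more likely use the reverse complement), unmatched characters contribute a score of zero, and all function names are hypothetical.

```python
from itertools import permutations, product


def alignment_score(a, b, score):
    """Best total score of an order-preserving matching between a and b.

    Assumption of this sketch: unmatched characters cost nothing.
    """
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        cur = [0]
        for j, ch_b in enumerate(b, start=1):
            cur.append(max(prev[j], cur[j - 1], prev[j - 1] + score(ch_a, ch_b)))
        prev = cur
    return prev[-1]


def layouts(fragments):
    """Yield every string obtained by ordering and orienting the fragments.

    Assumption: orienting a fragment means reversing it.
    """
    for order in permutations(fragments):
        for flips in product((False, True), repeat=len(order)):
            yield "".join(f[::-1] if flip else f for f, flip in zip(order, flips))


def best_consensus_score(frags_a, frags_b, score):
    """Exhaustive search over all layouts; exponential in the fragment count."""
    layouts_b = set(layouts(frags_b))
    return max(
        alignment_score(a, b, score)
        for a in set(layouts(frags_a))
        for b in layouts_b
    )


def unit_score(x, y):
    """0/1 scoring: a match is worth 1, anything else 0."""
    return 1 if x == y else 0
```

For example, `best_consensus_score(["ab", "cd"], ["dc", "ba"], unit_score)` tries every ordering and orientation of both fragment sets. The exhaustive search grows exponentially with the number of fragments, which is consistent with the NP-hardness result and explains the interest in a polynomial-time approximation.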

https://hal-lirmm.ccsd.cnrs.fr/lirmm-03244191
Contributor: Tom Davot
Submitted on: Tuesday, June 1, 2021 - 9:58:09 AM
Last modification on: Saturday, June 5, 2021 - 3:25:59 AM
Long-term archiving on: Thursday, September 2, 2021 - 6:16:12 PM

File

main.pdf
Files produced by the author(s)

Citation

Annie Chateau, Tom Davot, Manuel Lafond. Efficient assembly consensus algorithms for divergent contig sets. Computational Biology and Chemistry, Elsevier, 2021, 93, 107516. ⟨10.1016/j.compbiolchem.2021.107516⟩. ⟨lirmm-03244191⟩
