Practical lower and upper bounds for the Shortest Linear Superstring

Bastien Cazaux; Samuel Juhel; Eric Rivals

doi:10.4230/LIPIcs.SEA.2018.18

Conference paper, Year: 2018


Bastien Cazaux

Role: Author
PersonId: 3129
IdHAL: cazaux-bastien
IdRef: 226877523

Méthodes et Algorithmes pour la Bioinformatique

Institut de Biologie Computationnelle

Samuel Juhel

Role: Author
PersonId: 1039368

Méthodes et Algorithmes pour la Bioinformatique

Institut de Biologie Computationnelle

Eric Rivals

Role: Author
PersonId: 2002
IdHAL: eric-rivals
ORCID: 0000-0003-3791-3973
IdRef: 118021850

Méthodes et Algorithmes pour la Bioinformatique

Institut de Biologie Computationnelle

Abstract

Given a set $P$ of words, the Shortest Linear Superstring (SLS) problem is an optimization problem that asks for a superstring of $P$ of minimal length. SLS has applications in data compression, where a superstring is a compact representation of $P$, and in bioinformatics, where it models the first step of genome assembly. Unfortunately, SLS is hard to solve (NP-hard) and hard to approximate closely (MAX-SNP-hard). Although numerous polynomial-time approximation algorithms have been devised, few articles report on their practical performance. We therefore know little about how close an approximate superstring can be to an optimal one in practice. Here, we exhibit a linear-time algorithm that reports an upper and a lower bound on the length of an optimal superstring. The upper bound is the length of a superstring of $P$. This algorithm can be used to evaluate beforehand whether, for a given instance, one can obtain an approximate superstring whose length is close to the optimum. Experimental results suggest that its approximation performance is orders of magnitude better than previously reported practical values. Moreover, the proposed algorithm remains efficient even on large instances and can serve to explore the approximability of SLS in practice.
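The abstract does not describe how the bounds are computed. The Python sketch below illustrates one classical way a pair of bounds can arise, loosely suggested by the keywords "Cyclic cover" and "Concat-Cycles". This is our assumption: the sketch is not the authors' linear-time algorithm.

The sketch works as follows:

* After removing words contained in other words, it builds a minimum-weight cycle cover in which the edge from word $i$ to word $j$ costs $|w_i| - \mathrm{ov}(w_i, w_j)$. Its weight is a lower bound, because closing an optimal merge order into a cycle yields one such cover whose weight cannot exceed the optimal superstring length.
* It then opens each cycle at its least-overlapping edge, merges the words along the cycle and concatenates the pieces. This gives a superstring, hence an upper bound.

The cycle cover is found by brute force over permutations, so this only works for toy inputs. In general the minimum cycle cover is an assignment problem and can be solved in polynomial time. All function names are hypothetical.

```python
from itertools import permutations

# Hypothetical brute-force sketch (exponential time, toy inputs only);
# NOT the paper's linear-time algorithm.


def overlap(a: str, b: str) -> int:
    """Length of the longest proper suffix of a that is also a prefix of b (shorter than both)."""
    for k in range(min(len(a), len(b)) - 1, 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def remove_substrings(words: list[str]) -> list[str]:
    """Drop duplicates and words contained in another word; covering the rest covers them too."""
    kept: list[str] = []
    for w in sorted(set(words), key=lambda x: (-len(x), x)):
        if not any(w in k for k in kept):
            kept.append(w)
    return kept


def merge_path(words: list[str], order) -> str:
    """Merge words in the given order, reusing the maximal overlap with the predecessor each time."""
    s = words[order[0]]
    for prev, nxt in zip(order, order[1:]):
        s += words[nxt][overlap(words[prev], words[nxt]):]
    return s


def bounds(words: list[str]) -> tuple[int, str]:
    """Return (lower bound on the optimal length, a superstring whose length is an upper bound)."""
    w = remove_substrings(words)
    n = len(w)

    def weight(i: int, j: int) -> int:
        # Prefix length of w[i] before w[j] starts, when w[j] follows w[i] with maximal overlap.
        return len(w[i]) - overlap(w[i], w[j])

    # Minimum-weight cycle cover: sigma[i] is the successor of word i (self-loops allowed).
    sigma = min(permutations(range(n)), key=lambda p: sum(weight(i, p[i]) for i in range(n)))
    lower = sum(weight(i, sigma[i]) for i in range(n))

    # Upper bound: walk each cycle, open it at its least-overlapping edge, concatenate pieces.
    seen: set[int] = set()
    pieces: list[str] = []
    for start in range(n):
        if start in seen:
            continue
        cycle = [start]
        seen.add(start)
        while sigma[cycle[-1]] != start:
            cycle.append(sigma[cycle[-1]])
            seen.add(cycle[-1])
        # Rotate so the dropped closing edge (cycle[r-1] -> cycle[r]) has minimal overlap.
        r = min(range(len(cycle)), key=lambda t: overlap(w[cycle[t - 1]], w[cycle[t]]))
        pieces.append(merge_path(w, cycle[r:] + cycle[:r]))
    return lower, "".join(pieces)


if __name__ == "__main__":
    print(bounds(["abc", "bcd", "cde"]))  # (5, 'abcde')
```

On this three-word example the two bounds coincide, so `abcde` is an optimal superstring. On harder instances a gap may remain between them. The size of that gap indicates how far from optimal the returned superstring may be.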

Keywords

Greedy; Approximation; Overlap; Concat-Cycles; Cyclic cover; Linear time; Text compression

Domains

Data Structures and Algorithms [cs.DS]; Discrete Mathematics [cs.DM]; Bioinformatics [q-bio.QM]

Main file

sls-bounds-sea-2018.pdf (597.8 KB)

Origin: Files produced by the author(s)


https://hal-lirmm.ccsd.cnrs.fr/lirmm-01929399

Submitted on: Wednesday, 21 November 2018, 10:48:40

Last modified on: Friday, 24 March 2023, 14:53:08

Long-term archiving on: Friday, 22 February 2019, 13:02:58

Dates and versions

lirmm-01929399, version 1 (21-11-2018)

License

Attribution

Identifiers

HAL Id: lirmm-01929399, version 1
DOI: 10.4230/LIPIcs.SEA.2018.18

Cite

Bastien Cazaux, Samuel Juhel, Eric Rivals. Practical lower and upper bounds for the Shortest Linear Superstring. SEA: Symposium on Experimental Algorithms, Jun 2018, L'Aquila, Italy. pp. 18:1–18:14, ⟨10.4230/LIPIcs.SEA.2018.18⟩. ⟨lirmm-01929399⟩


Collections

CNRS INRIA INRA MAB LIRMM TDS-MACS MIPS UNIV-MONTPELLIER INRAE ANR

