3-Shortest Superstring is 2-approximable by a greedy algorithm

Bastien Cazaux; Eric Rivals

Report (Research Report) Year: 2014

Bastien Cazaux

Role: Author
PersonId : 3129
IdHAL : cazaux-bastien
IdRef : 226877523

Institut de Biologie Computationnelle

Méthodes et Algorithmes pour la Bioinformatique

Eric Rivals

Role: Corresponding author
PersonId : 2002
IdHAL : eric-rivals
ORCID : 0000-0003-3791-3973
IdRef : 118021850

Institut de Biologie Computationnelle

Méthodes et Algorithmes pour la Bioinformatique

Abstract

A superstring of a set of words is a string that contains each input word as a substring. Given such a set, the Shortest Superstring Problem (SSP) asks for a superstring of minimum length. SSP is an important theoretical problem related to the Asymmetric Travelling Salesman Problem, and it has practical applications in data compression and in bioinformatics, where it models the assembly of a genome from a set of sequencing reads. Unfortunately, SSP is known to be NP-hard even on a binary alphabet, and it is also hard to approximate with respect to either the superstring length or the compression achieved by the superstring. Even the variant in which all words share the same length r, called r-SSP, is NP-hard whenever r > 2. Numerous involved approximation algorithms achieve approximation ratios above 2 for the superstring length, but they remain difficult to implement in practice. In contrast, the greedy conjecture, posed in 1988, asks whether a simple greedy agglomeration algorithm achieves a ratio of 2 for SSP. Here, we present a novel approach that bounds the superstring approximation ratio using the compression ratio, which leads to a first proof of the greedy conjecture for 3-SSP.
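
To make the greedy agglomeration idea concrete, here is a minimal Python sketch of the classical greedy heuristic: repeatedly merge the two words with the largest suffix-prefix overlap until a single string remains. The names `overlap` and `greedy_superstring`, the tie-breaking rule (first maximal pair found) and the preprocessing that drops duplicates and words contained in other words are assumptions made for illustration; they are not taken from the report, and the quadratic pair search is chosen for readability, not efficiency.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest proper suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)) - 1, 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(words):
    """Greedy agglomeration: merge the pair with maximum overlap until one word is left."""
    # Assumption: duplicates and words that occur inside another word are dropped first,
    # since any superstring of the remaining words already contains them.
    pool = sorted(set(words))
    pool = [w for w in pool if not any(w != v and w in v for v in pool)]

    while len(pool) > 1:
        best = (-1, 0, 1)  # (overlap length, index of left word, index of right word)
        for i, a in enumerate(pool):
            for j, b in enumerate(pool):
                if i != j:
                    k = overlap(a, b)
                    if k > best[0]:  # ties: keep the first maximal pair found
                        best = (k, i, j)
        k, i, j = best
        merged = pool[i] + pool[j][k:]  # glue the two words, sharing the overlap
        pool = [w for idx, w in enumerate(pool) if idx not in (i, j)] + [merged]

    return pool[0] if pool else ""


# Example on a 3-SSP instance (all words have length 3).
print(greedy_superstring(["abc", "bcd", "cde"]))  # prints "abcde"
```

On this instance the total input length is 9 and the superstring has length 5, so the compression achieved (input length minus superstring length) is 4. The sketch only illustrates the algorithm whose approximation ratio is the subject of the conjecture; it says nothing about how the report proves the bound.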

Keywords

Approximation algorithm; Shortest Common Superstring Problem; stringology; data compression; assembly; greedy conjecture

Domains

Bioinformatics [q-bio.QM]; Data Structures and Algorithms [cs.DS]

Main file

approx-kSCS-RR.pdf (183.17 KB)

Origin: Files produced by the author(s)

https://hal-lirmm.ccsd.cnrs.fr/lirmm-01070596

Submitted on: Wednesday, 1 October 2014, 17:36:35

Last modified on: Friday, 24 March 2023, 14:52:59

Long-term archiving on: Friday, 14 April 2017, 14:08:02

Dates and versions

lirmm-01070596 , version 1 (01-10-2014)

Identifiers

HAL Id : lirmm-01070596 , version 1
PRODINRA : 314005

Cite

Bastien Cazaux, Eric Rivals. 3-Shortest Superstring is 2-approximable by a greedy algorithm. [Research Report] RR-14009, LIRMM. 2014. ⟨lirmm-01070596⟩

Collections

CNRS INRIA INRA MAB LIRMM LARA MIPS UNIV-MONTPELLIER INRAE ANR
