Hardness of Optimal Spaced Seed Design

François Nicolas 1 Eric Rivals 1, *
* Auteur correspondant
1 MAB - Méthodes et Algorithmes pour la Bioinformatique
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier
Abstract : Speeding up approximate pattern matching is a line of research in stringology since the 80s. Practically fast approaches belong to the class of filtration algorithms, in which text regions dissimilar to the pattern are first excluded, and the remaining regions are then compared to the pattern by dynamic programming. Among the conditions used to test similarity between the regions and the pattern, many require a minimum number of common substrings between them. When only substitutions are taken into account for measuring dissimilarity, counting spaced subwords instead of substrings improves the filtration efficiency. However, a preprocessing step is required to design one or more patterns, called spaced seeds (or gapped seeds), for the subwords, depending on the search parameters. Two distinct lines of research appear the literature: one with probabilistic formulations of seed design problems, in which one wishes for instance to compute a seed with the highest probability to detect the desired similarities (lossy filtration), a second line with combinatorial formulations, where the goal is to find a seed that detects all or a maximum number of similarities (both lossless and lossy filtration). We concentrate on combinatorial seed design problems and consider formulations in which the set of sought similarities is either listed explicitly (RSOS), or characterised by their length and maximal number of mismatches (Non-Detection). Several articles exhibit exponential algorithms for these problems. In this work, we provide hardness and inapproximability results for several seed design problems, thereby justifying the complexity of known algorithms. Moreover, we introduce a new formulation of seed design (MWLS), in which the weight of the seed has to be maximised, and show it is as difficult to approximate as Maximum Independent Set.
Liste complète des métadonnées

Littérature citée [28 références]  Voir  Masquer  Télécharger

https://hal-lirmm.ccsd.cnrs.fr/lirmm-00194171
Contributeur : Eric Rivals <>
Soumis le : mercredi 5 décembre 2007 - 18:32:10
Dernière modification le : jeudi 24 mai 2018 - 15:59:22
Document(s) archivé(s) le : lundi 12 avril 2010 - 06:18:04

Identifiants

Collections

Citation

François Nicolas, Eric Rivals. Hardness of Optimal Spaced Seed Design. Journal of Computer and System Sciences, Elsevier, 2008, 74 (5), pp.831-849. 〈10.1016/j.jcss.2007.10.001〉. 〈lirmm-00194171〉

Partager

Métriques

Consultations de la notice

230

Téléchargements de fichiers

105