Combinatorics of Periods in Strings

Eric Rivals

Communication Dans Un Congrès Année : 2007

Combinatorics of Periods in Strings

(1)

Eric Rivals

Fonction : Auteur
PersonId : 2002
IdHAL : eric-rivals
ORCID : 0000-0003-3791-3973
IdRef : 118021850

Méthodes et Algorithmes pour la Bioinformatique

Résumé

Here, we consider a central notion of word combinatorics and string algorithmics: the periods of a string. A period is an offset (i.e., a shift) at which a word can overlap itself. A word may have several periods, which we call its set of periods, and distinct words of the same length may share the same period set. When denoted by a binary string, a period set is called the autocorrelation of a word. In the early 80's, Guibas and Odlyzko provided the first investigation of the structure of period sets [3, 2] and characterized them. Considering the set Gn of all period sets of strings of length n over a finite alphabet, they showed that Gn is independent of the alphabet (provided the cardinality of S >= 2). Pursuing the goal of finding an enumeration algorithm for Gn, we study further the properties of Gn and exhibit the redundancy in period sets. It enables us to introduce the notion of an irreducible period set and to elucidate the structure of both Gn and the set of all irreducible period sets, denoted Ln. We then propose the first efficient enumeration algorithm for Gn. We also exhibit a relation between the number of binary partitions of n and the number of distinct period sets (i.e., the cardinality of Gn). It allows us to improve upon the previously known asymptotic lower bounds on the cardinality of Gn [3]. Additionally, from these results we derive a new recurrence to compute the population of a period set, as well as an algorithm to sample uniformly irreducible and classical period sets. All above mentioned results were published in [6, 7]. Related entries of the Encyclopedia of Integer Sequences [8] are A018819 and A000123. This study has been extended to partial words [1]. The enumeration algorithm found applications for the computation of several statistics about the vocabulary of strings, like the number of missing words of length n in a text or the number of common words between two texts [4, 5].

Mots clés

Period Enumeration Irreducible Period Set Word Combinatorics

Domaines

Combinatoire [math.CO] Traitement du texte et du document Algorithme et structure de données [cs.DS]

Fichier principal

Rivals-periods-WAW-2007.pdf (10.89 Ko)

Eric Rivals : Connectez-vous pour contacter le contributeur

https://hal-lirmm.ccsd.cnrs.fr/lirmm-00193520

Soumis le : lundi 3 décembre 2007-18:37:52

Dernière modification le : jeudi 18 avril 2024-14:46:45

Archivage à long terme le : lundi 12 avril 2010-05:55:32

Dates et versions

lirmm-00193520 , version 1 (03-12-2007)

Identifiants

HAL Id : lirmm-00193520 , version 1

Citer

Eric Rivals. Combinatorics of Periods in Strings. Workshop on Algorithms on Words, Mar 2007, Turku, Finland. pp.43-44. ⟨lirmm-00193520⟩

Exporter

BibTeX XML-TEI Dublin Core DC Terms EndNote DataCite

Collections

CNRS MAB LIRMM MIPS UNIV-MONTPELLIER

140 Consultations

298 Téléchargements

Combinatorics of Periods in Strings

Résumé

Mots clés

Domaines

Dates et versions

Identifiants

Citer

Exporter

Collections

Partager