Combinatorics of Periods in Strings

Eric Rivals 1
1 MAB - Méthodes et Algorithmes pour la Bioinformatique
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier
Abstract : Here, we consider a central notion of word combinatorics and string algorithmics: the periods of a string. A period is an offset (i.e., a shift) at which a word can overlap itself. A word may have several periods, which we call its set of periods, and distinct words of the same length may share the same period set. When denoted by a binary string, a period set is called the autocorrelation of a word. In the early 80's, Guibas and Odlyzko provided the first investigation of the structure of period sets [3, 2] and characterized them. Considering the set Gn of all period sets of strings of length n over a finite alphabet, they showed that Gn is independent of the alphabet (provided the cardinality of S >= 2). Pursuing the goal of finding an enumeration algorithm for Gn, we study further the properties of Gn and exhibit the redundancy in period sets. It enables us to introduce the notion of an irreducible period set and to elucidate the structure of both Gn and the set of all irreducible period sets, denoted Ln. We then propose the first efficient enumeration algorithm for Gn. We also exhibit a relation between the number of binary partitions of n and the number of distinct period sets (i.e., the cardinality of Gn). It allows us to improve upon the previously known asymptotic lower bounds on the cardinality of Gn [3]. Additionally, from these results we derive a new recurrence to compute the population of a period set, as well as an algorithm to sample uniformly irreducible and classical period sets. All above mentioned results were published in [6, 7]. Related entries of the Encyclopedia of Integer Sequences [8] are A018819 and A000123. This study has been extended to partial words [1]. The enumeration algorithm found applications for the computation of several statistics about the vocabulary of strings, like the number of missing words of length n in a text or the number of common words between two texts [4, 5].
Type de document :
Communication dans un congrès
Vesa Halava. Workshop on Algorithms on Words, Mar 2007, Turku, Finland. pp.43-44, 2007, 〈http://www.math.utu.fi/projects/waw/〉
Liste complète des métadonnées

https://hal-lirmm.ccsd.cnrs.fr/lirmm-00193520
Contributeur : Eric Rivals <>
Soumis le : lundi 3 décembre 2007 - 18:37:52
Dernière modification le : jeudi 24 mai 2018 - 15:59:22
Document(s) archivé(s) le : lundi 12 avril 2010 - 05:55:32

Identifiants

  • HAL Id : lirmm-00193520, version 1

Collections

Citation

Eric Rivals. Combinatorics of Periods in Strings. Vesa Halava. Workshop on Algorithms on Words, Mar 2007, Turku, Finland. pp.43-44, 2007, 〈http://www.math.utu.fi/projects/waw/〉. 〈lirmm-00193520〉

Partager

Métriques

Consultations de la notice

278

Téléchargements de fichiers

156