Combinatorics of Periods in Strings - LIRMM - Laboratoire d’Informatique, de Robotique et de Microélectronique de Montpellier
Communication Dans Un Congrès Année : 2007

Combinatorics of Periods in Strings

Résumé

Here, we consider a central notion of word combinatorics and string algorithmics: the periods of a string. A period is an offset (i.e., a shift) at which a word can overlap itself. A word may have several periods, which we call its set of periods, and distinct words of the same length may share the same period set. When denoted by a binary string, a period set is called the autocorrelation of a word. In the early 80's, Guibas and Odlyzko provided the first investigation of the structure of period sets [3, 2] and characterized them. Considering the set Gn of all period sets of strings of length n over a finite alphabet, they showed that Gn is independent of the alphabet (provided the cardinality of S >= 2). Pursuing the goal of finding an enumeration algorithm for Gn, we study further the properties of Gn and exhibit the redundancy in period sets. It enables us to introduce the notion of an irreducible period set and to elucidate the structure of both Gn and the set of all irreducible period sets, denoted Ln. We then propose the first efficient enumeration algorithm for Gn. We also exhibit a relation between the number of binary partitions of n and the number of distinct period sets (i.e., the cardinality of Gn). It allows us to improve upon the previously known asymptotic lower bounds on the cardinality of Gn [3]. Additionally, from these results we derive a new recurrence to compute the population of a period set, as well as an algorithm to sample uniformly irreducible and classical period sets. All above mentioned results were published in [6, 7]. Related entries of the Encyclopedia of Integer Sequences [8] are A018819 and A000123. This study has been extended to partial words [1]. The enumeration algorithm found applications for the computation of several statistics about the vocabulary of strings, like the number of missing words of length n in a text or the number of common words between two texts [4, 5].
Fichier principal
Vignette du fichier
Rivals-periods-WAW-2007.pdf (10.89 Ko) Télécharger le fichier

Dates et versions

lirmm-00193520 , version 1 (03-12-2007)

Identifiants

  • HAL Id : lirmm-00193520 , version 1

Citer

Eric Rivals. Combinatorics of Periods in Strings. Workshop on Algorithms on Words, Mar 2007, Turku, Finland. pp.43-44. ⟨lirmm-00193520⟩
140 Consultations
298 Téléchargements

Partager

More