Incremental computation of the set of period sets
Abstract
Overlaps between words are crucial in many areas of computer science, such as code design, stringology, and bioinformatics. A self overlapping word is characterized by its periods and borders. A period of a word u is the starting position of a suffix of u that is also a prefix u, and such a suffix is called a border. Each word of length, say n > 0, has a set of periods, but not all combinations of integers are sets of periods. Computing the period set of a word u takes linear time in the length of u. We address the question of computing, the set, denoted Γ_n , of all period sets of words of length n. Although period sets have been characterized, there is no formula to compute the cardinality of Γ_n (which is exponential in n), and the known dynamic programming algorithm to enumerate Γ_n suffers from its space complexity. We present an incremental approach to compute Γ_n from Γ_{n-1} , which reduces the space complexity, and then a constructive certification algorithm useful for verification purposes. The incremental approach defines a parental relation between sets in Γ_{n-1} and Γ_n , enabling one to investigate the dynamics of period sets, and their intriguing statistical properties. Moreover, the period set of a word u is key for computing the absence probability of u in random texts. Thus, knowing Γ_n is useful to assess the significance of word statistics, such as the number of missing words in a random text.
Fichier principal
2024_seqbim_incremental_gamma.pdf (946)
Télécharger le fichier
incremental_gamma_seqbim_2025.pdf (112)
Télécharger le fichier
Origin | Files produced by the author(s) |
---|---|
Licence |
Origin | Files produced by the author(s) |
---|---|
Licence |