Periodicity of Degenerate Strings - LIRMM - Laboratoire d’Informatique, de Robotique et de Microélectronique de Montpellier
Conference Papers Year : 2023

Periodicity of Degenerate Strings

Abstract

The notion of periods is key in stringology, word combinatorics, and pattern matching algorithms. A string has period p if every two letters at distance p from each other are equal. There has been a growing interest in more general models of sequences which can describe uncertainty. An important model of sequences with uncertainty are degenerate strings. A degenerate string is a string with "undetermined" symbols, which can denote arbitrary subsets of the alphabet Σ. Degenerate strings have been extensively used to describe uncertainty in DNA, RNA, and protein sequences using the IUPAC code (Biochemistry, 1970). In this work, we extend the work of Blanchet-Sadri et al. (2010) to obtain the following results about the combinatorial aspects of periodicity for degenerate strings:

-We compare three natural generalizations of periodicity for degenerate strings, which we refer to as weak, medium and strong periodicity. We define the concept of total autocorrelations, which are quaternary vectors indicating these three notions of periodicity.

-We characterize the three families of period sets, as well as the family of total autocorrelations, for each alphabet size. In particular, we prove necessary conditions period sets should satisfy and, to prove sufficiency, we show how to construct a degenerate string which gives rise to particular period sets.

-For each notion of periodicity, we (asymptotically) count the number of period sets, by combining known techniques from partial words with recent results from number theory.

-Moreover, we show that all families of period sets, as well as the family of total autocorrelations, form lattices under a suitably defined partial ordering.

-We compute the population of weak, medium and strong period sets (i.e., the number of strings with that period set). We also compute the population of total autocorrelations.
Fichier principal
Vignette du fichier
Gabory_etl_PSC2023_article05.pdf (293.22 Ko) Télécharger le fichier
Origin Explicit agreement for this submission

Dates and versions

lirmm-04777487 , version 1 (12-11-2024)

Licence

Identifiers

  • HAL Id : lirmm-04777487 , version 1

Cite

Estéban Gabory, Eric Rivals, Michelle Sweering, Hilde Verbeek, Pengfei Wang. Periodicity of Degenerate Strings. PSC 2023 - 27th Prague Stringology Conference, Aug 2023, Prague, Czech Republic. pp.42-56. ⟨lirmm-04777487⟩
0 View
0 Download

Share

More