Communication Dans Un Congrès Année : 2025

Algorithms for the enumeration of b-canonical words

Algorithmes pour l'énumération de mots b-canoniques

Résumé

The border array is the data structure built during the preprocessing of a word in the Morris-Pratt pattern matching algorithm. It is a key data structure for string algorithms. Several articles have proposed algorithms to recognize a border array, that is to determine if an array of integers is the border array of a word. In 1999, Moore et al. defined the notion of b-equivalence between words of length n as follows: two words are b-equivalent if they share the same border array. The relation of b-equivalence is an equivalence relationship. A word is said to be b-canonical if it is the smallest of its b-equivalence class in lexicographic order. We propose a linear space algorithm to enumerate all (distinct) b-canonical words of length n; this improves on the algorithm of Moore et al. We also present a parallel version of this algorithm and evaluate its performance compared to the non parallel version. This work improves on the algorithm proposed by Moore et al and also answers related combinatorial questions, such as the number of words belonging to the b-equivalence class of a b-canonical word. Enumerating b-canonical words and their border arrays for a given length yields an optimal test set of instances for any algorithm computing border arrays. This allows to create unit tests with complete coverage and zero redundancy.

Fichier principal
Vignette du fichier
b_canonical_seqbim.pdf (148.45 Ko) Télécharger le fichier
rivals_seqbim_pres_2025.pdf (2.93 Mo) Télécharger le fichier
Format Présentation
Licence
Commentaire Presentation given at SeqBIM 2025 on November 24. 2025

Dates et versions

lirmm-05392211 , version 1 (02-12-2025)

Licence

Identifiants

  • HAL Id : lirmm-05392211 , version 1

Citer

Tristan Dubois, Eric Rivals. Algorithms for the enumeration of b-canonical words. SeqBIM 2025 - Séquences en Bioinformatique, Informatique et Mathématiques, Université de Nantes, Nov 2025, Nantes, France. ⟨lirmm-05392211⟩
78 Consultations
64 Téléchargements

Partager

  • More