On the Distribution of the Number of Missing Words in Random Texts

Sven Rahmann 1 Eric Rivals 2
2 MAB - Méthodes et Algorithmes pour la Bioinformatique
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier
Abstract : Determining the distribution of the number of empty urns after a number of balls have been thrown randomly into the urns is a classical and well understood problem. We study a generalization: Given a finite alphabet of size σ and a word length q, what is the distribution of the number X of words (of length q) that do not occur in a random text of length n+q−1 over the given alphabet? For q=1, X is the number Y of empty urns with σ urns and n balls. For q[gt-or-equal, slanted]2, X is related to the number Y of empty urns with σq urns and n balls, but the law of X is more complicated because successive words in the text overlap. We show that, perhaps surprisingly, the laws of X and Y are not as different as one might expect, but some problems remain currently open.
Type de document :
Article dans une revue
Combinatorics, Probability and Computing, Cambridge University Press (CUP), 2003, 12 (1), pp.73-87. 〈10.1017/S0963548302005473〉
Liste complète des métadonnées

https://hal-lirmm.ccsd.cnrs.fr/lirmm-00269581
Contributeur : Christine Carvalho de Matos <>
Soumis le : jeudi 3 avril 2008 - 08:21:53
Dernière modification le : jeudi 11 janvier 2018 - 06:26:12

Identifiants

Collections

Citation

Sven Rahmann, Eric Rivals. On the Distribution of the Number of Missing Words in Random Texts. Combinatorics, Probability and Computing, Cambridge University Press (CUP), 2003, 12 (1), pp.73-87. 〈10.1017/S0963548302005473〉. 〈lirmm-00269581〉

Partager

Métriques

Consultations de la notice

39