On the Distribution of the Number of Missing Words in Random Texts

Sven Rahmann; Eric Rivals

doi:10.1017/S0963548302005473

Journal article: Combinatorics, Probability and Computing, 2003


Sven Rahmann

Role: Author

Department Computational Molecular Biology [MPIMG Berlin]

Eric Rivals

Role: Author
PersonId: 2002
IdHAL: eric-rivals
ORCID: 0000-0003-3791-3973
IdRef: 118021850

Méthodes et Algorithmes pour la Bioinformatique

Abstract

Determining the distribution of the number of empty urns after a number of balls have been thrown randomly into the urns is a classical and well-understood problem. We study a generalization: given a finite alphabet of size σ and a word length q, what is the distribution of the number X of words (of length q) that do not occur in a random text of length n+q−1 over the given alphabet? For q=1, X is the number Y of empty urns with σ urns and n balls. For q ≥ 2, X is related to the number Y of empty urns with σ^q urns and n balls, but the law of X is more complicated because successive words in the text overlap. We show that, perhaps surprisingly, the laws of X and Y are not as different as one might expect, but some problems remain open.
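To make the definitions of X and Y concrete, here is a minimal Monte Carlo sketch. It is not the authors' method and does not reproduce their results. It assumes a uniform, i.i.d. random text, and the function names (missing_words, expected_empty_urns, simulate_mean_missing) are hypothetical. The only closed form used is the classical occupancy mean E[Y] = m(1 − 1/m)^n for n balls in m urns, with m = σ^q. For q = 1 the simulated mean of X estimates exactly this quantity. For q ≥ 2 the n windows overlap and are not independent, so the two means need not coincide.

```python
import random


def missing_words(text: str, alphabet: str, q: int) -> int:
    """Number X of length-q words over `alphabet` that do not occur in `text`.

    A text of length n + q - 1 has n overlapping windows of length q.
    Assumes every character of `text` belongs to `alphabet`.
    """
    present = {text[i:i + q] for i in range(len(text) - q + 1)}
    return len(alphabet) ** q - len(present)


def expected_empty_urns(m: int, n: int) -> float:
    """Classical E[Y] for n balls thrown uniformly and independently into m urns."""
    return m * (1 - 1 / m) ** n


def simulate_mean_missing(alphabet: str, q: int, n: int,
                          trials: int = 10_000, seed: int = 0) -> float:
    """Monte Carlo estimate of E[X] for uniform i.i.d. texts of length n + q - 1."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        # Hypothetical model assumption: each letter drawn uniformly and independently.
        text = "".join(rng.choices(alphabet, k=n + q - 1))
        total += missing_words(text, alphabet, q)
    return total / trials


if __name__ == "__main__":
    alphabet = "ACGT"  # illustrative alphabet, sigma = 4
    for q, n in [(1, 10), (2, 50), (3, 100)]:
        est = simulate_mean_missing(alphabet, q, n)
        ref = expected_empty_urns(len(alphabet) ** q, n)
        print(f"q={q} n={n}: simulated E[X] ~ {est:.3f}, urn-model E[Y] = {ref:.3f}")
```

The simulation counts distinct q-grams with a set, which reflects the "missing word" definition directly. It is meant only for exploring small σ^q, not as an efficient estimator.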

Domains

Automatic Control / Robotics

Contributor: Christine Carvalho De Matos

https://hal-lirmm.ccsd.cnrs.fr/lirmm-00269581

Submitted on: Thursday, 3 April 2008, 08:21:53

Last modified on: Friday, 24 March 2023, 14:52:50

Dates and versions

lirmm-00269581, version 1 (03-04-2008)

Identifiers

HAL Id: lirmm-00269581, version 1
DOI: 10.1017/S0963548302005473

Cite

Sven Rahmann, Eric Rivals. On the Distribution of the Number of Missing Words in Random Texts. Combinatorics, Probability and Computing, 2003, 12 (1), pp.73-87. ⟨10.1017/S0963548302005473⟩. ⟨lirmm-00269581⟩

Collections

CNRS MAB LIRMM TDS-MACS MIPS UNIV-MONTPELLIER
