On the Distribution of the Number of Missing Words in Random Texts - LIRMM - Laboratoire d’Informatique, de Robotique et de Microélectronique de Montpellier
Journal Articles Combinatorics, Probability and Computing Year : 2003

On the Distribution of the Number of Missing Words in Random Texts

Abstract

Determining the distribution of the number of empty urns after a number of balls have been thrown randomly into the urns is a classical and well understood problem. We study a generalization: Given a finite alphabet of size σ and a word length q, what is the distribution of the number X of words (of length q) that do not occur in a random text of length n+q−1 over the given alphabet? For q=1, X is the number Y of empty urns with σ urns and n balls. For q[gt-or-equal, slanted]2, X is related to the number Y of empty urns with σq urns and n balls, but the law of X is more complicated because successive words in the text overlap. We show that, perhaps surprisingly, the laws of X and Y are not as different as one might expect, but some problems remain currently open.

Domains

Automatic

Dates and versions

lirmm-00269581 , version 1 (03-04-2008)

Identifiers

Cite

Sven Rahmann, Eric Rivals. On the Distribution of the Number of Missing Words in Random Texts. Combinatorics, Probability and Computing, 2003, 12 (1), pp.73-87. ⟨10.1017/S0963548302005473⟩. ⟨lirmm-00269581⟩
56 View
0 Download

Altmetric

Share

More