Incremental Mining of Sequential Patterns in Large Databases

In recent years, the emergence of new real-world applications such as network traffic monitoring, intrusion detection systems, sensor network data analysis, click stream mining, and dynamic tracing of financial transactions has called for the study of a new kind of model. Called a data stream, this model is a continuous and potentially infinite flow of information, as opposed to a finite, statically stored data set. We study the problem of sequential pattern mining in data streams. This problem has been extensively studied for the conventional case of disk-resident data sets. In the case of data streams, it becomes more challenging, because the volume of data is usually too large to be stored on permanent devices or in main memory, or to be scanned thoroughly more than once. In this setting, it may be acceptable to generate approximate solutions to the mining problem. In this paper we introduce a new approach based on biased reservoir sampling to mine sequential patterns more efficiently. Furthermore, we prove theoretically that the size of our biased reservoir is always bounded, whatever the size of the stream. This property often allows us to keep the entire relevant reservoir in main memory. We also present a simple algorithm that builds the biased reservoir for the special case of sequential pattern mining. Experimental evaluation supports the claim that sequential pattern mining based on biased reservoir sampling requires little memory. In addition, we propose an adapted approach for mining sequential patterns in a sliding window model. The experiments show that the results are accurate.
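To make the idea of a bounded, biased reservoir concrete, here is a minimal Python sketch of one well-known replacement policy from the biased reservoir sampling literature. It is not taken from the paper and should not be read as the authors' algorithm. The class name `BiasedReservoir`, its `offer` method, the capacity of 100, and the `("customer", t)` placeholder records are all hypothetical choices made for illustration. Every arriving record is inserted; with probability equal to the current fill fraction it overwrites a randomly chosen stored record, otherwise it is appended. Older records are therefore progressively less likely to survive, and the reservoir never grows beyond its capacity.

```python
import random


class BiasedReservoir:
    """Illustrative bounded reservoir that favours recent stream records.

    Assumption: this is a generic exponential-bias policy, not the
    construction described in the paper.
    """

    def __init__(self, capacity, seed=None):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.items = []
        self._rng = random.Random(seed)

    def offer(self, item):
        """Insert one stream record; the newest record is always kept."""
        fill = len(self.items) / self.capacity
        if self._rng.random() < fill:
            # Overwrite a uniformly chosen stored record.
            # Once the reservoir is full (fill == 1.0) this branch is always taken.
            victim = self._rng.randrange(len(self.items))
            self.items[victim] = item
        else:
            # Reservoir still has room: append without evicting anything.
            self.items.append(item)


# Hypothetical stream: each record stands in for one data sequence.
reservoir = BiasedReservoir(capacity=100, seed=0)
for t in range(10_000):
    reservoir.offer(("customer", t))
```

A sequential pattern miner could then be run on `reservoir.items` instead of the full stream, which is the memory saving the abstract refers to. How the sample size relates to the accuracy of the mined patterns is addressed in the paper itself and is not modelled here.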

Keywords

Sequential patterns; Incremental mining; Data mining

Domains

Databases [cs.DB]

Main file

ise.pdf (349.45 KB)

Origin: Files produced by the author(s)

Contributor: Christine Carvalho De Matos

https://hal-lirmm.ccsd.cnrs.fr/lirmm-00269547

Submitted on: Saturday, November 3, 2018, 22:42:11

Last modified on: Tuesday, September 17, 2024, 16:03:05

Long-term archiving on: Monday, February 4, 2019, 12:56:53

Dates and versions

lirmm-00269547, version 1 (03-11-2018)

Identifiers

HAL Id: lirmm-00269547, version 1
DOI: 10.1016/S0169-023X(02)00209-4

Cite

Florent Masseglia, Pascal Poncelet, Maguelonne Teisseire. Incremental Mining of Sequential Patterns in Large Databases. Data and Knowledge Engineering, 2003, 46 (1), pp.97-121. ⟨10.1016/S0169-023X(02)00209-4⟩. ⟨lirmm-00269547⟩

Collections

UNIV-RENNES1 CNRS EM-ALES IRISA LIRMM UR1-MATH-STIC UR1-UFR-ISTIC MIPS UNIV-MONTPELLIER UNIV-RENNES UR1-MATH-NUM INSTITUT-MINES-TELECOM