Sampling For Sequential Pattern Mining: From Static Databases to Data Streams

Chedy Raïssi 1 Pascal Poncelet 2
2 TATOO - Environmental data mining
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier
Abstract: Sequential pattern mining is an active field in the domain of knowledge discovery. Recently, with the constant progress in hardware technologies, real-world databases tend to grow larger, and the hypothesis that a database can be loaded into main memory for sequential pattern mining is no longer valid. Furthermore, the new model of data as a continuous and potentially infinite flow, known as the data stream model, calls for a pre-processing step to ease the mining operations. Since database size is the most influential factor for mining algorithms, we examine the use of sampling over static databases to obtain approximate mining results with an upper bound on the error rate. Moreover, we extend this sampling analysis and present an algorithm based on reservoir sampling to cope with sequential pattern mining over data streams. We demonstrate with empirical results that our sampling methods are efficient and that sequence mining remains accurate over both static databases and data streams.
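To illustrate the reservoir sampling idea mentioned in the abstract, here is a minimal sketch of the classic textbook scheme (often called Algorithm R), which keeps a uniform random sample of fixed size from a stream of unknown length. It is not the authors' algorithm, which the abstract describes only as "based on" reservoir sampling; the function name `reservoir_sample`, its parameters, and the toy sequences are hypothetical and chosen for illustration only.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return a uniform random sample of up to k items from an iterable.

    Classic reservoir sampling (Algorithm R): a generic sketch, not the
    method of the paper. `stream`, `k`, and `rng` are illustrative names.
    """
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item number i+1 replaces a random slot with probability k/(i+1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


# Toy stream of data sequences, each a tuple of itemsets (assumed encoding).
toy_stream = [(("a",), ("b", "c")), (("a",), ("c",)), (("b",), ("c",)), (("a", "b"),)]
sample = reservoir_sample(iter(toy_stream), k=2, rng=random.Random(0))
```

The sample can then be handed to any sequential pattern miner in place of the full data; how the sample size relates to the error bound is the subject of the paper and is not reproduced here.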
Document type:
Conference paper
ICDM'07: International Conference on Data Mining, 2007, P., France. IEEE, pp.6, 2007
Cited literature: 13 references

https://hal-lirmm.ccsd.cnrs.fr/lirmm-00204524
Contributor: Chedy Raïssi
Submitted on: Monday, January 14, 2008 - 16:40:08
Last modified on: Thursday, May 24, 2018 - 15:59:22
Document(s) archived on: Tuesday, April 13, 2010 - 17:34:45

Identifiers

  • HAL Id : lirmm-00204524, version 1

Citation

Chedy Raïssi, Pascal Poncelet. Sampling For Sequential Pattern Mining: From Static Databases to Data Streams. ICDM'07: International Conference on Data Mining, 2007, P., France. IEEE, pp.6, 2007. 〈lirmm-00204524〉

Metrics

Record views: 204

File downloads: 130