SPAMS: A Novel Incremental Approach for Sequential Pattern Mining in Data Streams

Lionel Venceslas 1 Jean-Émile Symphor 1 Alban Mancheron 2 Pascal Poncelet 3
2 MAB - Méthodes et Algorithmes pour la Bioinformatique
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier
3 TATOO - Fouille de données environnementales
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier
Abstract: Mining sequential patterns in data streams is a new and challenging problem for the data mining community, since data arrives sequentially in the form of continuous, rapid and infinite streams. In this paper, we propose a new on-line algorithm, SPAMS, to deal with the sequential pattern mining problem in data streams. This algorithm uses an automaton-based structure, the SPA (Sequential Pattern Automaton), to maintain the set of frequent sequential patterns. The sequential pattern automaton can be smaller than the set of frequent sequential patterns by two or more orders of magnitude, which allows us to overcome the combinatorial explosion of sequential patterns. Current results can be output at any time for any user-specified threshold. In addition, taking into account the characteristics of data streams, we propose an approximate method that is well suited to them, since it provides near-optimal results with high probability. Experimental studies show the relevance of the SPA data structure and the efficiency of the SPAMS algorithm on various datasets. Our contribution opens a promising gateway by using an automaton as a data structure for mining frequent sequential patterns in data streams.
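To make the notion of "frequent sequential patterns" concrete, here is a minimal sketch in Python. It is not the SPAMS algorithm and does not build an SPA: it is a naive, exact counter that stores every short pattern explicitly, which is precisely the representation whose size explodes and that a compact automaton is meant to avoid. The class name `NaiveStreamPatternCounter`, the `max_len` cap, and the simplification that each sequence is a string of single items are assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations


class NaiveStreamPatternCounter:
    """Hypothetical helper (not from the paper): exact support of short patterns."""

    def __init__(self, max_len=3):
        self.max_len = max_len    # cap on pattern length (illustrative assumption)
        self.counts = Counter()   # pattern tuple -> number of supporting sequences
        self.n_sequences = 0      # sequences seen so far in the stream

    def update(self, sequence):
        """Process one incoming sequence of items from the stream."""
        self.n_sequences += 1
        seen = set()
        for k in range(1, self.max_len + 1):
            # combinations() keeps the original order, so every tuple it
            # yields is an order-preserving subsequence of `sequence`.
            seen.update(combinations(sequence, k))
        # A sequence supports a given pattern at most once.
        self.counts.update(seen)

    def frequent(self, min_support):
        """Return patterns whose relative support is at least min_support."""
        return {p: c for p, c in self.counts.items()
                if c / self.n_sequences >= min_support}


counter = NaiveStreamPatternCounter(max_len=2)
for seq in ["abc", "acb", "ab"]:
    counter.update(seq)
print(counter.frequent(1.0))
```

Because the counts are kept for every pattern, the threshold can be chosen at query time, which mirrors the abstract's point about user-specified thresholds. The number of stored patterns, however, grows combinatorially with sequence length and `max_len`.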
Document type:
Book chapter
Springer Verlag. Advances in Knowledge Discovery and Management, pp.201-216, 2009

https://hal-lirmm.ccsd.cnrs.fr/lirmm-00435841
Contributor: Pascal Poncelet
Submitted on: Tuesday, 24 November 2009 - 21:54:00
Last modified on: Wednesday, 18 July 2018 - 20:11:26
Document(s) archived on: Thursday, 17 June 2010 - 21:56:53

File

akdmSPAMS.pdf
Publisher files allowed on an open archive

Identifiers

  • HAL Id : lirmm-00435841, version 1

Citation

Lionel Venceslas, Jean-Émile Symphor, Alban Mancheron, Pascal Poncelet. SPAMS : A Novel Incremental Approach for Sequential Pattern Mining in Data Streams. Springer Verlag. Advances in Knowledge Discovery and Management, pp.201-216, 2009. 〈lirmm-00435841〉

Metrics

Record views

318

File downloads

247