S2MP: Similarity Measure for Sequential Patterns

In data mining, computing the similarity of objects is an essential task, for example to identify regularities or to build homogeneous clusters of objects. Comparing sequences is particularly important for the sequential data found in many fields of application (e.g. series of customer purchases, Internet navigation). Existing similarity measures such as Edit distance and LCS are suited to simple sequences, but they are not relevant for complex sequences composed of sets of items, as is the case for sequential patterns. In this paper, we propose a new similarity measure, S2MP, that takes the characteristics of sequential patterns into account. Unlike existing measures, S2MP is adjustable: the importance given to each characteristic of sequential patterns can be set according to the context. We evaluated the accuracy and quality of S2MP against Edit distance by using both measures in a clustering of sequential patterns. The results show that the clusters obtained with S2MP are more homogeneous. Moreover, these clusters are more precise and more complete than the clusters obtained using Edit distance. The experiments also show that S2MP is efficient in terms of computation time and memory usage.
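The limitation of Edit distance on sequences of itemsets can be illustrated with a minimal sketch. It is not the S2MP measure and is not taken from the paper: it is a standard unit-cost Levenshtein distance, and the function name `edit_distance` and the example sequences `s1`, `s2`, `s3` are assumptions made for illustration. Each itemset is treated as one atomic symbol, which is how a measure designed for simple sequences would see it.

```python
from typing import Hashable, Sequence


def edit_distance(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Standard Levenshtein distance with unit costs over hashable symbols."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # distance between a[:i] and the empty sequence
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1  # itemsets match only if identical
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


# Hypothetical sequential patterns: each element is a set of items.
s1 = [frozenset("ab"), frozenset("c")]
s2 = [frozenset("abd"), frozenset("c")]  # first itemset shares a and b with s1
s3 = [frozenset("xy"), frozenset("c")]   # first itemset shares nothing with s1

print(edit_distance(s1, s2))  # 1
print(edit_distance(s1, s3))  # 1
```

Both comparisons give the same distance, although `s2` overlaps with `s1` far more than `s3` does. Because the itemsets are compared only for exact equality, the overlap inside them is ignored, which is the kind of information a measure designed for sequential patterns needs to capture.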

Keywords

Data Mining; Sequential Patterns; Similarity Measure; Clustering; Clustering of Sequential Patterns; S2MP

Domains

Databases [cs.DB]

Main file

CRPITV87Saneifar.pdf (317.82 KB)

Origin: Files produced by the author(s)

Contributor: Hassan Saneifar

https://hal-lirmm.ccsd.cnrs.fr/lirmm-00324488

Submitted on: Thursday, June 25, 2009, 15:44:36

Last modified on: Tuesday, March 12, 2024, 10:44:55

Long-term archiving on: Monday, October 8, 2012, 13:30:30

Dates and versions

lirmm-00324488, version 1 (25-06-2009)

Identifiers

HAL Id: lirmm-00324488, version 1

Cite

Hassan Saneifar, Sandra Bringay, Anne Laurent, Maguelonne Teisseire. S2MP: Similarity Measure for Sequential Patterns. AusDM: Australasian Data Mining, Nov 2008, Adelaide, Australia. pp.095-104. ⟨lirmm-00324488⟩

Collections

CNRS UNIV-MONTP3 LIRMM MIPS UNIV-MONTPELLIER
