Detection of Sequential Outliers using a Variable Length Markov Model

Cécile Low-Kam; Anne Laurent; Maguelonne Teisseire

doi:10.1109/ICMLA.2008.137

Conference paper, Year: 2008


Cécile Low-Kam

Role: Author
PersonId : 853976

Institut de Mathématiques et de Modélisation de Montpellier

Anne Laurent

Role: Author
PersonId : 21743
IdHAL : anne-laurent
ORCID : 0000-0003-3708-6429
IdRef : 075173735

Fouille de données environnementales

Maguelonne Teisseire

Role: Author
PersonId : 8645
IdHAL : maguelonne-teisseire
ORCID : 0000-0001-9313-6414
IdRef : 117436593

Fouille de données environnementales

Abstract

Mining for outliers in sequential datasets is crucial for appropriate data analysis, and many approaches for discovering such anomalies have been proposed. However, most of them require a sample of known typical sequences to build the model, and they remain memory-intensive. In this paper we propose an extension of one such approach, based on a Probabilistic Suffix Tree and a similarity measure. We add a pruning criterion that reduces the size of the tree while improving the model, and a sharp concentration inequality for the similarity measure, to better sort the outliers. We demonstrate the feasibility of our approach through a set of experiments on a protein database.
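To make the idea of a variable-length Markov model concrete, here is a minimal Python sketch. It is not the authors' method. It assumes a plain dictionary of context counts in place of a Probabilistic Suffix Tree, add-one smoothing, and the average log-likelihood as a stand-in score. The paper's pruning criterion, similarity measure, and concentration inequality are not reproduced. All function names and the `max_depth` parameter are hypothetical.

```python
import math
from collections import defaultdict


def train_vlmm(sequences, max_depth=3):
    """Count next-symbol occurrences for every context of length 0..max_depth."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for i, symbol in enumerate(seq):
            for k in range(min(i, max_depth) + 1):
                # The k symbols preceding position i form the context.
                counts[seq[i - k:i]][symbol] += 1
    return counts


def next_symbol_prob(counts, context, symbol, alphabet_size, max_depth=3):
    """Smoothed probability of `symbol` after the longest known suffix of `context`."""
    context = context[-max_depth:] if max_depth > 0 else ""
    # Back off to shorter suffixes until one was seen in training.
    while context and context not in counts:
        context = context[1:]
    followers = counts.get(context, {})
    total = sum(followers.values())
    # Add-one smoothing (an assumption of this sketch) keeps unseen symbols non-zero.
    return (followers.get(symbol, 0) + 1) / (total + alphabet_size)


def avg_log_likelihood(counts, seq, alphabet_size, max_depth=3):
    """Average per-symbol log-likelihood; lower values suggest a more unusual sequence."""
    if not seq:
        return 0.0
    total = sum(
        math.log(next_symbol_prob(counts, seq[:i], s, alphabet_size, max_depth))
        for i, s in enumerate(seq)
    )
    return total / len(seq)


training = ["ABABABAB", "BABABABA", "ABABAB"]
model = train_vlmm(training, max_depth=3)
typical_score = avg_log_likelihood(model, "ABABAB", alphabet_size=3)
unusual_score = avg_log_likelihood(model, "AACCBA", alphabet_size=3)
```

In this toy setting the alternating sequence scores higher than the one containing the unseen symbol `C`, so ranking sequences by ascending score puts the candidate outliers first. A real Probabilistic Suffix Tree would store the same contexts as tree nodes and prune those that add little predictive information.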

Keywords

Outliers; sequential databases; suffix tree; information theory; information criterion; concentration inequality

Domains

Statistics [math.ST]; Databases [cs.DB]

Main file

lirmm-00324526v1.pdf (440.7 KB)

Origin: Files produced by the author(s)


https://hal-lirmm.ccsd.cnrs.fr/lirmm-00324526

Submitted on: Monday, October 7, 2019, 16:20:31

Last modified on: Thursday, March 14, 2024, 03:12:57

Dates and versions

lirmm-00324526 , version 1 (07-10-2019)

Identifiers

HAL Id : lirmm-00324526 , version 1
DOI : 10.1109/ICMLA.2008.137

Cite

Cécile Low-Kam, Anne Laurent, Maguelonne Teisseire. Detection of Sequential Outliers using a Variable Length Markov Model. ICMLA: International Conference on Machine Learning and Applications, Dec 2008, San Diego, CA, United States. pp.571-576, ⟨10.1109/ICMLA.2008.137⟩. ⟨lirmm-00324526⟩


Collections

CNRS I3M_UMR5149 LIRMM IMAG-MONTPELLIER MIPS UNIV-MONTPELLIER

109 views

121 downloads
