Detection of Sequential Outliers using a Variable Length Markov Model

Cécile Low-Kam 1, Anne Laurent 2, Maguelonne Teisseire 2
2 TATOO - Fouille de données environnementales
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier
Abstract: Mining for outliers in sequential datasets is crucial for appropriate downstream analysis of the data, and many approaches for discovering such anomalies have been proposed. However, most of them use a sample of known typical sequences to build the model, and they remain greedy in terms of memory usage. In this paper we propose an extension of one such approach, based on a Probabilistic Suffix Tree and on a measure of similarity. We add a pruning criterion that reduces the size of the tree while improving the model, and a sharp inequality for the concentration of the measure of similarity, to better sort the outliers. We demonstrate the feasibility of our approach through a set of experiments on a protein database.
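To make the general idea concrete, the following is a minimal sketch of how a variable length Markov model can rank sequences by typicality. It is not the authors' method: the pruning criterion and the concentration inequality from the paper are not reproduced, and the function names, the parameters (max_depth, min_count, alpha), the length-normalized log-likelihood used as a score, and the toy sequences are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def train_counts(sequences, max_depth=3):
    """Count which symbol follows every context of length 0..max_depth."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for i, symbol in enumerate(seq):
            for depth in range(min(i, max_depth) + 1):
                context = seq[i - depth:i]  # the `depth` symbols before position i
                counts[context][symbol] += 1
    return counts


def next_symbol_prob(counts, alphabet, history, symbol,
                     max_depth=3, min_count=2, alpha=0.5):
    """Estimate P(symbol | history) from the longest sufficiently frequent suffix."""
    for depth in range(min(len(history), max_depth), -1, -1):
        context = history[len(history) - depth:]
        followers = counts.get(context, {})
        total = sum(followers.values())
        # Back off to a shorter context when this one is too rare
        # (assumed rule; the empty context is always accepted).
        if total >= min_count or depth == 0:
            seen = followers.get(symbol, 0)
            # Additive smoothing so unseen symbols keep a nonzero probability.
            return (seen + alpha) / (total + alpha * len(alphabet))


def normalized_log_likelihood(counts, alphabet, seq):
    """Average log-probability per symbol; lower values suggest outliers."""
    log_prob = sum(
        math.log(next_symbol_prob(counts, alphabet, seq[:i], seq[i]))
        for i in range(len(seq))
    )
    return log_prob / len(seq)


# Hypothetical toy data: alternating sequences are "typical".
alphabet = "ABC"
training = ["ABABABAB", "ABABAB", "BABABA", "ABABABA"]
counts = train_counts(training)

candidates = ["ABABAB", "BABABA", "AACCBB"]
scores = {s: normalized_log_likelihood(counts, alphabet, s) for s in candidates}
for seq in sorted(scores, key=scores.get):
    print(seq, round(scores[seq], 3))
```

Dividing the log-likelihood by the sequence length keeps long sequences from being penalized merely for their length, which is one common way to make scores comparable across sequences.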
Document type:
Conference paper
ICMLA'08: International Conference on Machine Learning and Applications, Dec 2008, IEEE, pp.001-006, 2008, 〈http://www.icmla-conference.org/icmla08/〉

https://hal-lirmm.ccsd.cnrs.fr/lirmm-00324526
Contributor: Cécile Low-Kam
Submitted on: Thursday, 25 September 2008 - 11:51:10
Last modified on: Thursday, 11 January 2018 - 06:26:17

Identifiers

  • HAL Id : lirmm-00324526, version 1

Citation

Cécile Low-Kam, Anne Laurent, Maguelonne Teisseire. Detection of Sequential Outliers using a Variable Length Markov Model. ICMLA'08: International Conference on Machine Learning and Applications, Dec 2008, IEEE, pp.001-006, 2008, 〈http://www.icmla-conference.org/icmla08/〉. 〈lirmm-00324526〉
