Isolating rare events in large-scale applications using a backward approach

Enikö Székely (1), Pascal Poncelet (1), Florent Masseglia (2), Maguelonne Teisseire (3, 1), Renaud Cezar (4)
1 ADVANSE - ADVanced Analytics for data SciencE
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier
2 ZENITH - Scientific Data Management
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier, CRISAM - Inria Sophia Antipolis - Méditerranée
Abstract: While significant work in data mining has been dedicated to detecting single outliers, less research has addressed the problem of isolating a group of outliers, i.e. rare events that form micro-clusters representing less than 1% of the whole dataset, and often significantly less. This research issue is critical, for example, in medical applications. The problem is difficult to handle because it lies at the frontier between outlier detection and clustering, and it is distinguished by a clear challenge: avoiding missed true positives. We address this challenge and propose a novel two-stage framework, based on a backward approach, to isolate abnormal groups of events in large datasets. The key to our backward approach is to first detect the cores of the dense regions and then gradually augment them based on a density-driven condition. The framework outputs a small subset of the dataset that contains both outliers and rare events. Experiments are performed on both synthetic data and a medical application, and the results are compared against state-of-the-art outlier detection algorithms. The results show very good performance for our approach and confirm that dedicated algorithms are needed to detect rare events in large-scale applications.
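The two-stage idea summarised in the abstract (find the dense cores first, grow them under a density-driven condition, keep whatever is left) can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify. The density estimate (inverse distance to the k-th nearest neighbour), the growth rule, and the names `knn`, `backward_isolate`, `core_fraction` and `ratio` are all assumptions made for illustration, and the brute-force neighbour search would not scale to large datasets.

```python
import math

def knn(points, i, k):
    """Indices of the k nearest neighbours of points[i] (brute force, O(n log n))."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: (math.dist(points[i], points[j]), j))
    return others[:k]

def backward_isolate(points, k=3, core_fraction=0.5, ratio=0.5):
    """Return indices left outside the dense regions (hypothetical sketch)."""
    n = len(points)
    neigh = [knn(points, i, k) for i in range(n)]
    # Assumed density estimate: inverse distance to the k-th nearest neighbour.
    dens = [1.0 / (math.dist(points[i], points[neigh[i][-1]]) + 1e-12)
            for i in range(n)]
    # Stage 1: the densest points form the initial cores.
    order = sorted(range(n), key=lambda i: (-dens[i], i))
    dense = set(order[:int(n * core_fraction)])
    # Stage 2: a point joins a dense region if one of its neighbours is
    # already dense and its own density is comparable (assumed condition).
    changed = True
    while changed:
        changed = False
        for i in range(n):
            if i in dense:
                continue
            if any(j in dense and dens[i] >= ratio * dens[j] for j in neigh[i]):
                dense.add(i)
                changed = True
    # Whatever was never absorbed: candidate outliers and rare events.
    return sorted(set(range(n)) - dense)

# Toy data: a 25 x 25 dense grid, a 4-point micro-cluster, one isolated point.
points = [(float(x), float(y)) for x in range(25) for y in range(25)]
points += [(200.0, 200.0), (203.0, 200.0), (200.0, 203.0), (203.0, 203.0)]
points += [(12.0, -80.0)]
residual = backward_isolate(points)
print(residual)
```

On this toy data the residual contains only the four micro-cluster points and the isolated point, i.e. under 1% of the 630 points. The values of `k`, `core_fraction` and `ratio` were chosen by hand for the example and would need tuning on real data.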
Document type:
Report
[Research Report] RR-13003, LIRMM. 2013

https://hal-lirmm.ccsd.cnrs.fr/lirmm-00798074
Contributor: Pascal Poncelet
Submitted on: Friday, March 8, 2013 - 02:29:50
Last modified on: Wednesday, November 21, 2018 - 19:48:06

Identifiers

  • HAL Id: lirmm-00798074, version 1

Citation

Enikö Székely, Pascal Poncelet, Florent Masseglia, Maguelonne Teisseire, Renaud Cezar. Isolating rare events in large-scale applications using a backward approach. [Research Report] RR-13003, LIRMM. 2013. 〈lirmm-00798074〉
