A Density-Based Backward Approach to Isolate Rare Events in Large-Scale Applications

Enikö Székely 1, Pascal Poncelet 1, Florent Masseglia 2, Maguelonne Teisseire 1, Renaud Cezar 3
1 ADVANSE - ADVanced Analytics for data SciencE
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier
2 ZENITH - Scientific Data Management
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier, CRISAM - Inria Sophia Antipolis - Méditerranée
Abstract: While significant work in data mining has been dedicated to detecting single outliers, less research has addressed the problem of isolating a group of outliers, i.e. rare events that form micro-clusters of less than (or significantly less than) 1% of the whole dataset. This research issue is critical, for example, in medical applications. The problem is difficult to handle because it lies at the frontier between outlier detection and clustering, and it is distinguished by a clear challenge: avoiding missed true positives. We address this challenge and propose a novel two-stage framework, based on a backward approach, to isolate abnormal groups of events in large datasets. The key to our backward approach is to first identify the core of the dense regions and then gradually augment them based on a density-driven condition. The framework outputs a small subset of the dataset containing both rare events and outliers. We tested our framework on a biomedical application to find micro-clusters of pathological cells. The comparison against two common algorithms, one for clustering (DBSCAN) and one for outlier detection (LOF), shows that our approach is a very efficient alternative for the detection of rare events (generally a recall of 100% and a higher precision, positively correlated with the size of the rare event), while also providing an O(N) solution where existing algorithms are dominated by O(N^2) complexity.
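The abstract does not spell out the two stages, so the following is only a minimal sketch of one possible reading of a "backward" density-based scheme, not the authors' algorithm. It assumes a uniform grid as the density estimate, treats any cell holding at least 1% of the points as a core, and grows the dense set into adjacent cells that pass a lower count threshold. The function name `isolate_rare_events` and the parameters `cell`, `core_frac` and `grow_ratio` are hypothetical.

```python
from collections import Counter
from itertools import product
from math import floor


def isolate_rare_events(points, cell=1.0, core_frac=0.01, grow_ratio=0.5):
    """Return the points left after removing dense regions (illustrative only)."""
    n = len(points)
    dim = len(points[0])

    def cell_of(p):
        # Map a point to the integer coordinates of its grid cell.
        return tuple(floor(x / cell) for x in p)

    # One pass over the data: count points per occupied cell.
    counts = Counter(cell_of(p) for p in points)

    # Stage 1 (assumed): cores are cells holding at least core_frac of all points.
    core_min = core_frac * n
    dense = {c for c, k in counts.items() if k >= core_min}

    # Stage 2 (assumed): grow the dense set into adjacent cells whose count
    # reaches a lower threshold; grow_ratio must be positive.
    grow_min = grow_ratio * core_min
    offsets = [o for o in product((-1, 0, 1), repeat=dim) if any(o)]
    frontier = list(dense)
    while frontier:
        c = frontier.pop()
        for o in offsets:
            nb = tuple(a + b for a, b in zip(c, o))
            if nb not in dense and counts.get(nb, 0) >= grow_min:
                dense.add(nb)
                frontier.append(nb)

    # Whatever was not absorbed is the candidate subset: rare events and outliers.
    return [p for p in points if cell_of(p) not in dense]


# Synthetic, fully deterministic data (made up for this example).
block = [(i / 20, j / 20) for i in range(100) for j in range(100)]  # dense region
fringe = [(5 + k / 60, 2.5) for k in range(60)]   # sparser border of the dense region
micro = [(10 + k / 100, 10.5) for k in range(30)]  # micro-cluster, under 1% of the data
outliers = [(20.5, 20.5), (-7.5, 3.5)]             # isolated points

kept = isolate_rare_events(block + fringe + micro + outliers)
# kept holds the 30 micro-cluster points and the 2 isolated outliers.
```

On this toy input the dense block and its fringe are absorbed, and the returned subset contains both the micro-cluster and the isolated points, which mirrors the kind of output the abstract describes. The cost is one pass over the points plus work proportional to the number of occupied cells for a fixed dimension; no claim is made that this matches the complexity analysis in the paper.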
Document type:
Conference paper
Johannes Fürnkranz and Eyke Hüllermeier and Tomoyuki Higuchi. DS: Discovery Science, Oct 2013, Singapore, Singapore. Springer, pp.249-264, 2013, Lecture Notes in Computer Science. 〈http://www.mathematik.uni-marburg.de/~discoveryscience2013/〉. 〈10.1007/978-3-642-40897-7_17〉
https://hal-lirmm.ccsd.cnrs.fr/lirmm-00907893
Contributor: Florent Masseglia
Submitted on: Friday, November 22, 2013 - 07:47:57
Last modified on: Thursday, May 24, 2018 - 15:59:25

Citation

Enikö Székely, Pascal Poncelet, Florent Masseglia, Maguelonne Teisseire, Renaud Cezar. A Density-Based Backward Approach to Isolate Rare Events in Large-Scale Applications. Johannes Fürnkranz and Eyke Hüllermeier and Tomoyuki Higuchi. DS: Discovery Science, Oct 2013, Singapore, Singapore. Springer, pp.249-264, 2013, Lecture Notes in Computer Science. 〈http://www.mathematik.uni-marburg.de/~discoveryscience2013/〉. 〈10.1007/978-3-642-40897-7_17〉. 〈lirmm-00907893〉
