Massively Distributed Environments and Closed Itemset Mining: The DCIM Approach

Mehdi Zitouni 1, 2, Reza Akbarinia 1, Sadok Ben Yahia 2, Florent Masseglia 1
1 ZENITH - Scientific Data Management
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier, CRISAM - Inria Sophia Antipolis - Méditerranée
Abstract: Data analytics in general, and data mining primitives in particular, are a major source of bottlenecks in the operation of information systems. This is mainly due to their high complexity and intensive use of I/O operations, particularly in massively distributed environments. Moreover, an important application of data analytics is to discover key insights from the running traces of information systems in order to improve their engineering. Mining closed frequent itemsets (CFI) is one of these data mining techniques, and it comes with great challenges. It allows itemsets to be discovered more efficiently and yields a more compact result. However, discovering such itemsets in massively distributed data poses a number of issues that are not addressed by traditional methods. One solution for dealing with such characteristics is to take advantage of parallel frameworks such as MapReduce. We address the problem of distributed CFI mining by introducing a new parallel algorithm, called DCIM, which uses a prime-number-based approach. A key feature of DCIM is the deep combination of data mining properties with the principles of massive data distribution. We carried out extensive experiments on real-world datasets of up to 53 million documents to illustrate the efficiency of DCIM.
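The abstract names closed frequent itemsets and a prime-number-based approach without giving details. The following is only a minimal single-machine sketch of those two notions, not the DCIM algorithm itself: it assumes each item is mapped to a distinct prime, each transaction is the product of its items' primes, containment is tested by divisibility, and the closure of an itemset is the greatest common divisor of the transactions that contain it. The toy data, the prime assignment, and all function names are hypothetical.

```python
from functools import reduce
from itertools import combinations
from math import gcd

# Hypothetical toy data: five transactions over four items.
TRANSACTIONS = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "b", "d"},
    {"c", "d"},
    {"a", "b", "c"},
]
# Assumed encoding: one distinct prime per item.
PRIMES = {"a": 2, "b": 3, "c": 5, "d": 7}


def encode(itemset):
    """Return the product of the primes of the items (1 for the empty set)."""
    return reduce(lambda acc, item: acc * PRIMES[item], itemset, 1)


def decode(code):
    """Return the items whose prime divides the given code."""
    return frozenset(item for item, p in PRIMES.items() if code % p == 0)


def closed_frequent_itemsets(transactions, min_support):
    """Brute-force enumeration; returns {closed itemset: support}."""
    if min_support < 1:
        raise ValueError("min_support must be at least 1")
    codes = [encode(t) for t in transactions]
    items = sorted(PRIMES)
    closed = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            c = encode(candidate)
            # A transaction contains the candidate iff its code is divisible by c.
            covering = [t for t in codes if t % c == 0]
            if len(covering) < min_support:
                continue
            # The GCD of the covering transactions encodes their intersection,
            # which is the closure of the candidate.
            closure = reduce(gcd, covering)
            closed[decode(closure)] = len(covering)
    return closed


result = closed_frequent_itemsets(TRANSACTIONS, min_support=2)
for itemset, support in sorted(result.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), support)
```

With a minimum support of 2, this toy dataset has eight frequent itemsets but only four closed ones, which illustrates the compactness mentioned in the abstract. The exhaustive enumeration is exponential in the number of items and is meant only to make the definitions concrete.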
Document type:
Conference paper
CAiSE: Advanced Information Systems Engineering, Jun 2017, Essen, Germany. 29th International Conference on Advanced Information Systems Engineering, LNCS (10253), pp.231-246, 2017, 〈http://caise2017.paluno.de/welcome/〉. 〈10.1007/978-3-319-59536-8_15〉

https://hal-lirmm.ccsd.cnrs.fr/lirmm-01620238
Contributor: Reza Akbarinia
Submitted on: Friday, October 20, 2017 - 12:02:39
Last modified on: Tuesday, February 6, 2018 - 21:58:54
Document(s) archived on: Sunday, January 21, 2018 - 14:40:37

File

DCIM__BDA_.pdf
Files produced by the author(s)

Citation

Mehdi Zitouni, Reza Akbarinia, Sadok Ben Yahia, Florent Masseglia. Massively Distributed Environments and Closed Itemset Mining: The DCIM Approach. CAiSE: Advanced Information Systems Engineering, Jun 2017, Essen, Germany. 29th International Conference on Advanced Information Systems Engineering, LNCS (10253), pp.231-246, 2017, 〈http://caise2017.paluno.de/welcome/〉. 〈10.1007/978-3-319-59536-8_15〉. 〈lirmm-01620238〉
