Optimizing the Data-Process Relationship for Fast Mining of Frequent Itemsets in MapReduce

Saber Salah 1, Reza Akbarinia 1, Florent Masseglia 1
1 ZENITH - Scientific Data Management
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier, CRISAM - Inria Sophia Antipolis - Méditerranée
Abstract: Despite crucial recent advances, the problem of frequent itemset mining is still facing major challenges. This is particularly the case when: i) the mining process must be massively distributed; and ii) the minimum support (MinSup) is very low. In this paper, we study the effectiveness and leverage of specific data placement strategies for improving parallel frequent itemset mining (PFIM) performance in MapReduce, a highly distributed computation framework. By offering a clever data placement and an optimal organization of the extraction algorithms, we show that the itemset discovery effectiveness does not only depend on the deployed algorithms. We propose ODPR (Optimal Data-Process Relationship), a solution for fast mining of frequent itemsets in MapReduce. Our method allows discovering itemsets from massive datasets, where standard solutions from the literature do not scale. Indeed, in a massively distributed environment, the arrangement of both the data and the different processes can make the global job either completely inoperative or very effective. Our proposal has been evaluated using real-world data sets and the results illustrate a significant scale-up obtained with very low MinSup, which confirms the effectiveness of our approach.
Document type:
Conference paper
MLDM'2015: International Conference on Machine Learning and Data Mining, Jul 2015, Hamburg, Germany. Machine Learning and Data Mining in Pattern Recognition, 9166, pp.217-231, 2015, LNCS. 〈10.1007/978-3-319-21024-7_15〉

Cited literature: 17 references

https://hal-lirmm.ccsd.cnrs.fr/lirmm-01171555
Contributor: Florent Masseglia
Submitted on: Sunday, 5 July 2015 - 11:47:26
Last modified on: Saturday, 27 January 2018 - 01:32:12
Document(s) archived on: Wednesday, 26 April 2017 - 00:21:46

File

mldm.pdf
Files produced by the author(s)

Citation

Saber Salah, Reza Akbarinia, Florent Masseglia. Optimizing the Data-Process Relationship for Fast Mining of Frequent Itemsets in MapReduce. MLDM'2015: International Conference on Machine Learning and Data Mining, Jul 2015, Hamburg, Germany. Machine Learning and Data Mining in Pattern Recognition, 9166, pp.217-231, 2015, LNCS. 〈10.1007/978-3-319-21024-7_15〉. 〈lirmm-01171555〉
