Optimizing the Data-Process Relationship for Fast Mining of Frequent Itemsets in MapReduce

Saber Salah; Reza Akbarinia; Florent Masseglia

doi:10.1007/978-3-319-21024-7_15

Conference paper, Year: 2015

Saber Salah

Role: Author
PersonId: 967928

Scientific Data Management

Reza Akbarinia

Role: Author
PersonId: 172647
IdHAL: reza-akbarinia
ORCID: 0000-0002-7098-0361
IdRef: 119863421

Scientific Data Management

Florent Masseglia

Role: Author
PersonId: 172896
IdHAL: florent-masseglia
ORCID: 0000-0002-1149-585X
IdRef: 120528681

Scientific Data Management

Abstract

Despite crucial recent advances, frequent itemset mining still faces major challenges, particularly when (i) the mining process must be massively distributed, and (ii) the minimum support (MinSup) is very low. In this paper, we study how specific data placement strategies can improve the performance of parallel frequent itemset mining (PFIM) in MapReduce, a highly distributed computation framework. By combining a clever data placement with an optimal organization of the extraction algorithms, we show that the effectiveness of itemset discovery does not depend only on the deployed algorithms. We propose ODPR (Optimal Data-Process Relationship), a solution for fast mining of frequent itemsets in MapReduce. Our method discovers itemsets from massive datasets on which standard solutions from the literature do not scale. Indeed, in a massively distributed environment, the arrangement of both the data and the processes can make the global job either completely inoperative or very effective. We evaluated our proposal on real-world datasets, and the results show a significant scale-up at very low MinSup, which confirms the effectiveness of our approach.
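The abstract does not describe ODPR's internal algorithm, so the following is only a hedged illustration of the general setting it addresses, not the authors' method. It sketches the classic two-phase partition scheme (SON): each partition (one mapper's share of the data) mines its locally frequent itemsets at the same relative MinSup, and a second pass counts the union of those candidates over all the data. The function names `local_frequent` and `son_mine`, the `max_len` cap, the brute-force subset enumeration, and the toy `PARTITIONS` data are all hypothetical assumptions made for this sketch, and the two MapReduce phases are simulated with plain loops.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Itemset = FrozenSet[str]
Partition = List[Set[str]]


def local_frequent(partition: Partition, min_sup: float, max_len: int) -> Set[Itemset]:
    """Phase 1 (one mapper per partition): itemsets whose relative support
    within this partition reaches min_sup. Brute-force enumeration of all
    subsets up to max_len items, so only suitable for toy data."""
    threshold = min_sup * len(partition)
    counts: Counter = Counter()
    for txn in partition:
        for k in range(1, max_len + 1):
            for combo in combinations(sorted(txn), k):
                counts[frozenset(combo)] += 1
    return {s for s, c in counts.items() if c >= threshold}


def son_mine(partitions: List[Partition], min_sup: float,
             max_len: int) -> Tuple[Dict[Itemset, int], int]:
    """Two-phase SON-style mining. Returns (globally frequent itemsets with
    their absolute supports, number of candidates produced by phase 1)."""
    # Phase 1: union of locally frequent itemsets. Any globally frequent
    # itemset must be locally frequent in at least one partition.
    candidates: Set[Itemset] = set()
    for part in partitions:
        candidates |= local_frequent(part, min_sup, max_len)

    # Phase 2 (a second MapReduce pass): count each candidate over all data.
    total = sum(len(part) for part in partitions)
    counts: Counter = Counter()
    for part in partitions:
        for txn in part:
            for cand in candidates:
                if cand <= txn:
                    counts[cand] += 1
    frequent = {c: n for c, n in counts.items() if n >= min_sup * total}
    return frequent, len(candidates)


# Hypothetical toy data: six transactions split across two partitions.
PARTITIONS: List[Partition] = [
    [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}],
    [{"b", "c"}, {"a", "b"}, {"b"}],
]

if __name__ == "__main__":
    for sup in (0.5, 0.3):
        freq, n_cand = son_mine(PARTITIONS, sup, max_len=3)
        print(sup, n_cand, sorted("".join(sorted(s)) for s in freq))
```

On this toy data, lowering MinSup from 0.5 to 0.3 grows the phase-1 candidate set from 5 to 7 itemsets, which hints at why very low MinSup stresses a distributed job. In this scheme the candidate set also depends on which transactions land in which partition, which is one way data placement can affect the cost of the whole job.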

Domains

Computer Science [cs]; Databases [cs.DB]; Distributed, Parallel, and Cluster Computing [cs.DC]

Main file

mldm.pdf (147.42 KB)

Origin: Files produced by the author(s)

Contributor: Florent Masseglia

https://hal-lirmm.ccsd.cnrs.fr/lirmm-01171555

Submitted on: Sunday, July 5, 2015, 11:47:26

Last modified on: Thursday, February 15, 2024, 03:31:46

Long-term archiving on: Wednesday, April 26, 2017, 00:21:46

Dates and versions

lirmm-01171555, version 1 (05-07-2015)

Identifiers

HAL Id: lirmm-01171555, version 1
DOI: 10.1007/978-3-319-21024-7_15

Cite

Saber Salah, Reza Akbarinia, Florent Masseglia. Optimizing the Data-Process Relationship for Fast Mining of Frequent Itemsets in MapReduce. MLDM 2015 - 11th International Conference on Machine Learning and Data Mining in Pattern Recognition, Jul 2015, Hamburg, Germany. pp.217-231, ⟨10.1007/978-3-319-21024-7_15⟩. ⟨lirmm-01171555⟩

Collections

UNIV-RENNES1 CNRS INRIA IRISA GRID5000 ZENITH LIRMM INRIA2 UR1-MATH-STIC UR1-UFR-ISTIC MIPS UNIV-MONTPELLIER UNIV-RENNES SILECS UR1-MATH-NUM

600 views

582 downloads