Optimizing the Data-Process Relationship for Fast Mining of Frequent Itemsets in MapReduce

Saber Salah; Reza Akbarinia; Florent Masseglia

doi:10.1007/978-3-319-21024-7_15

Conference paper, Year: 2015

Saber Salah

Role: Author
PersonId: 967928

ZENITH - Scientific Data Management

Reza Akbarinia

Role: Author
PersonId: 172647
IdHAL: reza-akbarinia
ORCID: 0000-0002-7098-0361
IdRef: 119863421

ZENITH - Scientific Data Management

Florent Masseglia

Role: Author
PersonId: 172896
IdHAL: florent-masseglia
ORCID: 0000-0002-1149-585X
IdRef: 120528681

ZENITH - Scientific Data Management

Abstract

Despite crucial recent advances, frequent itemset mining still faces major challenges, particularly when (i) the mining process must be massively distributed and (ii) the minimum support (MinSup) is very low. In this paper, we study the effectiveness and leverage of specific data placement strategies for improving parallel frequent itemset mining (PFIM) performance in MapReduce, a highly distributed computation framework. By offering a clever data placement and an optimal organization of the extraction algorithms, we show that itemset discovery effectiveness does not depend only on the deployed algorithms. We propose ODPR (Optimal Data-Process Relationship), a solution for fast mining of frequent itemsets in MapReduce. Our method allows discovering itemsets from massive datasets on which standard solutions from the literature do not scale. Indeed, in a massively distributed environment, the arrangement of both the data and the different processes can make the global job either completely inoperative or very effective. Our proposal has been evaluated using real-world datasets, and the results illustrate a significant scale-up obtained with very low MinSup, which confirms the effectiveness of our approach.

Main file

mldm.pdf (147.42 KB)

Origin	Files produced by the author(s)
License	HAL authorization

https://hal-lirmm.ccsd.cnrs.fr/lirmm-01171555

Submitted on: Sunday, July 5, 2015, 11:47:26

Last modified on: Tuesday, August 26, 2025, 15:21:01

Long-term archiving on: Wednesday, April 26, 2017, 00:21:46

Dates and versions

lirmm-01171555 , version 1 (05-07-2015)

License

HAL authorization

Identifiers

HAL Id : lirmm-01171555 , version 1
DOI : 10.1007/978-3-319-21024-7_15

Cite

Saber Salah, Reza Akbarinia, Florent Masseglia. Optimizing the Data-Process Relationship for Fast Mining of Frequent Itemsets in MapReduce. MLDM 2015 - 11th International Conference on Machine Learning and Data Mining in Pattern Recognition, Jul 2015, Hamburg, Germany. pp.217-231, ⟨10.1007/978-3-319-21024-7_15⟩. ⟨lirmm-01171555⟩
