Data Partitioning for Minimizing Transferred Data in MapReduce

Abstract: Reducing data transfer in MapReduce's shuffle phase is very important because it increases the data locality of reduce tasks and thus decreases the overhead of job execution. In the literature, several optimizations have been proposed to reduce data transfer between mappers and reducers. Nevertheless, all these approaches are limited by how intermediate key-value pairs are distributed over map outputs. In this paper, we address the problem of high data transfer in MapReduce and propose a technique that repartitions the tuples of the input datasets, thereby optimizing the distribution of key-value pairs over mappers and increasing data locality in reduce tasks. Our approach captures the relationships between input tuples and intermediate keys by monitoring the execution of a set of MapReduce jobs that are representative of the workload. Then, based on those relationships, it assigns input tuples to the appropriate chunks. We evaluated our approach experimentally on a Hadoop deployment on top of Grid5000, using standard benchmarks. The results show a large reduction in data transfer during the shuffle phase compared to native Hadoop.
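The abstract describes two steps: capturing which intermediate keys each input tuple produces, then assigning tuples to chunks based on those relationships. The sketch below illustrates the idea under simplifying assumptions of our own. Each tuple emits each of its keys once. Each key's reducer is assumed to run on the node of the chunk that produces most of that key's pairs. The baseline fills chunks with consecutive tuples in storage order. The chunk assignment uses a greedy co-location heuristic chosen only for illustration; it is not the partitioning algorithm from the paper, and `capture_tuple_keys`, `contiguous_assign`, `greedy_assign` and `transferred_pairs` are hypothetical names.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, Iterable, List, Set

# Hypothetical types: tuple id -> raw record, tuple id -> emitted keys.
Records = Dict[int, str]
TupleKeys = Dict[int, Set[str]]


def capture_tuple_keys(records: Records, map_fn: Callable[[str], Iterable[str]]) -> TupleKeys:
    """Monitoring step: run the map function and record which
    intermediate keys each input tuple emits."""
    return {tid: set(map_fn(value)) for tid, value in records.items()}


def contiguous_assign(tuple_keys: TupleKeys, capacity: int) -> Dict[int, int]:
    """Baseline (assumption): consecutive tuples fill chunks in storage
    order, ignoring which keys they produce."""
    return {tid: i // capacity for i, tid in enumerate(sorted(tuple_keys))}


def greedy_assign(tuple_keys: TupleKeys, num_chunks: int, capacity: int) -> Dict[int, int]:
    """Illustrative heuristic, not the paper's algorithm: put each tuple in
    the non-full chunk that already holds the most of its keys
    (ties go to the less loaded chunk)."""
    chunk_keys: List[Counter] = [Counter() for _ in range(num_chunks)]
    load = [0] * num_chunks
    assignment: Dict[int, int] = {}
    # Tuples that emit many keys are placed first so they seed the chunks.
    for tid in sorted(tuple_keys, key=lambda t: -len(tuple_keys[t])):
        keys = tuple_keys[tid]
        candidates = [c for c in range(num_chunks) if load[c] < capacity]
        if not candidates:
            raise ValueError("num_chunks * capacity is smaller than the number of tuples")
        best = max(
            candidates,
            key=lambda c: (sum(1 for k in keys if chunk_keys[c][k] > 0), -load[c]),
        )
        assignment[tid] = best
        load[best] += 1
        chunk_keys[best].update(keys)
    return assignment


def transferred_pairs(tuple_keys: TupleKeys, assignment: Dict[int, int]) -> int:
    """Count (tuple, key) pairs that cross the network, assuming each key's
    reducer runs on the node of the chunk producing most of its pairs."""
    per_key: Dict[str, Counter] = defaultdict(Counter)
    for tid, keys in tuple_keys.items():
        for k in keys:
            per_key[k][assignment[tid]] += 1
    # Pairs produced anywhere other than the reducer's chunk are shuffled.
    return sum(sum(c.values()) - max(c.values()) for c in per_key.values())


if __name__ == "__main__":
    records = {0: "x", 1: "y", 2: "x", 3: "y"}
    tk = capture_tuple_keys(records, str.split)  # word-count-style map
    print(transferred_pairs(tk, contiguous_assign(tk, capacity=2)))            # 2
    print(transferred_pairs(tk, greedy_assign(tk, num_chunks=2, capacity=2)))  # 0
```

In this toy input, storage order interleaves the two keys, so every key is split across both chunks. Grouping tuples by the keys they emit lets each reducer find all of its input locally. The same intuition motivates repartitioning the input before the jobs run.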
Document type:
Conference paper
Hameurlain, Abdelkader and Rahayu, Wenny and Taniar, David. Globe'2013: 6th International Conference on Data Management in Cloud, Grid and P2P Systems, Aug 2013, Prague, Czech Republic. Springer, pp.1-12, 2013, LNCS. 〈http://www.irit.fr/globe13/〉. 〈10.1007/978-3-642-40053-7_1〉

Cited literature: 10 references

https://hal-lirmm.ccsd.cnrs.fr/lirmm-00879527
Contributor: Miguel Liroz-Gistau
Submitted on: Monday, 4 November 2013, 10:48:46
Last modified on: Friday, 12 January 2018, 01:54:57
Document(s) archived on: Friday, 7 April 2017, 20:16:22

File

globe_2013-paper.pdf
Files produced by the author(s)

Identifiers

Citation

Miguel Liroz-Gistau, Reza Akbarinia, Divyakant Agrawal, Esther Pacitti, Patrick Valduriez. Data Partitioning for Minimizing Transferred Data in MapReduce. Hameurlain, Abdelkader and Rahayu, Wenny and Taniar, David. Globe'2013: 6th International Conference on Data Management in Cloud, Grid and P2P Systems, Aug 2013, Prague, Czech Republic. Springer, pp.1-12, 2013, LNCS. 〈http://www.irit.fr/globe13/〉. 〈10.1007/978-3-642-40053-7_1〉. 〈lirmm-00879527〉


Metrics

Record views: 483

File downloads: 930