A Survey of Scheduling Frameworks in Big Data Systems

Ji Liu; Esther Pacitti; Patrick Valduriez

doi:10.1504/IJCC.2018.093765

Journal article. IJCC - International Journal of Cloud Computing. Year: 2018

Ji Liu

Role: Author
PersonId: 958433
ORCID: 0000-0003-4710-5697

Scientific Data Management

Institut de Biologie Computationnelle

Esther Pacitti

Role: Author
PersonId: 3271
IdHAL: esther-pacitti
ORCID: 0000-0003-1370-9943
IdRef: 117946451

Scientific Data Management

Institut de Biologie Computationnelle

Patrick Valduriez

Role: Author
PersonId: 172604
IdHAL: patrick-valduriez
ORCID: 0000-0001-6506-7538
IdRef: 028314417

Scientific Data Management

Institut de Biologie Computationnelle

Abstract

Cloud and big data technologies are now converging to enable organizations to outsource data to the cloud and extract value from it through big data analytics. Big data systems typically exploit computer clusters to gain scalability and obtain a good cost-performance ratio. However, scheduling a workload in a computer cluster remains a well-known open problem. Scheduling methods are typically implemented in a scheduling framework and may have different objectives. In this paper, we survey scheduling methods and frameworks for big data systems, propose a taxonomy, and analyze the features of the different categories of scheduling frameworks. These frameworks were initially designed for the cloud (MapReduce) to process Web data. We examine sixteen popular scheduling frameworks and discuss their features. Our study shows that different frameworks are proposed for different big data systems, different scales of computer clusters, and different objectives. We propose the main dimensions for workloads and metrics for benchmarks to evaluate these scheduling frameworks. Finally, we analyze their limitations and propose new research directions.

Keywords

Cluster computing; Scheduling method; Cloud computing; Scheduling framework; Big data; Parallel processing

Domains

Databases [cs.DB]; Distributed, Parallel, and Cluster Computing [cs.DC]

Main file

HALVersionSurvey.pdf (1013.19 KB)

Origin: Files produced by the author(s)

Contributor: Patrick Valduriez

https://hal-lirmm.ccsd.cnrs.fr/lirmm-01692229

Submitted on: Wednesday, January 24, 2018, 19:10:28

Last modified on: Thursday, February 1, 2024, 10:03:53

Long-term archiving on: Thursday, May 24, 2018, 22:18:38

Dates and versions

lirmm-01692229, version 1 (24-01-2018)

Identifiers

HAL Id: lirmm-01692229, version 1
DOI: 10.1504/IJCC.2018.093765

Cite

Ji Liu, Esther Pacitti, Patrick Valduriez. A Survey of Scheduling Frameworks in Big Data Systems. IJCC - International Journal of Cloud Computing, 2018, 7 (2), pp.103-128. ⟨10.1504/IJCC.2018.093765⟩. ⟨lirmm-01692229⟩

Collections

UNIV-RENNES1 CNRS INRIA INRA IRISA ZENITH LIRMM INRIA2 UR1-MATH-STIC UR1-UFR-ISTIC MIPS UNIV-MONTPELLIER UNIV-RENNES INRAE UR1-MATH-NUM

947 views

1553 downloads
