Efficient Scheduling of Scientific Workflows using Hot Metadata in a Multisite Cloud

Ji Liu (1, 2), Luis Pineda (3), Esther Pacitti (1, 2), Alexandru Costan (3), Patrick Valduriez (1, 2), Gabriel Antoniu (3), Marta Mattoso (4)
1 ZENITH - Scientific Data Management
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier, CRISAM - Inria Sophia Antipolis - Méditerranée
3 KerData - Scalable Storage for Clouds and Beyond
Inria Rennes – Bretagne Atlantique, IRISA_D1 - SYSTÈMES LARGE ÉCHELLE
Abstract: Large-scale, data-intensive scientific applications are often expressed as scientific workflows (SWfs). In this paper, we consider the problem of efficiently scheduling a large SWf in a multisite cloud, i.e., a cloud with geo-distributed data centers (sites). There are three reasons to run a SWf across multiple cloud sites: the data is already distributed, the necessary resources exceed the limits of a single site, or the monetary cost is lower. In a multisite cloud, metadata management has a critical impact on the efficiency of SWf scheduling because it provides a global view of data location and enables task tracking during execution. Metadata should therefore be readily available to the system at any given time. Although efficient metadata handling has been shown to play a key role in performance, little research has targeted this issue in multisite clouds. In this paper, we propose to identify and exploit hot metadata (frequently accessed metadata) for efficient SWf scheduling in a multisite cloud, using a distributed approach. We implemented our approach within a scientific workflow management system; the results show that it reduces the execution time of highly parallel jobs by up to 64% and that of whole SWfs by up to 55%.
Document type:
Journal article
IEEE Transactions on Knowledge and Data Engineering, Institute of Electrical and Electronics Engineers, In press, pp.1-20. 〈10.1109/TKDE.2018.2867857〉
https://hal-lirmm.ccsd.cnrs.fr/lirmm-01867717
Contributor: Patrick Valduriez
Submitted on: Tuesday, September 4, 2018 - 15:25:36
Last modified on: Friday, March 15, 2019 - 01:15:13
Document(s) archived on: Wednesday, December 5, 2018 - 17:19:30

File

Ji TKDE author version.pdf
Files produced by the author(s)

Citation

Ji Liu, Luis Pineda, Esther Pacitti, Alexandru Costan, Patrick Valduriez, et al. Efficient Scheduling of Scientific Workflows using Hot Metadata in a Multisite Cloud. IEEE Transactions on Knowledge and Data Engineering, Institute of Electrical and Electronics Engineers, In press, pp.1-20. 〈10.1109/TKDE.2018.2867857〉. 〈lirmm-01867717〉
