Efficient Scheduling of Scientific Workflows using Hot Metadata in a Multisite Cloud

Ji Liu 1, Luis Pineda-Morales 2, Esther Pacitti 1, Alexandru Costan 2, Patrick Valduriez 1, Gabriel Antoniu 2, Marta Mattoso 3
1 ZENITH - Scientific Data Management
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier, CRISAM - Inria Sophia Antipolis - Méditerranée
2 KerData - Scalable Storage for Clouds and Beyond
Inria Rennes – Bretagne Atlantique, IRISA_D1 - SYSTÈMES LARGE ÉCHELLE
Abstract: Large-scale scientific applications are often expressed as scientific workflows (SWfs), which help define data processing jobs and the dependencies between jobs’ activities. Several SWfs have huge storage and computation requirements, so they need to be processed in multiple (cloud-federated) data centers. It has been shown that efficient metadata handling plays a key role in the performance of computing systems. However, most of this evidence to date concerns only single-site HPC systems. In addition, the efficient scheduling of tasks among different data centers is critical to SWf execution. In this paper, we present a hybrid distributed model and architecture that uses hot metadata (frequently accessed metadata) for efficient SWf scheduling in a multisite cloud. We couple our model with a scientific workflow management system (SWfMS) to validate and tune its applicability to different real-life scientific workflows with different scheduling algorithms. We show that combining efficient management of hot metadata with scheduling algorithms improves the performance of the SWfMS, reducing the execution time of highly parallel jobs by up to 64.1% and that of whole scientific workflows by up to 37.5%, by avoiding unnecessary cold metadata operations.


Contributor: Ji Liu
Submitted on: Tuesday, November 21, 2017 - 1:14:27 PM
Last modification on: Tuesday, February 19, 2019 - 10:38:45 AM




  • HAL Id : lirmm-01620231, version 2


Ji Liu, Luis Pineda-Morales, Esther Pacitti, Alexandru Costan, Patrick Valduriez, et al. Efficient Scheduling of Scientific Workflows using Hot Metadata in a Multisite Cloud. BDA: Gestion de Données – Principes, Technologies et Applications, Nov 2017, Nancy, France. pp.13. ⟨lirmm-01620231v2⟩


