Efficient Scheduling of Scientific Workflows using Hot Metadata in a Multisite Cloud

Ji Liu 1, Luis Pineda-Morales 2, Esther Pacitti 1, Alexandru Costan 2, Patrick Valduriez 1, Gabriel Antoniu 2, Marta Mattoso 3
1 ZENITH - Scientific Data Management
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier, CRISAM - Inria Sophia Antipolis - Méditerranée
2 KerData - Scalable Storage for Clouds and Beyond
Inria Rennes – Bretagne Atlantique, IRISA_D1 - SYSTÈMES LARGE ÉCHELLE
Abstract : Large-scale scientific applications are often expressed as scientific workflows (SWfs) that help define data processing jobs and the dependencies between jobs’ activities. Several SWfs have huge storage and computation requirements, so they need to be processed in multiple (cloud-federated) datacenters. It has been shown that efficient metadata handling plays a key role in the performance of computing systems. However, most of this evidence to date concerns only single-site HPC systems. In addition, the efficient scheduling of tasks among different datacenters is critical to SWf execution. In this paper, we present a hybrid distributed model and architecture that uses hot metadata (frequently accessed metadata) for efficient SWf scheduling in a multisite cloud. We couple our model with a scientific workflow management system (SWfMS) to validate and tune its applicability to different real-life scientific workflows with different scheduling algorithms. We show that combining efficient management of hot metadata with scheduling algorithms improves the performance of the SWfMS, reducing the execution time of highly parallel jobs by up to 64.1% and that of whole scientific workflows by up to 37.5%, by avoiding unnecessary cold metadata operations.


https://hal-lirmm.ccsd.cnrs.fr/lirmm-01620231
Contributor : Ji Liu
Submitted on : Tuesday, November 21, 2017 - 1:14:27 PM
Last modification on : Tuesday, February 19, 2019 - 10:38:45 AM

File

BDA2017.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : lirmm-01620231, version 2

Citation

Ji Liu, Luis Pineda-Morales, Esther Pacitti, Alexandru Costan, Patrick Valduriez, et al. Efficient Scheduling of Scientific Workflows using Hot Metadata in a Multisite Cloud. BDA: Gestion de Données — Principes, Technologies et Applications, Nov 2017, Nancy, France. pp.13. ⟨lirmm-01620231v2⟩
