Online Input Data Reduction in Scientific Workflows

Renan Souza 1 Vítor Silva 1 Alvaro Coutinho 1 Patrick Valduriez 2, 3 Marta Mattoso 1
3 ZENITH - Scientific Data Management
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier, CRISAM - Inria Sophia Antipolis - Méditerranée
Abstract: Many scientific workflows are data-intensive and need to be executed iteratively over large sets of input data elements. Reducing the input data is a powerful way to reduce overall execution time in such workflows. When this is done online (i.e., without requiring users to stop execution, reduce the data, and resume execution), it can save much time, and user interactions can be integrated into workflow execution. A major problem is then to determine which subset of the input data should be removed. Related problems include guaranteeing that the workflow system keeps execution and data consistent after reduction, and keeping track of how users interacted with the execution. In this paper, we adopt the "human-in-the-loop" approach for scientific workflows by enabling users to steer the workflow execution and remove input elements from datasets at runtime. We propose an adaptive monitoring approach that combines workflow provenance monitoring and computational steering to support users in analyzing the evolution of key parameters and determining which subset of the data should be removed. We also extend a provenance data model to keep track of user interactions when users reduce data at runtime. In our experimental validation, we develop a test case from the oil and gas industry, using a 936-core cluster. The results on our parameter sweep test case show that the user interactions for online data reduction yield a 37% reduction in execution time.
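The idea of removing input elements at runtime while recording who removed what can be illustrated with a minimal sketch. This is not the authors' system or their provenance data model. The names `SteeringLog` and `reduce_pending`, the parameter `param`, and all values are hypothetical and chosen only for illustration. It assumes the pending (not yet executed) input elements of a parameter sweep are held in a dictionary keyed by element id.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SteeringLog:
    """Hypothetical record of user reductions (not the paper's data model)."""
    entries: List[dict] = field(default_factory=list)

    def record(self, user: str, criterion: str, removed_ids: List[int]) -> None:
        # Store who reduced the data, the stated criterion, and what was removed.
        self.entries.append(
            {"user": user, "criterion": criterion, "removed_ids": list(removed_ids)}
        )


def reduce_pending(
    pending: Dict[int, dict],
    should_remove: Callable[[dict], bool],
    user: str,
    criterion: str,
    log: SteeringLog,
) -> List[int]:
    """Remove not-yet-executed input elements that match the user's criterion."""
    removed = [eid for eid, params in pending.items() if should_remove(params)]
    for eid in removed:
        del pending[eid]
    log.record(user, criterion, removed)
    return removed


# Illustrative parameter sweep: element id -> parameter values (made-up numbers).
pending = {i: {"param": 5.0 * i} for i in range(1, 9)}
log = SteeringLog()
removed = reduce_pending(
    pending, lambda p: p["param"] > 25.0, "alice", "param > 25.0", log
)
print(removed)          # ids removed from the queue
print(sorted(pending))  # ids still waiting to run
```

A real workflow engine would also have to handle elements that are already running and keep downstream data consistent, which this sketch does not attempt.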
Document type:
Conference paper
ACM SIGHPC; IEEE. WORKS: Workflows in Support of Large-scale Science, Nov 2016, Salt Lake City, United States. 11th Workshop on Workflows in Support of Large-scale Science, in conjunction with SC2016, 2016, 〈http://works.cs.cardiff.ac.uk〉
Cited literature: 22 references

https://hal-lirmm.ccsd.cnrs.fr/lirmm-01400538
Contributor: Patrick Valduriez
Submitted on: Tuesday, 22 November 2016 - 10:23:29
Last modified on: Tuesday, 21 November 2017 - 01:22:43
Document(s) archived on: Tuesday, 21 March 2017 - 04:37:20

File

WORKS 2016.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : lirmm-01400538, version 1

Citation

Renan Souza, Vítor Silva, Alvaro Coutinho, Patrick Valduriez, Marta Mattoso. Online Input Data Reduction in Scientific Workflows. ACM SIGHPC; IEEE. WORKS: Workflows in Support of Large-scale Science, Nov 2016, Salt Lake City, United States. 11th Workshop on Workflows in Support of Large-scale Science, in conjunction with SC2016, 2016, 〈http://works.cs.cardiff.ac.uk〉. 〈lirmm-01400538〉

Metrics

Record views

275

File downloads

150