
Supporting User Steering In Large-Scale Workflows With Provenance Data

Renan Souza 1, 2, 3
3 ZENITH - Scientific Data Management
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier, CRISAM - Inria Sophia Antipolis - Méditerranée
Abstract : Computational Science and Engineering (CSE) workflows are large-scale, require High Performance Computing (HPC) execution, and have the exploratory nature of science. During the long run, which often lasts for hours or days, users need to steer the workflow by dynamically analyzing it and adapting it to improve the quality of results or to reduce the execution time. However, to steer the workflow, users typically perform several interactions (called user steering actions), which need to be tracked. Otherwise, users find it harder to understand how and what needs to be steered, they can steer in a misleading way, it can be difficult to explain the results that were consequences of their actions, and it can be impossible to reproduce the results. This thesis addresses this problem by proposing an approach that defines the fundamental concepts for user steering action; introduces the notion of provenance of steering actions; and contemplates a W3C PROV-compliant data diagram to model steering action data with provenance. Also, the approach presents system design principles to enable the management of steering action data by capturing, explicitly relating the actions to the rest of the workflow data, and storing these data efficiently. Two instances of this approach were designed and built: one is a lightweight tool to be plugged into parallel scripts and the other is to be used within a Parallel Workflow Management System, which are the two typical ways to conduct CSE experiments in HPC. Using real use cases in the Oil and Gas industry, the experiments show that the proposed approach enables users to understand how their actions directly affect the workflow results at runtime and that the system design principles were essential to add negligible overhead to the HPC workflows.

Cited literature [135 references]
Contributor : Patrick Valduriez
Submitted on : Friday, January 17, 2020 - 3:42:51 PM
Last modification on : Monday, October 19, 2020 - 2:34:03 PM


Files produced by the author(s)


  • HAL Id : tel-02418022, version 3


Renan Souza. Supporting User Steering In Large-Scale Workflows With Provenance Data. Databases [cs.DB]. UFRJ, Rio de Janeiro, 2019. English. ⟨tel-02418022v3⟩


