Chiron: A Parallel Engine for Algebraic Scientific Workflows
Résumé
Large-scale scientific experiments based on computer simulations are typically modeled as scientific workflows, which eases the chaining of different programs. These scientific workflows are defined, executed and monitored by Scientific Workflow Management Systems (SWfMS). As these experiments manage large amounts of data, it becomes critical to execute them in High Performance Computing (HPC) environments, such as clusters, grids and clouds. However, few SWfMS provide parallel support. The ones that do so are usually labor-intensive for workflow developers and have limited primitives to optimize workflow execution. To address these issues, we developed a workflow algebra to specify and enable the optimization of parallel execution of scientific workflows. In this paper, we show how the workflow algebra is efficiently implemented in Chiron, an algebraic based parallel scientific workflow engine. Chiron has a unique native distributed provenance mechanism that enables run-time queries in a relational database. We developed two studies to evaluate the performance of our algebraic approach implemented in Chiron; the first study compares Chiron with different approaches while the second one evaluates the scalability of Chiron. By analyzing the results, we conclude that Chiron is efficient in executing scientific workflows, with the benefits of declarative specification and runtime provenance support.