Conference Papers Year : 2017

Spark Scalability Analysis in a Scientific Workflow

Abstract

Spark is being successfully used for big data parallel processing in many business domains (social media, finance, retail). Spark's scalability, usability, and large user community have motivated developers from scientific domains (bioinformatics, oil and gas, astronomy) to try it. However, the profile of scientific applications, e.g., black-box programs and intense file writes, differs from that of traditional business workflows, which may affect Spark's scalability. We present a scalability analysis of Spark in a real case study in the Oil and Gas domain. We explore workloads on a 936-core HPC cluster processing 330 GB of scientific data. We show that Spark scales very well when running long-lasting scientific tasks, but its performance is lower for short-duration tasks.
Main file
SBBD17-spark-clean.pdf (579.86 KB)
Origin : Files produced by the author(s)

Dates and versions

lirmm-01620161, version 1 (20-10-2017)

Identifiers

  • HAL Id: lirmm-01620161, version 1

Cite

Renan Souza, Vitor Silva, Pedro Miranda, Alexandre Lima, Patrick Valduriez, et al. Spark Scalability Analysis in a Scientific Workflow. SBBD: Simpósio Brasileiro de Banco de Dados, Oct 2017, Uberlandia, Brazil. pp.1-6. ⟨lirmm-01620161⟩