Conference papers

Spark Scalability Analysis in a Scientific Workflow

Renan Souza 1 Vitor Silva 1 Pedro Miranda 1 Alexandre Lima 1 Patrick Valduriez 2, 3 Marta Mattoso 1
2 ZENITH - Scientific Data Management
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier, CRISAM - Inria Sophia Antipolis - Méditerranée
Abstract : Spark is being successfully used for big data parallel processing in many business domains (social media, finance, retail). Spark's scalability, usability, and large user community have motivated developers from scientific domains (bioinformatics, oil and gas, astronomy) to try it. However, the profile of scientific applications, e.g., black-box programs and intense file writes, differs from that of traditional business workflows, which may affect Spark's scalability. We present a scalability analysis of Spark in a real case study in the Oil and Gas domain. We explore workloads on a 936-core HPC cluster processing 330 GB of scientific data. We show that Spark scales very well when running long-lasting scientific tasks, but its performance is lower for short-duration tasks.

Cited literature [13 references]

https://hal-lirmm.ccsd.cnrs.fr/lirmm-01620161
Contributor : Patrick Valduriez
Submitted on : Friday, October 20, 2017 - 11:22:04 AM
Last modification on : Monday, October 19, 2020 - 2:34:03 PM
Long-term archiving on : Sunday, January 21, 2018 - 1:46:55 PM

File

SBBD17-spark-clean.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : lirmm-01620161, version 1

Citation

Renan Souza, Vitor Silva, Pedro Miranda, Alexandre Lima, Patrick Valduriez, et al. Spark Scalability Analysis in a Scientific Workflow. SBBD: Simpósio Brasileiro de Banco de Dados, Oct 2017, Uberlandia, Brazil. pp.1-6. ⟨lirmm-01620161⟩
