Spark Scalability Analysis in a Scientific Workflow

Abstract: Spark is being successfully used for big data parallel processing in many business domains (social media, finance, retail). Spark's scalability, usability, and large user community have motivated developers from scientific domains (bioinformatics, oil and gas, astronomy) to try it. However, the profile of scientific applications, e.g., black-box programs and intense file writes, differs from that of traditional business workflows, which may affect Spark's scalability. We present a scalability analysis of Spark in a real case study in the oil and gas domain. We explore workloads on a 936-core HPC cluster processing 330 GB of scientific data. We show that Spark scales very well when running long-lasting scientific tasks, but its performance is lower for short-duration tasks.
Document type:
Conference paper
SBBD: Simpósio Brasileiro de Banco de Dados, Oct 2017, Uberlandia, Brazil. 32nd Brazilian Symposium on Databases, pp.1-6, 2017

Cited literature [13 references]

https://hal-lirmm.ccsd.cnrs.fr/lirmm-01620161
Contributor: Patrick Valduriez
Submitted on: Friday, 20 October 2017 - 11:22:04
Last modified on: Wednesday, 21 November 2018 - 20:28:39
Document(s) archived on: Sunday, 21 January 2018 - 13:46:55

File

SBBD17-spark-clean.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: lirmm-01620161, version 1

Citation

Renan Souza, Vitor Silva, Pedro Miranda, Alexandre Lima, Patrick Valduriez, et al. Spark Scalability Analysis in a Scientific Workflow. SBBD: Simpósio Brasileiro de Banco de Dados, Oct 2017, Uberlandia, Brazil. 32nd Brazilian Symposium on Databases, pp.1-6, 2017. 〈lirmm-01620161〉
