Algebraic Dataflows for Big Data Analysis

Abstract: Analyzing big data requires support for dataflows with many activities that extract and explore relevant information from the data. Recent approaches such as Pig Latin propose a high-level language to model such dataflows. However, dataflow execution is typically delegated to a MapReduce implementation such as Hadoop, which does not follow an algebraic approach and therefore cannot take advantage of the optimization opportunities of the Pig Latin algebra. In this paper, we propose an approach for big data analysis based on algebraic workflows, which enables optimization and parallel execution of activities and supports user steering through provenance queries. We illustrate how a big data processing dataflow can be modeled using the algebra. Through an experimental evaluation using real datasets and the execution of the dataflow with Chiron, an engine that supports our algebra, we show that our approach yields performance gains of up to 19.6% from algebraic optimizations in the dataflow and time savings of up to 39.1% in a user steering scenario.
Document type: Conference paper
BigData'2013: International Conference on Big Data, Oct 2013, Santa Clara, United States. IEEE, pp.6, 2013, 〈http://www.ischool.drexel.edu/bigdata/bigdata2013〉

https://hal-lirmm.ccsd.cnrs.fr/lirmm-00857221
Contributor: Patrick Valduriez
Submitted on: Monday, 9 September 2013 - 17:53:12
Last modified on: Wednesday, 10 October 2018 - 14:28:12

Identifiers

  • HAL Id : lirmm-00857221, version 1

Citation

Jonas Dias, Eduardo Ogasawara, Daniel de Oliveira, Fabio Porto, Patrick Valduriez, et al. Algebraic Dataflows for Big Data Analysis. BigData'2013: International Conference on Big Data, Oct 2013, Santa Clara, United States. IEEE, pp.6, 2013, 〈http://www.ischool.drexel.edu/bigdata/bigdata2013〉. 〈lirmm-00857221〉
