Conference paper, Year: 2019

Parallel Polyglot Query Processing on Heterogeneous Cloud Data Stores with LeanXcale

Boyan Kolev
Oleksandra Levchenko
Esther Pacitti
Patrick Valduriez
Ricardo Vilaça
Rui C. Gonçalves

Abstract

The proliferation of cloud data stores has made polystore systems a major topic in today's cloud landscape. In particular, as the amount of processed data grows rapidly each year, much attention is being paid to taking advantage of the parallel processing capabilities of the underlying data stores. To provide data federation, a typical polystore solution defines a common data model and query language, with translations to API calls or queries for each data store. However, this may lead to the loss of important querying capabilities. The polyglot approach of the CloudMdsQL query language allows data store native queries to be expressed as inline scripts and combined with regular SQL statements in ad hoc integration queries. Moreover, efficient optimization techniques, such as bind join, can still be applied to improve the performance of selective joins. In this paper, we introduce the distributed architecture of the LeanXcale query engine, which processes polyglot queries in the CloudMdsQL query language while allowing native scripts to be handled in parallel at data store shards, so that efficient and scalable parallel joins take place at the query engine level. The experimental evaluation of the LeanXcale parallel query engine on various join queries illustrates the performance benefits of exploiting the parallelism of the underlying data management technologies in combination with the high expressivity provided by their scripting and querying frameworks.
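As an illustration of the bind join technique mentioned in the abstract, the following is a minimal Python sketch, not CloudMdsQL syntax and not the LeanXcale implementation. The functions `fetch_orders` and `fetch_customers`, the sample rows, and the field names are all hypothetical stand-ins for queries against two separate data stores. The idea shown is the general one: run the selective side first, collect its distinct join keys, and push those keys down as a filter so the second store returns only matching rows.

```python
def fetch_orders(min_total):
    """Hypothetical outer-side query: a selective scan on the first store."""
    orders = [
        {"order_id": 1, "customer_id": 10, "total": 250},
        {"order_id": 2, "customer_id": 11, "total": 40},
        {"order_id": 3, "customer_id": 10, "total": 900},
        {"order_id": 4, "customer_id": 12, "total": 15},
    ]
    return [o for o in orders if o["total"] >= min_total]


def fetch_customers(ids):
    """Hypothetical inner-side query: stands in for a native query whose
    filter on the join key (an IN-style list) is pushed to the second store."""
    customers = {10: "Ada", 11: "Bob", 12: "Cy", 13: "Dee"}
    return [
        {"customer_id": i, "name": customers[i]}
        for i in sorted(ids)
        if i in customers
    ]


def bind_join(min_total):
    # Step 1: evaluate the selective (outer) side first.
    outer = fetch_orders(min_total)
    # Step 2: collect the distinct join keys it produced.
    keys = {o["customer_id"] for o in outer}
    # Step 3: bind the keys into the inner query so that only matching
    # rows are retrieved from the second store.
    inner = fetch_customers(keys)
    # Step 4: finish with an ordinary hash join on the reduced inputs.
    by_key = {}
    for c in inner:
        by_key.setdefault(c["customer_id"], []).append(c)
    return [
        {**o, **c}
        for o in outer
        for c in by_key.get(o["customer_id"], [])
    ]


if __name__ == "__main__":
    for row in bind_join(200):
        print(row)
```

In this toy setup, only one of the four customer rows is fetched from the second store when the threshold is 200, which is the saving a bind join aims for when the outer side is selective.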
Main file
PMSQP_v2.3.pdf (913.99 KB)
Origin: Files produced by the author(s)

Dates and versions

lirmm-01921718 , version 1 (14-11-2018)

Cite

Boyan Kolev, Oleksandra Levchenko, Esther Pacitti, Patrick Valduriez, Ricardo Vilaça, et al. Parallel Polyglot Query Processing on Heterogeneous Cloud Data Stores with LeanXcale. IEEE International Conference on Big Data (Big Data 2018), Dec 2018, Seattle, United States. pp. 1757-1766, ⟨10.1109/BigData.2018.8622187⟩. ⟨lirmm-01921718⟩