Parallel Polyglot Query Processing on Heterogeneous Cloud Data Stores with LeanXcale

Boyan Kolev¹, Oleksandra Levchenko¹, Esther Pacitti¹, Patrick Valduriez¹, Ricardo Vilaça², Rui Gonçalves², Ricardo Jiménez-Peris², Pavlos Kranas²,³
¹ ZENITH - Scientific Data Management
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier, CRISAM - Inria Sophia Antipolis - Méditerranée
Abstract: The proliferation of diverse cloud data stores has made polystore systems a major topic in today's cloud landscape. In particular, as the amount of processed data grows rapidly each year, much attention is being paid to taking advantage of the parallel processing capabilities of the underlying data stores. To provide data federation, a typical polystore solution defines a common data model and query language, with translations to API calls or queries for each data store. However, this may lead to losing important querying capabilities. The polyglot approach of the CloudMdsQL query language allows data store native queries to be expressed as inline scripts and combined with regular SQL statements in ad-hoc integration queries. Moreover, efficient optimization techniques, such as bind join, can still be applied to improve the performance of selective joins. In this paper, we introduce the distributed architecture of the LeanXcale query engine, which processes polyglot queries in the CloudMdsQL query language while allowing native scripts to be handled in parallel at data store shards, so that efficient and scalable parallel joins take place at the query engine level. The experimental evaluation of the LeanXcale parallel query engine on various join queries illustrates the performance benefits of exploiting the parallelism of the underlying data management technologies in combination with the high expressivity provided by their scripting/querying frameworks.
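To make the bind join idea from the abstract more concrete, here is a minimal Python sketch of the general technique, not LeanXcale's implementation. It assumes the "data store shards" are hypothetical in-memory lists of (key, payload) rows, and the names `query_shard` and `bind_join` are illustrative only, not part of LeanXcale or CloudMdsQL. The sketch collects the distinct join keys from the selective outer side, pushes them as a filter to every shard of the inner store, queries the shards in parallel, and hash-joins the returned rows.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def query_shard(shard, keys):
    """Hypothetical stand-in for a native query run at one data store shard.

    Only rows whose join key is among the bound keys are returned, mimicking
    the key filter (e.g. an IN list) that a bind join pushes down to the store.
    """
    return [row for row in shard if row[0] in keys]


def bind_join(outer, inner_shards, max_workers=4):
    """Join outer (key, value) rows with (key, payload) rows of a sharded store.

    1. Collect the distinct join keys of the selective outer side.
    2. Send those keys to every inner shard and query the shards in parallel.
    3. Hash-join the filtered inner rows with the outer rows.
    """
    keys = frozenset(key for key, _ in outer)

    # Query all shards concurrently; map() preserves shard order in its results.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        parts = pool.map(lambda shard: query_shard(shard, keys), inner_shards)
        inner_rows = [row for part in parts for row in part]

    # Build a hash table on the outer side, then probe it with the inner rows.
    table = defaultdict(list)
    for key, value in outer:
        table[key].append(value)
    return [
        (key, value, payload)
        for key, payload in inner_rows
        for value in table.get(key, ())
    ]


if __name__ == "__main__":
    outer = [(1, "a"), (3, "c")]
    shards = [[(1, "x"), (2, "y")], [(3, "z"), (4, "w")]]
    print(bind_join(outer, shards))  # [(1, 'a', 'x'), (3, 'c', 'z')]
```

The benefit for selective joins is that only inner rows matching the outer keys leave each shard. In a real engine the keys would instead be bound into the native query sent to each store, and a large key set might need to be sent in batches; both points are assumptions of this sketch rather than details given in the abstract.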

Cited literature: 26 references

https://hal-lirmm.ccsd.cnrs.fr/lirmm-01921718
Contributor : Patrick Valduriez
Submitted on : Wednesday, November 14, 2018 - 9:10:29 AM
Last modification on : Tuesday, February 19, 2019 - 10:38:45 AM
Long-term archiving on: Friday, February 15, 2019 - 12:37:08 PM

File

PMSQP_v2.3.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : lirmm-01921718, version 1


Citation

Boyan Kolev, Oleksandra Levchenko, Esther Pacitti, Patrick Valduriez, Ricardo Vilaça, et al. Parallel Polyglot Query Processing on Heterogeneous Cloud Data Stores with LeanXcale. IEEE BigData, Dec 2018, Seattle, United States. pp.10. ⟨lirmm-01921718⟩
