Extending CloudMdsQL with MFR for Big Data Integration
Abstract
In this short paper (see [2] for the long version), we propose a functional SQL-like query language (based on Cloud-MdsQL) and query engine to retrieve data from two different kinds of data stores - an RDBMS and a distributed data processing framework such as Apache Spark or Hadoop MapReduce on top of HDFS - and combine them by applying data integration operators (mostly joins). However, users need to be aware of how data are organized across the data stores, so that they write valid queries. The query therefore contains embedded invocations to the underlying data stores, expressed as subqueries. As our query language is functional, it introduces a tight coupling between data and functions.
Domains
Databases [cs.DB]Origin | Files produced by the author(s) |
---|
Loading...