Informatik 5
Information Systems
Prof. Dr. M. Jarke

Query Rewriting for Heterogeneous Data Lakes

Year 2018
ISBN-10 978-3-319-98397-4
ISBN-13 978-3-319-98398-1

The increasing popularity of NoSQL systems has led to the model of polyglot persistence, in which several data management systems with different data models are used. Data lakes realize the polyglot persistence model by collecting data from various sources, by storing the data in its original structure, and by providing the datasets for querying and analysis. Thus, one of the key tasks of data lakes is to provide a unified querying interface, which is able to rewrite queries expressed in a general data model into a union of queries for data sources spanning heterogeneous data stores. To address this challenge, we propose a novel framework for query rewriting that combines logical methods for data integration based on declarative mappings with a scalable big data query processing system (i.e., Apache Spark) to efficiently execute the rewritten queries and to reconcile the query results into an integrated dataset. Because of the diversity of NoSQL systems, our approach is based on a flexible and extensible architecture that currently supports the major data structures such as relational data, semi-structured data (e.g., JSON, XML), and graphs. We show the applicability of our query rewriting engine with six real-world datasets and demonstrate its scalability using an artificial data integration scenario with multiple storage systems.
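To make the idea of rewriting one general query into a union of source-specific queries more concrete, here is a minimal sketch in plain Python. It is not the framework described in the paper and does not use Apache Spark; the names `SourceMapping`, `rewrite`, `execute`, and the two in-memory sources with their field names are all hypothetical, chosen only for illustration. Each mapping declares how global attributes correspond to local fields, the rewriter produces one sub-query per source that can answer the request, and the results are renamed back to the global attributes and unioned.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple

Row = Dict[str, object]

# Hypothetical in-memory stand-ins for two heterogeneous stores.
RELATIONAL_ROWS = [{"cust_name": "Ada", "cust_city": "Aachen"}]
JSON_DOCS = [{"fullName": "Bob", "address": {"city": "Budapest"}}]


def get_path(doc: dict, path: str) -> object:
    """Follow a dotted path such as 'address.city' into a nested document."""
    for key in path.split("."):
        doc = doc[key]
    return doc


def fetch_relational(fields: List[str]) -> List[Row]:
    """Project flat columns, like SELECT <fields> FROM table."""
    return [{f: row[f] for f in fields} for row in RELATIONAL_ROWS]


def fetch_json(fields: List[str]) -> List[Row]:
    """Project possibly nested fields from JSON-like documents."""
    return [{f: get_path(doc, f) for f in fields} for doc in JSON_DOCS]


@dataclass
class SourceMapping:
    """Declarative mapping: global attribute name -> local field name."""
    name: str
    attribute_map: Dict[str, str]
    fetch: Callable[[List[str]], Iterable[Row]]


MAPPINGS = [
    SourceMapping("relational",
                  {"name": "cust_name", "city": "cust_city"},
                  fetch_relational),
    SourceMapping("json",
                  {"name": "fullName", "city": "address.city"},
                  fetch_json),
]


def rewrite(global_attrs: List[str],
            mappings: List[SourceMapping]) -> List[Tuple[SourceMapping, List[str]]]:
    """Return one sub-query (source, local fields) per source covering all attributes."""
    plans = []
    for m in mappings:
        if all(a in m.attribute_map for a in global_attrs):
            plans.append((m, [m.attribute_map[a] for a in global_attrs]))
    return plans


def execute(global_attrs: List[str], mappings: List[SourceMapping]) -> List[Row]:
    """Run every sub-query and union the results under the global attribute names."""
    results: List[Row] = []
    for m, local_fields in rewrite(global_attrs, mappings):
        for row in m.fetch(local_fields):
            results.append({g: row[l] for g, l in zip(global_attrs, local_fields)})
    return results
```

Calling `execute(["name", "city"], MAPPINGS)` in this toy setup returns one integrated list of rows drawn from both sources. A real system would additionally push selections down to each store and reconcile duplicates, which this sketch leaves out.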

Details

22nd European Conference on Advances in Databases and Information Systems (ADBIS 2018)

Authors

Presented at

ADBIS 2018, Budapest, HU.

Published in

European Conference on Advances in Databases and Information Systems (ADBIS 2018), by András Benczúr, Bernhard Thalheim, Tomáš Horváth, p. 35-49; Springer, Cham, CH.
