

Prof. Dr. S. Decker
RWTH Aachen
Informatik 5
Ahornstr. 55
D-52056 Aachen
Tel +49/241/8021501
Fax +49/241/8022321






Integrated schema generation for Data Lakes

Thesis type: Master
Status: Finished
Submitted in: 2020

We are in the Big Data era, in which valuable information is often stored in different places, known as information silos. Yet valuable insights and business decisions may only become available once data from different sources is integrated. New Big Data systems, such as data lakes, take a different approach to data analytics than traditional data warehouses: one does not need to know the structure of the data until the moment the data is used, an approach known as schema-on-read, or “store first, query/model later”.

In the scope of this thesis, the research problem is to investigate a flexible way of generating the integrated schema, which serves as a unified interface through which users pose their application requirements, e.g., SQL queries. The thesis student will develop a Spark-based system for generating the integrated schema within our data lake.
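
To make the two ideas above concrete, the following is a minimal, illustrative sketch in plain Python. It is not the thesis system and does not use Spark. It assumes raw JSON lines as the stored format, infers each source's schema only at read time (schema-on-read), and then merges the per-source schemas into one integrated schema. The function names `infer_schema` and `integrate`, the sample sources `crm` and `shop`, and the rule of falling back to `str` on a type conflict are all hypothetical choices made for illustration.

```python
import json


def infer_schema(raw_lines):
    """Infer {field: type name} from raw JSON lines at read time."""
    schema = {}
    for line in raw_lines:
        record = json.loads(line)
        for field, value in record.items():
            observed = type(value).__name__
            # Assumption: a field seen with conflicting types falls back to "str".
            if schema.get(field, observed) != observed:
                schema[field] = "str"
            else:
                schema[field] = observed
    return schema


def integrate(schemas):
    """Union the fields of all source schemas and track which sources provide each."""
    integrated = {}
    for source, schema in schemas.items():
        for field, type_name in schema.items():
            entry = integrated.setdefault(field, {"type": type_name, "sources": []})
            # Assumption: cross-source type conflicts are also resolved to "str".
            if entry["type"] != type_name:
                entry["type"] = "str"
            entry["sources"].append(source)
    return integrated


# Hypothetical raw data, stored without any schema declared up front.
crm_raw = ['{"customer_id": 1, "name": "Ada"}', '{"customer_id": 2, "name": "Alan"}']
shop_raw = ['{"customer_id": "1", "total": 9.5}']

integrated_schema = integrate(
    {"crm": infer_schema(crm_raw), "shop": infer_schema(shop_raw)}
)
for field, entry in integrated_schema.items():
    print(field, entry["type"], entry["sources"])
```

In this sketch the integrated schema records, for every field, a single resolved type and the sources that contain it, which is the kind of information a unified query interface would need in order to route a query to the right sources. A real system would have to handle nested structures, naming mismatches between sources, and much larger data volumes.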


For more detailed thesis goals and requirements, please contact Rihan Hai.

