

Prof. Dr. S. Decker
RWTH Aachen
Informatik 5
Ahornstr. 55
D-52056 Aachen
Tel +49/241/8021501
Fax +49/241/8022321






Schema Inference and Functional Dependency Discovery in Data Lake System

Thesis type
  • Master
Status Finished
Submitted in 2019
Proposal on 07. Aug 2018 15:00
Proposal room Seminarraum I5

To prevent a data lake from turning into a "data swamp", sophisticated management of metadata about the raw data plays a vital role in our data lake system, Constance. This thesis therefore focuses on data ingestion in data lakes, in particular on the extraction of structural metadata, e.g., through functional dependency discovery.

For structured data, explicit schema definitions are relatively easy to obtain. For semi-structured data (such as XML and JSON) and graph data, however, metadata extraction is challenging because their schemata are only implicit. To address this, the thesis covers the design of a generic metadata model and the implementation of a schema inference component in our system.
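To illustrate what schema inference over semi-structured data involves, the following Python sketch derives a flat schema from a collection of JSON records. It walks every record, collects the set of JSON types observed at each field path, and marks a path as required only if it occurs in every record. This is a generic approach assumed for illustration, not the schema inference component of Constance. The function names `type_name` and `infer_schema` and the dotted-path notation (with `[]` for array elements) are hypothetical.

```python
import json
from typing import Any


def type_name(value: Any) -> str:
    """Map a parsed JSON value to its JSON type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def infer_schema(records: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Infer a flat schema: field path -> observed types and whether it is required."""
    types: dict[str, set[str]] = {}    # path -> set of observed JSON types
    seen_in: dict[str, int] = {}       # path -> number of records containing it

    def visit(value: Any, path: str, seen: set[str]) -> None:
        types.setdefault(path, set()).add(type_name(value))
        seen.add(path)
        if isinstance(value, dict):    # recurse into nested objects
            for key, child in value.items():
                visit(child, f"{path}.{key}", seen)
        elif isinstance(value, list):  # all array elements share one path
            for item in value:
                visit(item, f"{path}[]", seen)

    for record in records:
        seen: set[str] = set()
        for key, value in record.items():
            visit(value, key, seen)
        for path in seen:              # count each path once per record
            seen_in[path] = seen_in.get(path, 0) + 1

    n = len(records)
    return {
        path: {"types": sorted(ts), "required": seen_in[path] == n}
        for path, ts in sorted(types.items())
    }


# Usage: two heterogeneous JSON documents (hypothetical sample data).
docs = [
    '{"id": 1, "name": "Alice", "tags": ["a", "b"]}',
    '{"id": 2, "name": null, "address": {"city": "Aachen"}}',
]
schema = infer_schema([json.loads(d) for d in docs])
for path, info in schema.items():
    print(path, info)
```

Recording a set of types per path, rather than a single type, keeps heterogeneity visible in the inferred schema (here, `name` is either a string or `null`) instead of failing on it.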

Moreover, functional dependencies (FDs) express relationships between the values of attributes in a database relation: an FD X → Y states that any two tuples with equal values for the attributes X also have equal values for Y. This property makes FDs valuable metadata for data cleaning, query optimization, and schema matching. FDs are to be discovered on both the intra-relational and the inter-relational level. Finally, the framework is to be evaluated, and a UI for presenting the results of FD discovery is to be implemented.
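To make intra-relational FD discovery concrete, the Python sketch below performs a naive level-wise search. It enumerates candidate left-hand sides of increasing size and checks each candidate X → A by grouping rows on their X values, verifying that every group maps to a single A value. It then discards candidates whose left-hand side has a proper subset already known to determine A, so only minimal, non-trivial FDs are reported. The function names, the `max_lhs` cap, and the sample relation are hypothetical. This is not the discovery method used in the thesis, and it covers neither inter-relational FDs nor FDs with an empty left-hand side.

```python
from itertools import combinations


def holds(rows: list[dict[str, object]], lhs: tuple[str, ...], rhs: str) -> bool:
    """Return True if lhs -> rhs holds: equal lhs values always imply equal rhs values."""
    seen: dict[tuple, object] = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        if key in seen and seen[key] != row[rhs]:
            return False  # two rows agree on lhs but differ on rhs
        seen.setdefault(key, row[rhs])
    return True


def discover_fds(rows, attributes, max_lhs=2):
    """Naive level-wise search for minimal, non-trivial FDs with |lhs| <= max_lhs."""
    found: list[tuple[tuple[str, ...], str]] = []
    for size in range(1, max_lhs + 1):
        for lhs in combinations(attributes, size):
            for rhs in attributes:
                if rhs in lhs:
                    continue  # trivial FD, always holds
                # Non-minimal: a proper subset of lhs already determines rhs.
                if any(set(p) < set(lhs) and r == rhs for p, r in found):
                    continue
                if holds(rows, lhs, rhs):
                    found.append((lhs, rhs))
    return found


# Usage on a small, hypothetical employee relation.
rows = [
    {"emp": 1, "dept": "DB", "building": "E1", "city": "Aachen"},
    {"emp": 2, "dept": "DB", "building": "E1", "city": "Aachen"},
    {"emp": 3, "dept": "AI", "building": "E2", "city": "Aachen"},
    {"emp": 4, "dept": "AI", "building": "E2", "city": "Bonn"},
]
fds = discover_fds(rows, ["emp", "dept", "building", "city"])
for lhs, rhs in fds:
    print(f"{', '.join(lhs)} -> {rhs}")
```

The number of candidate left-hand sides grows exponentially with the number of attributes, so this brute-force search only suits small relations. Dedicated FD discovery algorithms such as TANE prune the candidate lattice far more aggressively.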

