

Prof. Dr. S. Decker
RWTH Aachen
Informatik 5
Ahornstr. 55
D-52056 Aachen
Tel +49/241/8021501
Fax +49/241/8022321






Design and Implementation of a System for Exploring Semi-Structured Datasets

Thesis type: Bachelor
Student: Matthias Jeschke
Status: Finished
Submitted in: 2017
Proposal on: 23. Sep 2016 16:00
Proposal room: Seminarraum I5

Not all data is structured like the tables in an RDBMS; Big Data applications in particular process data from various sources in heterogeneous formats. Every day, enormous amounts of data are generated in all kinds of formats, e.g., spreadsheets, XML files, and text. Although the formats differ, the data usually has some kind of structure that can be exploited to build semi-structured datasets.

Information extracted from semi-structured datasets can be turned into valuable insights for decision making. To achieve this, the data has to be made available in an efficient system for data exploration and query processing.

The goal of the thesis has two parts. The first part is an analysis and comparison of existing tools and libraries used for exploring and managing semi-structured data, for instance Elasticsearch, Lucene, Tika, Solr, and Jackrabbit.

The second part builds on this analysis: one to three systems should be chosen for the implementation of a proof of concept (POC). The POC should use data from the mi-Mappa project, in which patents and publications are analyzed to build profiles of researchers.
