
Informatik 5
Information Systems
Prof. Dr. M. Jarke


Prof. Dr. M. Jarke
RWTH Aachen
Informatik 5
Ahornstr. 55
D-52056 Aachen
Tel +49/241/8021501
Fax +49/241/8022321






Running Theses

A Distributed Data Revisioning System for the Internet of Production
Supervised by Prof. Dr. Stefan Decker; Advisor(s): Lars Gleim, M. Sc.
In the context of this thesis, the student should implement a distributed data revisioning system, enabling interorganizational data reuse and extension in the Internet of Production and Internet of Things.
Towards Automating Graph Data Cleansing Using Shapes
Supervised by Prof. Dr. Stefan Decker; Advisor(s): Dr. Oya Deniz Beyan, Ph.D. Heiner Oberkampf
Data curation is a labour-intensive process that transforms often completely unstructured data into structured data. This work proposes an approach to improve the quality of semi-structured data efficiently and to resolve conflicts with minimal effort. This is achieved by inferring a data model or improving a given one, by finding inconsistencies within the data, and by suggesting possible data edits using different machine learning and data mining techniques. Effort is further reduced by using root cause analysis to detect conflicts that were caused by other conflicts.
Feature Clustering and Visualization of High Dimensional Data using Clique Cover Theory
Approaches such as clustering and classification that are analytically or computationally manageable in low dimensions become intractable as the number of dimensions increases. This happens because of a phenomenon known as “the curse of dimensionality”, which is commonly observed in high-dimensional data. The aim of this thesis is therefore to develop a novel approach for feature clustering, selection, and visualization based on the graph-theoretical concept of clique covers.
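To make the idea of a clique cover concrete, the following is a minimal sketch, not the approach developed in the thesis. It assumes that features are vertices of a graph in which an edge joins two features considered similar (for example, strongly correlated), and it groups the features with a simple greedy heuristic. The function name `greedy_clique_cover` and the example graph are hypothetical; finding a minimum clique cover is NP-hard, so a greedy pass like this gives no optimality guarantee.

```python
def greedy_clique_cover(adjacency):
    """Greedily partition vertices into cliques.

    adjacency maps each vertex to the set of its neighbours (undirected graph).
    Returns a list of cliques, each a sorted list of vertices.
    """
    uncovered = set(adjacency)
    cover = []
    # Visit high-degree vertices first; the name tie-break keeps results deterministic.
    for v in sorted(adjacency, key=lambda u: (-len(adjacency[u]), u)):
        if v not in uncovered:
            continue
        clique = [v]
        candidates = (adjacency[v] & uncovered) - {v}
        for u in sorted(candidates):
            # Add u only if it is adjacent to every vertex already in the clique.
            if all(u in adjacency[w] for w in clique):
                clique.append(u)
        uncovered -= set(clique)
        cover.append(sorted(clique))
    return cover


# Hypothetical similarity graph over five features.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}
clusters = greedy_clique_cover(graph)
```

Each returned clique can be read as one feature cluster in which every pair of features is similar under the chosen criterion.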
FAIR Identifier Registry for Distributed Systems
Supervised by Prof. Dr. Stefan Decker; Advisor(s): Dr. Oya Deniz Beyan
The rapid growth of digital data and the emergence of new kinds of digital content have created both opportunities and challenges for managing large amounts of data. Traditionally, organisations have relied on Uniform Resource Locator (URL) hyperlinks to give involved parties access to their digitised content via the Internet. Over time, however, more and more of these hyperlinks become invalid. The concept of persistent identification was developed to solve this issue. Instead of addressing data directly through its actual locator, Persistent Identifiers (PIDs) permit data retrieval through globally unique and permanent identifiers. PIDs include metadata and resolving URLs that point to the original location of data collections. Because the target URLs change, PIDs require constant maintenance. Existing PID systems have enabled long-term stable and unambiguous references. They are already in wide use and support distributed approaches at varying levels. In most cases, PIDs are indexed and explored by a central system. Besides infrastructural questions, the discoverability of PIDs mainly depends on their metadata classifications, which express what a given PID represents. The aim of this thesis is to explore the creation of a meta-standard for the specification of domain-specific PID registries that are exposed through an application programming interface (API), where both the standard and the API respect the Findable, Accessible, Interoperable and Reusable (FAIR) principles.
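The relationship between a PID, its resolving URL and its metadata can be pictured with a small sketch. This is an illustration only and assumes an in-memory registry; the names `PidRecord`, `PidRegistry`, the identifier `pid:example/1` and the example URLs are hypothetical and do not describe any existing PID system or the API proposed in the thesis.

```python
from dataclasses import dataclass, field


@dataclass
class PidRecord:
    """Hypothetical record: a permanent identifier, its current URL, and metadata."""
    pid: str
    url: str
    metadata: dict = field(default_factory=dict)


class PidRegistry:
    """Toy in-memory registry; a real PID system would be persistent and distributed."""

    def __init__(self):
        self._records = {}

    def register(self, pid, url, metadata=None):
        if pid in self._records:
            raise ValueError(f"PID already registered: {pid}")
        self._records[pid] = PidRecord(pid, url, dict(metadata or {}))

    def update_location(self, pid, new_url):
        # Maintenance step: the data moved, but the identifier stays the same.
        self._records[pid].url = new_url

    def resolve(self, pid):
        # Retrieval goes through the identifier, not through a stored hyperlink.
        return self._records[pid].url

    def find(self, **criteria):
        # Discovery by metadata classification, e.g. find(type="dataset").
        return [r.pid for r in self._records.values()
                if all(r.metadata.get(k) == v for k, v in criteria.items())]


registry = PidRegistry()
registry.register("pid:example/1", "https://old.example.org/data", {"type": "dataset"})
registry.update_location("pid:example/1", "https://new.example.org/data")
```

After the update, references that cite `pid:example/1` still resolve, whereas a hyperlink to the old URL would have become invalid.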
Patterns for Integrating Rule Based and Process Based Model Components of Computerized Clinical Guidelines
Supervised by Prof. Dr. Stefan Decker, Dr. rer. nat. Cord Spreckelsen
Machine Learning for Anonymization of Unstructured Text
Supervised by Prof. Dr. Stefan Decker; Advisor(s): Dr. Michael Cochez
This thesis addresses the problem of identifying personal information in unstructured text using supervised Machine Learning (ML). The final application should be able to recognize and annotate the tokens that make up personal data in an English input text as accurately as possible. First, supervised learning methods, suitable for the task, will be identified. Then, models based on the most promising approaches will be designed and implemented. For comparison, suitable evaluation metrics have to be determined. Finally, the approaches are compared and evaluated against a baseline and each other.
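As an illustration of how token-level annotations might be compared, the sketch below computes precision, recall and F1 over binary token labels. The abstract leaves the choice of evaluation metrics open, so this is only one common option and not necessarily the one used in the thesis; the function name `token_prf`, the example sentence and its labels are hypothetical.

```python
def token_prf(gold, predicted):
    """Token-level precision, recall and F1 for binary labels (1 = personal data)."""
    if len(gold) != len(predicted):
        raise ValueError("label sequences must have equal length")
    tp = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 0)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical example: one label per token, 1 marks a token that is personal data.
tokens = "Alice Smith lives in Aachen".split()
gold = [1, 1, 0, 0, 1]
predicted = [1, 0, 0, 0, 1]  # the model misses "Smith"
precision, recall, f1 = token_prf(gold, predicted)
```

For anonymization, recall is usually the more critical figure, since every missed token is personal data left in the text.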
Dynamic Embeddings of Evolving Knowledge Graphs
Supervised by Prof. Dr. Stefan Decker; Advisor(s): Dr. Michael Cochez, Dr. Florian Lemmerich
The goal of this Bachelor thesis is to investigate how knowledge graph (KG) embeddings can be updated with new information, so as to obtain a dynamic and stable embedding of a fast-evolving KG while reducing the computational effort.
Classification of Cancer with Methylation-Aware Motifs
DNA methylation data has become a popular choice of features for cancer classification tasks. However, a major problem is its high dimensionality in combination with the usually small number of available samples. Therefore, we created lower-dimensional meta-features that take into account the effects of altered DNA methylation patterns on the binding affinity of human transcription factors. Their performance is evaluated in this thesis.
Continuous Community Analytics
Supervised by PD Dr. Ralf Klamma, AOR
The goal of this thesis is to integrate post-mortem community data dumps with MobSOS, the real-time community information systems success awareness framework.
Modeling for Street Level Crime Prediction
Supervised by Prof. Dr. Stefan Decker; Advisor(s): Dr. Michael Cochez, Cristina Kadar, Raquel Rosés Brüngger
The aim of this master's thesis is to build a predictive model of crime at street level for a Swiss city, including the implementation of a tool for visualizing the data and results.
Machine Economy for Dynamic Configurations of Production Processes
Supervised by Prof. Dr. Thomas Rose; Advisor(s): Thomas Osterland
In the context of Industry 4.0 and the Internet of Things, the autonomization of cyber-physical systems is increasingly coming to the fore. Just imagine an autonomous vehicle that offers commuter services against payment. The electric vehicle has to pay tolls, requires energy from charging stations and uses washing services from time to time. Considering all entities in this process as agents that can interact and carry wallets, one can easily envision a machine-to-machine economy. Technical agents decide which tasks to carry out based on costs, capabilities and earnings, as illustrated by the demonstrators Smart Replenishment Box and Smart Vehicle Control.
Graph-Structured Query Construction for Natural Language Questions
Supervised by Prof. Dr. Stefan Decker; Advisor(s): Dr. Michael Cochez
Graph-structured queries provide an efficient means to retrieve desired data from large-scale knowledge graphs. However, such queries are difficult for non-expert users to write, and users prefer to express their query intention through natural language questions. Recently, increasing effort has gone into constructing graph-structured queries for given natural language questions. The core of the construction is to deduce the structure of the target query and to retrieve the vertices and edges of the underlying knowledge graph that constitute the query. Existing query construction methods rely on conventional graph-based algorithms and question understanding techniques, which become inefficient and less accurate when faced with complicated natural language questions over large-scale knowledge graphs. In this thesis, we focus on this problem and propose novel construction models built on recent knowledge graph embedding techniques. Extensive experiments were conducted on question answering benchmark datasets, and the results demonstrate that our models outperform baselines in terms of effectiveness and efficiency.
Privacy Attack on Social Networks Using Network Embeddings
Supervised by Prof. Dr. Markus Strohmaier, Prof. Dr. Stefan Decker; Advisor(s): Dr. Florian Lemmerich, Dr. Michael Cochez
A company that runs a social network trains a node embedding on the network, where each account is represented by one node. One user deletes his account. The company is therefore legally required to remove all private information of that user. This includes the node associated with the user’s account and the vector representation of that node generated by the embedding. The company, however, likely does not delete the vector representations of the other nodes, even though the removed node was used when training them. Is it possible to identify the neighbors of the removed node? Which kinds of neighbors can be identified best, and which cannot be identified? First results suggest that the identification of neighbors works well for some kinds of nodes and is more difficult for others.
Extending the b-it Chain to execute smart contracts
Supervised by Prof. Dr. Thomas Rose; Advisor(s): Thomas Osterland
Often noticed only as the technology that enables the digital currency Bitcoin, blockchain is a novel protocol that allows the distributed and secure storage of information and the untampered execution of program code in trustless environments. Did you ever feel the intense desire to write a thesis about blockchain, or do you have a slight hope that blockchain is the one-and-only topic that touches your heart? Use your chance now! We are looking forward to hearing from you.