

Prof. Dr. S. Decker
RWTH Aachen
Informatik 5
Ahornstr. 55
D-52056 Aachen
Tel +49/241/8021501
Fax +49/241/8022321

A containerisation pipeline for distributed analytics algorithms: The algorithm assembly line

Thesis type
  • Bachelor
  • Master
Status: Running

In recent years, as newer technologies have evolved around the healthcare ecosystem, more and more data have been generated.

Advanced analytics applied to the data collected from numerous sources, both from healthcare institutions and generated by individuals themselves via apps and devices, could lead to innovations in the treatment and diagnosis of diseases, improve the care given to patients, and empower citizens to participate in decision-making about their own health and well-being. However, the sensitive nature of health data prohibits healthcare organizations from sharing it.

The Personal Health Train (PHT) is a novel approach that aims to establish a distributed data analytics infrastructure enabling the (re)use of distributed healthcare data while data owners stay in control of their own data.

The main principle of the PHT is that the data remains at its original location; instead, the analytical tasks travel to the data sources and are executed there. The PHT provides a distributed, flexible approach to using data in a network of participants while incorporating the FAIR principles.
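The principle above can be sketched in a few lines of Python. This is a minimal illustration, not part of any PHT implementation: the class and function names (`Station`, `run`, `mean_age_task`) are assumptions chosen for the example. The "train" carries an analysis task to each "station"; only aggregate results leave the data source.

```python
# Illustrative sketch of the PHT principle: the analysis task (the "train")
# visits each data source (a "station"); raw records never leave.
# All names here are hypothetical, chosen only for this example.

class Station:
    """A data provider that keeps its records local."""
    def __init__(self, records):
        self._records = records  # never exposed directly

    def run(self, task):
        # The visiting task executes inside the station's boundary;
        # only its (aggregate) return value leaves the premises.
        return task(self._records)

def mean_age_task(records):
    """Example analysis task: sum and count ages locally."""
    ages = [r["age"] for r in records]
    return sum(ages), len(ages)

stations = [
    Station([{"age": 34}, {"age": 58}]),
    Station([{"age": 71}]),
]

# The train visits each station in turn and combines the partial results.
total, count = 0, 0
for station in stations:
    s, n = station.run(mean_age_task)
    total, count = total + s, count + n

print(total / count)  # global mean age without centralising the raw data
```

The design point is that the combination step only ever sees sums and counts, never individual records, which is what lets the data owners stay in control.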

It facilitates the responsible use of sensitive and/or personal data by adopting international principles and regulations.


This thesis focuses on the creation of PHTs for a variety of data analytics tasks.

In a distributed analytics ecosystem, the various data providers expose their data in different and inconsistent ways. For every data type, the interfaces to the data have to be adjusted for every train, which complicates train creation.

The goal of this thesis is to design a four-step pipeline based on the plug-and-play principle. In the first step, the train creator selects an interface indicating the data type the train should consume. This interface forwards the data stream to the algorithm, where the data analysis takes place.
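The first step could take a shape like the following sketch, under the assumption that each interface normalises one provider format into a uniform record stream. The names (`DataInterface`, `CSVInterface`, `run_algorithm`) are hypothetical, not a prescribed design.

```python
# Hypothetical sketch of step 1: a pluggable data interface, chosen by the
# train creator, that forwards a uniform record stream to the algorithm.

from abc import ABC, abstractmethod
import csv
import io

class DataInterface(ABC):
    """Common contract: every interface yields records to the algorithm."""
    @abstractmethod
    def read(self, stream):
        ...

class CSVInterface(DataInterface):
    """One concrete interface: parses CSV into dict records."""
    def read(self, stream):
        yield from csv.DictReader(stream)

def run_algorithm(interface, stream, algorithm):
    # The interface hides the provider's format; the algorithm only
    # ever sees the normalised record stream.
    return algorithm(interface.read(stream))

source = io.StringIO("age\n34\n58\n")
result = run_algorithm(CSVInterface(), source,
                       lambda recs: sum(int(r["age"]) for r in recs))
print(result)  # 92
```

Swapping in an interface for another data type (e.g. FHIR or SQL) would then require no change to the algorithm itself, which is the plug-and-play idea.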

In the second step, the creator selects or develops the algorithm of choice using a code template proposed by the tool. The creator should be able to design the algorithm based on this skeleton and the predefined templates.
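One possible shape for such a template is sketched below: the tool supplies the skeleton, and the creator only fills in the analysis logic. The class and method names (`TrainAlgorithm`, `analyse`, `execute`) are assumptions for illustration.

```python
# Hypothetical code template for a train algorithm: the skeleton fixes
# execution and result serialisation; the creator fills in analyse().

from abc import ABC, abstractmethod
import json

class TrainAlgorithm(ABC):
    """Skeleton every train algorithm derives from."""

    def execute(self, records):
        result = self.analyse(records)   # creator-supplied logic
        return json.dumps(result)        # uniform result serialisation

    @abstractmethod
    def analyse(self, records):
        """Fill in: the actual analysis on the station's records."""

class CountAlgorithm(TrainAlgorithm):
    """A minimal example algorithm written against the template."""
    def analyse(self, records):
        return {"count": len(records)}

print(CountAlgorithm().execute([{"age": 34}, {"age": 58}]))  # {"count": 2}
```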

In the third step, after creating the algorithm, the creator enriches the PHT (container) with metadata that gives the train more semantics. In the final step, the PHT is created, including predefined input and output interfaces for the analysis results.
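The metadata enrichment and container assembly could be sketched as follows. The Docker `LABEL` instruction is a real mechanism for attaching metadata to an image, but the specific label keys (`pht.*`) and the Dockerfile template here are assumptions for illustration only.

```python
# Sketch of steps 3 and 4: attach train metadata as Docker LABELs and
# assemble a Dockerfile for the final PHT container. The pht.* keys and
# the template are hypothetical examples, not a fixed schema.

metadata = {
    "pht.title": "mean-age-train",
    "pht.creator": "train-creator@example.org",
    "pht.input": "csv",
    "pht.output": "json",
}

# Render each metadata entry as a LABEL instruction.
labels = "\n".join(f'LABEL "{k}"="{v}"' for k, v in metadata.items())

dockerfile = f"""FROM python:3-slim
{labels}
COPY algorithm.py /app/algorithm.py
ENTRYPOINT ["python", "/app/algorithm.py"]
"""

print(dockerfile)
```

Attaching the semantics as image labels means any registry or station can inspect what a train consumes and produces without running it.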


If you are interested in this thesis or a related topic, or have additional questions, please do not hesitate to send a message to


  • Basic knowledge of machine learning and general statistics
  • In-depth knowledge of containerisation technologies, preferably Docker
