
Informatik 5
Information Systems
Prof. Dr. M. Jarke


Prof. Dr. M. Jarke
RWTH Aachen
Informatik 5
Ahornstr. 55
D-52056 Aachen
Tel +49/241/8021501
Fax +49/241/8022321






A Smart Routing Algorithm for the Personal Health Train (PHT)

Thesis type
  • Master
Status Open

In this master thesis, the student should investigate different routing heuristics for the PHT in a distributed Machine Learning/Deep Learning setting.

In recent years, as newer technologies have evolved around the healthcare ecosystem, more and more data have been generated. 

Advanced analytics applied to data collected from numerous sources, whether held by healthcare institutions or generated by individuals themselves via apps and devices, could lead to innovations in the treatment and diagnosis of diseases, improve the care given to patients, and empower citizens to participate in the decision-making process regarding their own health and well-being. However, the sensitive nature of health data prevents healthcare organizations from sharing it.

The Personal Health Train (PHT) is a novel approach, aiming to establish a distributed data analytics infrastructure enabling the (re)use of distributed healthcare data, while data owners stay in control of their own data. 

The main principle of the PHT is that data remains in its original location, while analytical tasks travel to the data sources and are executed there. The PHT provides a distributed, flexible approach to using data in a network of participants, incorporating the FAIR principles.

It facilitates the responsible use of sensitive and/or personal data by adopting international principles and regulations.


This Master Thesis focusses on PHTs used for Machine Learning/Deep Learning tasks.

Usually, the PHT visits the data sources incrementally/cyclically: it visits the data sources one by one and, after the last source, returns to the first one again, repeating this for a pre-defined number of iterations.
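
As an illustration of the incremental/cyclic heuristic, the following minimal sketch generates the visiting order for a fixed number of rounds. The function name `cyclic_route` and the station names are hypothetical and not part of any PHT implementation.

```python
from itertools import cycle, islice


def cyclic_route(stations, rounds):
    """Return the visiting order for `rounds` full passes over `stations`.

    `stations` is a list of data-source identifiers (hypothetical names);
    the train visits them one by one and starts over after the last one.
    """
    return list(islice(cycle(stations), rounds * len(stations)))


# Hypothetical example: three data sources, two full rounds.
route = cyclic_route(["hospital_a", "hospital_b", "hospital_c"], rounds=2)
print(route)
```

The order depends only on the position of each source in the list, not on any property of the data it holds.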

These routing heuristics are very simple, but they lack a 'smart' component that determines the routing order based on data quality or other data characteristics.

In this thesis, the student should develop methods to rate a data source/dataset depending on, e.g., the underlying data distribution, data quality measurements, and/or the Machine Learning method used.

These profiling methods should return a score indicating the quality of the dataset in the given setting.
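
One conceivable form of such a profiling method is sketched below. It is only an assumption for illustration: the two ingredients (fraction of non-missing values and class balance measured by normalised entropy), the weights, and all function names are hypothetical choices, and the thesis is expected to investigate which measures are actually suitable.

```python
import math
from collections import Counter


def completeness(rows):
    """Fraction of cells that are not missing (missing cells are None)."""
    cells = [value for row in rows for value in row]
    if not cells:
        return 0.0
    return sum(value is not None for value in cells) / len(cells)


def label_balance(labels):
    """Normalised Shannon entropy of the label distribution.

    Returns 1.0 for perfectly balanced classes and 0.0 if fewer than
    two classes are present.
    """
    counts = Counter(labels)
    if len(counts) < 2:
        return 0.0
    n = len(labels)
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return entropy / math.log(len(counts))


def profile_score(rows, labels, w_complete=0.5, w_balance=0.5):
    """Combine both measures into one score (weights are assumptions)."""
    return w_complete * completeness(rows) + w_balance * label_balance(labels)


# Hypothetical toy dataset: one missing value, balanced binary labels.
rows = [[1.0, None], [2.0, 3.0]]
labels = [0, 1]
print(profile_score(rows, labels))
```

With equal weights that sum to one, the score lies between 0 and 1, so scores from different data sources are directly comparable.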

In a second step, a (smart) routing algorithm incorporating the calculated score for each data source should be developed. This algorithm should return a sequence of data sources for the train to visit, e.g., in order to learn more efficiently or to produce a better model (compared to the incremental/cyclic heuristic).
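
A very simple baseline for such a score-based routing algorithm could look like the sketch below. It assumes that each data source already has a numeric score and simply visits sources in descending score order, skipping those below a threshold. The function name, the threshold parameter, and the "best source first" strategy are assumptions for illustration; whether this ordering actually improves training is exactly the kind of question the thesis would examine.

```python
def score_based_route(scores, rounds=1, min_score=0.0):
    """Return a visiting sequence derived from per-source scores.

    `scores` maps a data-source identifier to its profiling score.
    Sources scoring below `min_score` are skipped; the remaining ones
    are visited in descending score order, repeated for `rounds` rounds.
    """
    eligible = [name for name, value in scores.items() if value >= min_score]
    # Sort by descending score; ties are broken by name for determinism.
    ordered = sorted(eligible, key=lambda name: (-scores[name], name))
    return ordered * rounds


# Hypothetical scores for three data sources.
scores = {"hospital_a": 0.4, "hospital_b": 0.9, "hospital_c": 0.7}
print(score_based_route(scores, rounds=2, min_score=0.5))
```

Comparing the model obtained with this sequence against the one obtained with the incremental/cyclic order would give a first indication of whether the scores carry useful routing information.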


If you are interested in this thesis or a related topic, or if you have additional questions, please do not hesitate to send a message to

Related Work:


In-depth knowledge of Machine Learning and general statistics
Coding: Python
