

Prof. Dr. S. Decker
RWTH Aachen
Informatik 5
Ahornstr. 55
D-52056 Aachen
Tel +49/241/8021501
Fax +49/241/8022321






Investigating Replay Methods for Institutional Incremental Learning

Thesis type
  • Master
Status Open

In recent years, as new technologies have evolved across the healthcare ecosystem, ever larger volumes of data have been generated.

Advanced analytics could leverage the data collected from numerous sources, whether held by healthcare institutions or generated by individuals themselves via apps and devices, to drive innovations in the treatment and diagnosis of diseases, improve patient care, and empower citizens to participate in decision-making about their own health and well-being. However, the sensitive nature of health data prohibits healthcare organizations from sharing it.

The Personal Health Train (PHT) is a novel approach that aims to establish a distributed data analytics infrastructure enabling the (re)use of distributed healthcare data, while data owners stay in control of their own data.

The main principle of the PHT is that the data remains in its original location, while analytical tasks travel to the data sources and are executed there. The PHT provides a distributed, flexible approach to using data in a network of participants, incorporating the FAIR principles.

It facilitates the responsible use of sensitive and/or personal data by adopting international principles and regulations.


Usually, the PHT visits the data sources incrementally and cyclically: it visits the data sources one by one and, after the last source, returns to the first one again, for a pre-defined number of iterations.
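The cyclic visiting schedule described above can be sketched as follows. This is a minimal illustration, not part of the PHT implementation; the institution names and function are hypothetical.

```python
def cyclic_schedule(institutions, n_rounds):
    """Yield the PHT's visiting order: every institution is visited once
    per round, and after the last one the schedule returns to the first,
    for a pre-defined number of rounds."""
    for round_idx in range(n_rounds):
        for inst in institutions:
            yield round_idx, inst

# Example: three hypothetical institutions, two full rounds -> six visits,
# always in the same order.
visits = list(cyclic_schedule(["hospital_A", "hospital_B", "hospital_C"],
                              n_rounds=2))
```

At each visit, the model would be trained on the local data only, which is exactly the setting in which forgetting becomes an issue.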

However, under this training heuristic a phenomenon called Catastrophic Forgetting (CF) can occur: already learned features are forgotten, or already acquired knowledge is overwritten.

In order to mitigate the impact of CF, so-called Replay/Rehearsal Methods (RM) can be applied, such that the learning algorithm retains access to previously seen data.
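A common building block for such methods is a fixed-size rehearsal memory. The sketch below, under the assumption of a simple reservoir-sampling buffer (one of several possible RM designs, not the one prescribed by this thesis), shows how old examples could be stored and mixed into later training batches.

```python
import random

class ReplayBuffer:
    """Fixed-size rehearsal memory filled via reservoir sampling, so every
    example seen so far has an equal chance of being retained."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.memory = []
        self._seen = 0
        self._rng = random.Random(seed)

    def add(self, example):
        self._seen += 1
        if len(self.memory) < self.capacity:
            self.memory.append(example)
        else:
            # Replace a stored example with probability capacity / seen.
            j = self._rng.randrange(self._seen)
            if j < self.capacity:
                self.memory[j] = example

    def sample(self, k):
        """Draw up to k stored examples to mix into the current batch."""
        return self._rng.sample(self.memory, min(k, len(self.memory)))

# Toy usage: stream 100 examples through a buffer that keeps only 5.
buf = ReplayBuffer(capacity=5)
for i in range(100):
    buf.add(i)
rehearsal_batch = buf.sample(3)
```

During training at institution *i*, each local batch would be augmented with `buf.sample(k)`, so that gradients also reflect data from earlier stops.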

This Master Thesis focuses on the application of RM for the use case of Institutional Incremental Learning.

Possible research questions include:

  1. Do RM have an impact on the training success of incrementally trained models under highly imbalanced data?
  2. Since the data never leaves its origin, how can RM be applied such that no data breach occurs (e.g. via synthetic data)?
  3. Since data instances have different levels of informativeness, how can RM be optimised such that only valuable instances are reused to train the model?
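For research question 3, one naive baseline would be to rank stored instances by how "informative" they are to the current model. The sketch below is purely illustrative and hypothetical: it approximates informativeness by a per-example loss score, with higher loss meaning more informative.

```python
def select_informative(examples, loss_fn, k):
    """Return the k examples with the highest loss under the current model,
    i.e. those the model handles worst and may benefit most from replaying."""
    return sorted(examples, key=loss_fn, reverse=True)[:k]

# Toy usage: examples are plain numbers and the "loss" is the distance from
# the current mean, standing in for a real per-example model loss.
examples = [1.0, 5.0, 2.0, 9.0]
mean = sum(examples) / len(examples)
top2 = select_informative(examples, lambda x: abs(x - mean), k=2)
```

In a real RM pipeline, `loss_fn` would evaluate the trained model on each stored (or synthetic) instance before deciding what to replay.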

If you are interested in this thesis or a related topic, or have additional questions, please do not hesitate to send a message to


In-depth knowledge of Machine Learning and general statistics
Coding: Python
