A Containerisation Pipeline for Distributed Analytics Algorithms: The Algorithm Assembly Line

January 11th, 2022

Thesis Type Master
Status Finished
Advisor(s) Sascha Welten

The state-of-the-art Personal Health Train (PHT) infrastructure enables Distributed Analytics on decentralized healthcare data while complying with data protection laws. In the PHT architecture, a Train is a data analysis algorithm encapsulated in a Docker container, and a Station is an institution that holds privacy-sensitive data. The Stations receive the Train one by one; each Station executes the algorithm and then appends its local results to the Train. The first objective of the thesis is to design a process workflow, in the form of a containerisation pipeline, that creates Trains together with the necessary metadata. This metadata helps establish an agreement between the Train and the Stations. The workflow will create standardized Trains with appropriate connection interfaces and package all the necessary files in a Docker container so that the Train is deployable in the PHT.
The second objective of the thesis is to investigate and design a way to establish trust in published Trains. To this end, a community-based app will be developed: a platform that displays all Trains together with their metadata and user feedback. The app will also perform a static vulnerability analysis on each Train image to help protect the images against known vulnerabilities and exposures.
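
The Train and Station interaction described above can be illustrated with a minimal Python sketch. It is not the PHT implementation: the `Train`, `Station`, and `run_route` names, the metadata keys, and the in-memory data are all hypothetical, and a plain Python function stands in for the containerised algorithm. The sketch only shows the visiting order and the fact that each Station contributes a local result while its records stay where they are.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Station:
    """Hypothetical stand-in for an institution holding privacy-sensitive data."""
    name: str
    records: List[float]  # stays at the Station; only derived results leave


@dataclass
class Train:
    """Hypothetical stand-in for a containerised algorithm plus its metadata."""
    algorithm: Callable[[List[float]], float]
    metadata: Dict[str, str]
    results: Dict[str, float] = field(default_factory=dict)


def run_route(train: Train, stations: List[Station]) -> Train:
    """Visit the Stations one by one and append each local result to the Train."""
    for station in stations:
        train.results[station.name] = train.algorithm(station.records)
    return train


def local_mean(records: List[float]) -> float:
    """Example analysis: the mean of the records held by one Station."""
    return sum(records) / len(records)


# Assumed example metadata and data, for illustration only.
train = Train(
    algorithm=local_mean,
    metadata={"name": "mean-train", "expected_data": "list of floats"},
)
route = [
    Station("station-a", [1.0, 2.0, 3.0]),
    Station("station-b", [10.0, 20.0]),
]

finished = run_route(train, route)
print(finished.results)  # {'station-a': 2.0, 'station-b': 15.0}
```

In the architecture the abstract describes, the algorithm would be packaged in a Docker image and the metadata would describe what the Train expects from a Station; the dictionary here is only a placeholder for that agreement.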