This thesis investigates the application of federated learning (FL) within the Personal Health Train (PHT) paradigm and explores how FL can be adapted to improve privacy-preserving data analysis in healthcare. The research examines how PHT can facilitate secure, distributed machine learning on sensitive medical data across institutions while ensuring data privacy and compliance with regulatory standards.
Thesis Type |
Student | Yixiao Cai
Status | Running
Presentation room | Seminar room I5 6202
Supervisor(s) | Stefan Decker
Advisor(s) | Yongli Mou
Contact | mou@dbis.rwth-aachen.de
Background
The Personal Health Train (PHT) concept enables the analysis of sensitive health data by allowing machine learning algorithms to travel to various data repositories while the data itself remains securely on-site. This approach is crucial in healthcare, where privacy and data protection regulations, such as the GDPR, restrict the movement of patient data. Federated learning (FL) aligns well with the PHT because it trains models locally on distributed datasets without centralizing the data. However, conventional FL approaches face challenges when applied to highly heterogeneous medical data held by different institutions, including differing data distributions across sites, remaining privacy concerns, and communication inefficiencies. This thesis aims to rethink how FL can be tailored to the unique requirements of the PHT and of healthcare data.
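To make the idea of local training without centralizing data concrete, here is a minimal sketch of Federated Averaging (FedAvg), the canonical FL aggregation scheme; it is not the method this thesis will propose. It assumes a linear model trained with plain SGD, three hypothetical stations (built by the illustrative helper `make_station`) whose feature ranges differ to mimic covariate shift between institutions, and noise-free synthetic data. Only model weights leave a station; the raw `(x, y)` pairs are used solely inside `local_train`.

```python
from typing import List, Tuple

# One station's private data: (feature vector, target) pairs that never leave the station.
Dataset = List[Tuple[List[float], float]]


def local_train(weights: List[float], data: Dataset, lr: float = 0.05, epochs: int = 5) -> List[float]:
    """Run a few epochs of plain SGD on a linear model using only this station's data."""
    w = list(weights)
    for _ in range(epochs):
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y  # prediction error for squared loss
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


def fed_avg(global_w: List[float], stations: List[Dataset], rounds: int = 30) -> List[float]:
    """FedAvg: each round every station trains locally, then only the weights are averaged."""
    for _ in range(rounds):
        updates = [(local_train(global_w, data), len(data)) for data in stations]
        total = sum(n for _, n in updates)
        # Weight each station's model by its number of samples, as in FedAvg.
        global_w = [sum(w[i] * n for w, n in updates) / total for i in range(len(global_w))]
    return global_w


def make_station(lo: float, hi: float, n: int = 20) -> Dataset:
    """Hypothetical station whose feature values only cover [lo, hi] (covariate shift)."""
    xs = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    return [([x, 1.0], 2.0 * x + 0.5) for x in xs]  # assumed true model: y = 2x + 0.5


stations = [make_station(-1.0, 0.0), make_station(0.0, 1.0), make_station(1.0, 2.0)]
weights = fed_avg([0.0, 0.0], stations)
print([round(v, 3) for v in weights])
```

Because all three stations share the same underlying relationship, the averaged weights approach the true coefficients despite the shifted feature ranges. When sites differ in their label distributions or in the relationship itself, plain FedAvg can drift away from a good global model, which is the kind of heterogeneity problem this topic targets.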
Objectives
- Analyze the limitations of current PHT architecture for FL.
- Propose enhancements or adaptations to PHT/FL that improve privacy and efficiency when handling heterogeneous medical data.
- Ensure compliance with data protection regulations and integrate privacy-preserving techniques within the PHT and FL setup.
Tasks
- Perform a literature review on federated learning applications in healthcare and the Personal Health Train framework.
- Identify and assess the current limitations of PHT in supporting FL, for example when dealing with highly heterogeneous data.
- Develop and implement a prototype FL approach tailored to the PHT framework.
- Evaluate the prototype on simulated or real-world healthcare datasets, focusing on privacy, accuracy, and communication efficiency.
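For evaluation on simulated data, one common way to emulate heterogeneous institutions (an assumption here, not a requirement of this topic) is label-skew partitioning with a Dirichlet distribution. For each class, a vector of per-site proportions is drawn from Dir(α), so a smaller α gives each site a more skewed label mix. The sketch below uses only the Python standard library; the function name `dirichlet_label_split`, the 3-class labels, the four sites, and α = 0.5 are hypothetical choices for illustration.

```python
import random
from collections import Counter
from typing import Dict, List


def dirichlet_label_split(labels: List[int], n_sites: int, alpha: float, seed: int = 0) -> List[List[int]]:
    """Assign sample indices to sites; a smaller alpha yields more skewed label mixes per site."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)

    sites: List[List[int]] = [[] for _ in range(n_sites)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # A Dirichlet(alpha, ..., alpha) draw, obtained by normalising Gamma(alpha, 1) samples.
        g = [rng.gammavariate(alpha, 1.0) for _ in range(n_sites)]
        total = sum(g)
        props = [v / total for v in g]
        start = 0
        for s in range(n_sites):
            # The last site takes the remainder so every index is assigned exactly once.
            end = len(idxs) if s == n_sites - 1 else start + int(props[s] * len(idxs))
            sites[s].extend(idxs[start:end])
            start = end
    return sites


# Hypothetical 3-class labels, 100 samples per class.
labels = [i % 3 for i in range(300)]
sites = dirichlet_label_split(labels, n_sites=4, alpha=0.5)
for s, idxs in enumerate(sites):
    print(s, dict(Counter(labels[i] for i in idxs)))
```

Fixing the seed keeps the partition reproducible across experiments, so accuracy and communication measurements for different FL variants can be compared on the same simulated sites.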