Prof. Dr. S. Decker
RWTH Aachen
Informatik 5
Ahornstr. 55
D-52056 Aachen
Tel +49/241/8021501
Fax +49/241/8022321

Machine Learning for Anonymization of Unstructured Text

Thesis type: Bachelor
Status: Running

This thesis addresses the problem of identifying personal information in unstructured text using supervised machine learning (ML). The final application should recognize and annotate the tokens that make up personal data in an English input text as accurately as possible. First, supervised learning methods suitable for the task will be identified. Then, models based on the most promising approaches will be designed and implemented. Suitable evaluation metrics must also be determined so that the models can be compared. Finally, the approaches are evaluated against a baseline and against each other.
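
To make the token-level task and the comparison against a baseline concrete, the following is a minimal sketch only. It is not the thesis's method: the label names ("PII" and "O"), the regular-expression baseline, and the choice of precision, recall, and F1 are all assumptions made for illustration, since the announcement leaves the metrics and the baseline open.

```python
import re
from typing import List, Tuple

# Assumed label scheme (not from the announcement):
# "PII" marks a token that is part of personal data, "O" marks any other token.
EMAIL_RE = re.compile(r"^[\w.+-]+@[\w-]+\.[\w.-]+$")
PHONE_RE = re.compile(r"^\+?\d[\d/-]{6,}$")


def tokenize(text: str) -> List[str]:
    """Split on whitespace; a real system would likely use a proper tokenizer."""
    return text.split()


def baseline_tag(tokens: List[str]) -> List[str]:
    """Hypothetical rule-based baseline: flag tokens that look like e-mail addresses or phone numbers."""
    labels = []
    for tok in tokens:
        stripped = tok.strip(".,;:()")  # ignore surrounding punctuation
        if EMAIL_RE.match(stripped) or PHONE_RE.match(stripped):
            labels.append("PII")
        else:
            labels.append("O")
    return labels


def precision_recall_f1(gold: List[str], pred: List[str],
                        positive: str = "PII") -> Tuple[float, float, float]:
    """Token-level precision, recall, and F1 for the positive label."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Invented example sentence with hand-assigned gold labels.
text = "Contact Jane Doe at jane.doe@example.org or +49/241/5550100."
tokens = tokenize(text)
gold = ["O", "PII", "PII", "O", "PII", "O", "PII"]
pred = baseline_tag(tokens)
precision, recall, f1 = precision_recall_f1(gold, pred)
print(list(zip(tokens, pred)))
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this example the rule-based tagger finds the e-mail address and the phone number but misses the name, which is the kind of gap a supervised model would be expected to close. Any learned model that outputs one label per token could be scored with the same function.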
