
DBIS

Category: ‘Theses’

Semantic Integration through Generative Model Architectures

January 19th, 2026 | by

We are seeking a motivated master’s student to explore the application of Large Language Models (LLMs) and Small Language Models (SLMs) for automated semantic mapping in data integration scenarios involving sensitive information. This thesis addresses a critical challenge in modern data management: domain experts often possess the knowledge needed to align local data sources with global schemas but lack the technical expertise to implement these mappings, while traditional automated approaches struggle with the semantic complexity of the task. This research investigates how language models can bridge this gap by enabling more intuitive, knowledge-driven data integration while maintaining strict data privacy and security requirements.

Decentralized and Privacy-preserving Message Inboxes for Few-Shot Communication

January 15th, 2026 | by

Users are increasingly required to give away private email addresses in order to be reachable by service providers, e.g., to receive invoices, digital receipts, or newsletters. While this is especially true for digital services, physical interactions also increasingly involve digital information exchanges. Most notably, paper-based receipts are being replaced by digital equivalents.

However, digital receipts do not carry over the privacy model customers are accustomed to from paper receipts: digital receipts are either sent to the customer via email, made accessible via smartphone apps, or otherwise fetched from the service provider’s servers. All of these models bear the risk that service providers may try to link different interactions (e.g., purchases) and derive additional information (e.g., buyer profiles) from them without the customer’s consent. Further, over-reliance on the service provider’s communication channels introduces additional availability requirements on their side and may limit interoperability between the information flows of different service providers.

The Solid Project aims to establish a decentralized infrastructure that empowers users to take control of their personal online storage while ensuring high availability and interoperability. However, Solid is focused more on access control than user privacy at the moment.

In this thesis, you will therefore explore whether Solid’s design can also strengthen user privacy and realize the information flows outlined above in a truly privacy-preserving manner. To this end, you will model privacy requirements for the above settings, analyze related Solid standards for compatibility with a privacy-focused approach, and design, implement, and evaluate a prototypical Solid-based infrastructure enhanced with different Privacy-Enhancing Technologies (PETs). Finally, you will assess the pros and cons of different PETs in terms of privacy benefits, functionality tradeoffs, and performance overhead.

Comparative analyses of hybrid LLMs with Knowledge base integration and RAGs in biomedical domain

January 15th, 2026 | by

Large language models (LLMs) are increasingly used in biomedical applications, including literature mining (PMID: 40188094), drug discovery (PMID: 38730226; 41362614; https://arxiv.org/abs/2510.27130), clinical decision support (PMID: 40753316), and patient data analysis (PMID: 41034564). Hybrid approaches combining LLMs with structured knowledge bases and retrieval-augmented generation (RAG) improve performance and interpretability (PMID: 38830083; https://www.biorxiv.org/content/10.1101/2025.05.08.652829v2). However, LLM-based systems remain vulnerable to hallucinations and generate associations that lack explicit evidence and traceability. This limits their reliability in high-stakes biomedical research. There is an urgent need for methods that systematically ground and validate LLM-derived associations using structured biomedical knowledge, such as knowledge graphs, to enable transparent, evidence-based discovery.

YANKEE: YouTube-ANnotated Knowledge Extraction Engine

November 17th, 2025 | by

The aim of this thesis is to extend an existing system for providing psychomotor feedback in a camera-based learning environment by automating or supporting the rule creation process.

The core objective is to leverage computer vision techniques and large language models (LLMs) to extract motion data from YouTube tutorial videos and automatically infer psychomotor feedback rules, which can be integrated into the existing feedback engine.

By doing so, the need for expert manual input in defining feedback rules would be minimized, thus streamlining the feedback process for learners in various psychomotor skill domains.

A Framework for Automated Sanitization of Cybersecurity Playbooks

November 13th, 2025 | by

DISCO-ML: Decision Interoperability Specification Conventions for Operationalized Machine Learning

October 20th, 2025 | by

Sensor-based machine learning (ML) systems (such as predictive maintenance, environmental monitoring, and industrial automation) require scalable, explainable, and continuously evolving data infrastructures. The complexity of these systems lies not only in the technical pipeline (data ingestion, feature engineering, model training, deployment, monitoring) but also in the design decisions stakeholders make along the way. These decisions range from architectural trade-offs (edge vs. cloud processing) and ethical considerations (data privacy, fairness) to explainability requirements and system scalability.

While existing modeling techniques support documenting software architecture and data flows, there is no widely accepted notation that explicitly captures, traces, and communicates design decisions for sensor-based ML infrastructures in a way that is interpretable across diverse stakeholders (e.g., data scientists, system architects, domain experts, managers), machine-readable, or both.

This thesis aims to identify, compare, and develop suitable modeling notations and information structures that support the transparent documentation and communication of design decisions in sensor-based ML systems.

A Foundational small LLM-based Framework to enable Compliance Checking

October 17th, 2025 | by


Design and Evaluation of AI Agent Systems for Enterprise Software

September 8th, 2025 | by

Enterprise software has become a critical pillar of global digital transformation. In China, for example, next-generation platforms such as DingTalk and Feishu not only integrate office automation functionalities but also play a central role in project management, team collaboration, and workflow optimization. They enable efficient cross-department collaboration, task transparency, and workflow automation, thereby enhancing organizational efficiency and accelerating digital transformation.

In Germany, however, despite government and industry efforts to promote “enterprise digitalization”, the adoption and application of enterprise software remain relatively limited. Particularly among small and medium-sized enterprises (SMEs), high procurement and maintenance costs, complex system integration, and limited intelligence hinder widespread adoption. As a result, many companies still rely on traditional tools (e.g., email, paper-based approvals, or spreadsheets) and manual operations.

With the rapid development of artificial intelligence (AI), especially agents based on large language models (LLMs) with autonomous decision-making and execution capabilities, enterprise software is expected to evolve from a “passive tool” to an “active collaborator”. AI agents can understand user needs, automate repetitive tasks, coordinate cross-department workflows, and continuously improve their adaptability through learning. This has the potential to significantly enhance efficiency and user experience, offering new opportunities for upgrading enterprise software in Germany.

AI-Driven Intelligent Scheduling System in Healthcare and Long-Term Care

September 7th, 2025 | by

The healthcare and long-term care (LTC) sector is experiencing severe workforce challenges. Nurses and caregivers are frequently confronted with excessive workloads, mandatory overtime, and the constraints of complex union agreements. These pressures contribute to burnout, high turnover rates, and rising labour costs, while simultaneously undermining care quality and patient satisfaction.

Conventional workforce management (WFM) tools largely focus on record-keeping and compliance auditing and lack predictive and optimization capabilities, making them inadequate for the highly dynamic and complex nature of healthcare scheduling. In this thesis, we aim to develop a next-generation, AI-driven, compliance-aware, and human-centered intelligent scheduling platform that can satisfy regulatory and union requirements while also respecting staff preferences and improving organizational outcomes.

CHALLENGE: Collaborative Human-Agent Learning for Leveraging Engagement in Negotiated Governance of MLOps

August 27th, 2025 | by

The goal of this thesis is to design and prototype a serious game that simulates stakeholder engagement challenges in MLOps lifecycles. The game will make use of LLM-based agents to represent typical stakeholder roles (e.g., data scientists, ML engineers, domain experts, operations managers) and allow players to interact with them in different lifecycle stages.

The aim is to both explore the dynamics of stakeholder collaboration in MLOps and provide an educational tool for students, practitioners, and researchers to better understand the complexities of human and organizational factors in MLOps.