From Triples to Pixels: Visual Knowledge Graph Encoding for Knowledge-Augmented Multiple-Choice Question Answering
Knowledge-augmented multiple-choice question answering (MCQA) aims to improve robustness and factual grounding by integrating external structured knowledge (e.g., knowledge graphs) into language-model-based decision making. Current high-performing systems typically retrieve a local subgraph relevant to a question and candidate answers, then combine pretrained language representations with explicit graph reasoning modules.
This thesis investigates an alternative representation path: instead of processing retrieved knowledge graph (KG) subgraphs as symbolic triples with graph neural networks, the subgraphs are deterministically rendered into a compact 2D “visual graph” representation and encoded with a vision backbone. The resulting visual KG evidence is fused with an encoder-only language model via attention-based cross-modal interaction. The core research question is whether a visually encoded KG can preserve decision-relevant relational structure and support competitive knowledge-augmented MCQA performance on CommonsenseQA and OpenBookQA (optionally extending to MedQA-USMLE).
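The deterministic rendering step could be sketched as follows. This is a minimal illustration only, assuming a fixed circular node layout and a plain integer grid (0 = background, 1 = edge, 2 = node); an actual system would render richer raster images (e.g., with relation labels and colors) sized for a vision backbone. All function and variable names here are our own.

```python
import math

def render_subgraph(triples, size=32):
    """Deterministically rasterize a KG subgraph into a size x size grid.

    Nodes are placed on a circle in sorted order so the same subgraph
    always yields the same image; edges are drawn as straight lines.
    Cell values: 0 = background, 1 = edge, 2 = node.
    """
    nodes = sorted({h for h, _, _ in triples} | {t for _, _, t in triples})
    n = len(nodes)
    c = (size - 1) / 2          # center of the canvas
    r = c - 1                   # circle radius, with a 1-cell margin
    pos = {}
    for i, node in enumerate(nodes):
        a = 2 * math.pi * i / n
        pos[node] = (round(c + r * math.cos(a)), round(c + r * math.sin(a)))
    grid = [[0] * size for _ in range(size)]
    # Draw each edge by sampling points along the line segment.
    for h, _, t in triples:
        (x0, y0), (x1, y1) = pos[h], pos[t]
        steps = max(abs(x1 - x0), abs(y1 - y0), 1)
        for s in range(steps + 1):
            x = round(x0 + (x1 - x0) * s / steps)
            y = round(y0 + (y1 - y0) * s / steps)
            grid[y][x] = 1
    for x, y in pos.values():
        grid[y][x] = 2          # nodes drawn on top of edges
    return grid

triples = [("bird", "capable_of", "fly"), ("fly", "related_to", "wing")]
img = render_subgraph(triples)
```

Because the layout depends only on the sorted node set, the vision encoder sees a stable pixel pattern for a given retrieved subgraph, which is the property the thesis relies on.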
A Rule-Based Agent for Semantic Matching Graph Visualization
Resource-Efficient Cyber Risk and Criticality Assessment for Small Power Grid Operators: A Reproducible Algorithm for Deriving Security Requirements and Prioritized Mitigation Plans
Power grids are increasingly operated through tightly interconnected IT/OT infrastructures, which raises the attack surface and makes smaller operators with limited resources particularly vulnerable to security-relevant incidents. This thesis develops and evaluates a reproducible, resource-efficient analysis algorithm that captures essential system, process, role, location, and information-flow data to derive protection needs and criticality, and to generate a prioritized, actionable security improvement plan without requiring a full ISMS. The approach is prototyped and validated through a realistic case study, benchmarking against at least one reference method and incorporating expert interviews to assess effort, comprehensibility, traceability, domain coverage, and prioritization quality.
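To make the intended output concrete, the derivation of criticality and a prioritized plan could look roughly like the following sketch. The scoring rule (a simplified maximum principle over confidentiality, integrity, and availability, as in common BSI-style assessments) and all names are assumptions for illustration, not the thesis's actual algorithm.

```python
# Hypothetical illustration: derive protection needs and a prioritized
# mitigation order from a minimal asset inventory.

SCALE = {"low": 1, "medium": 2, "high": 3}

def criticality(asset):
    # Maximum principle over the CIA protection needs (simplified).
    return max(SCALE[asset[d]]
               for d in ("confidentiality", "integrity", "availability"))

def mitigation_plan(assets):
    # Sort descending by criticality; break ties by name so the
    # result is reproducible across runs.
    ranked = sorted(assets, key=lambda a: (-criticality(a), a["name"]))
    return [(a["name"], criticality(a)) for a in ranked]

assets = [
    {"name": "SCADA server", "confidentiality": "medium",
     "integrity": "high", "availability": "high"},
    {"name": "office printer", "confidentiality": "low",
     "integrity": "low", "availability": "low"},
]
plan = mitigation_plan(assets)
```

The reproducibility requirement in the thesis is reflected here in the deterministic scoring and tie-breaking: two analysts entering the same inventory obtain the same prioritization.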
Master Thesis
Using LLMs with Knowledge Graphs to Enhance Code Generation
Large Language Models (LLMs) have demonstrated remarkable capabilities in code generation [5,7]. However, they face significant limitations when working with specific software environments, particularly [3,5]:
- Lack of access to existing codebases
- Limited knowledge of project-specific packages, dependencies, and interfaces
- Difficulty maintaining consistency with established code patterns and architectures
To address these challenges, Retrieval Augmented Generation (RAG) approaches have emerged, enabling LLMs to access relevant contextual information [6]. Among these, knowledge graph-based representations offer a novel and promising approach, providing structured semantic relationships that traditional RAG methods may miss [1,4].
This thesis explores the application of LLM-knowledge graph integration for domain-specific code generation, using time series analysis as a concrete use case [1].
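The retrieval idea can be sketched with a toy in-memory triple store: project facts are retrieved by graph neighborhood rather than by text similarity, then prepended to the generation prompt. The graph contents, relation names, and prompt format below are invented for illustration.

```python
# Toy knowledge graph over a hypothetical time series codebase.
KG = [
    ("resample", "is_function_of", "timeseries_lib"),
    ("resample", "has_parameter", "freq"),
    ("rolling_mean", "is_function_of", "timeseries_lib"),
    ("rolling_mean", "depends_on", "resample"),
]

def retrieve(entity, depth=1):
    """Collect triples reachable from `entity` within `depth` hops."""
    frontier, seen = {entity}, set()
    for _ in range(depth):
        hits = [t for t in KG if t[0] in frontier or t[2] in frontier]
        seen.update(hits)
        frontier |= {t[0] for t in hits} | {t[2] for t in hits}
    return sorted(seen)

def build_prompt(task, entity):
    # Structured facts give the LLM project context that plain
    # text-chunk retrieval may miss (e.g., dependency edges).
    facts = "\n".join(f"- {h} {r} {t}" for h, r, t in retrieve(entity))
    return f"Project facts:\n{facts}\n\nTask: {task}"

prompt = build_prompt("Write code that smooths a series.", "rolling_mean")
```

Note that the `depends_on` edge is retrieved even though no text snippet mentions both functions together, which is the structural advantage over chunk-based RAG that the thesis sets out to exploit.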
Deep Table-Structure Integration for LLM-based Semantic Table Understanding
This thesis investigates how Large Language Models (LLMs) can be equipped with a deeper, architecture-level understanding of tabular data, going beyond “tables-as-serialized-text” toward tables-as-structured objects that expose row/column topology, header semantics, cell neighborhoods, and inter-cell dependencies to the model in a principled way [1,2,8]. The target setting is Semantic Table Interpretation (STI) as studied in the SemTab challenge, focusing on three standard downstream tasks: Cell Entity Annotation (CEA), Column Type Annotation (CTA), and Column Property Annotation (CPA) [3,4].
The work will be developed and evaluated primarily using the MammoTab 25 benchmark (Wikipedia-scale tables annotated against Wikidata) and SemTab-style evaluation protocols [6,7,14].
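The contrast between “tables-as-serialized-text” and tables-as-structured-objects can be sketched minimally: instead of flattening a table into a string, the model is given explicit access to topology such as columns and cell neighborhoods. The class and method names below are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class Table:
    headers: list  # column header strings
    rows: list     # list of rows, each a list of cell strings

    def column(self, j):
        # Column-wise access, as needed for CTA/CPA over whole columns.
        return [r[j] for r in self.rows]

    def neighborhood(self, i, j):
        """Cells orthogonally adjacent to (i, j): the kind of local
        context a CEA prompt could expose instead of the full table."""
        out = []
        for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            r, c = i + di, j + dj
            if 0 <= r < len(self.rows) and 0 <= c < len(self.headers):
                out.append(self.rows[r][c])
        return out

t = Table(headers=["City", "Country"],
          rows=[["Aachen", "Germany"], ["Lyon", "France"]])
```

Exposing such structure “in a principled way” at the architecture level, rather than through ad hoc serialization, is precisely what the thesis investigates.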
Formalizing Early-Stage Data Science Requirements for an LLM-Based Data Acquisition Agent
This thesis investigates how to formally represent early-stage data science requirements and how to support the automation of early-stage data science through an LLM-based agent.
Exploration and Application of Vision-Language Navigation (VLN) for Legged Robots in Subway Tunnel Environments
This thesis investigates whether Vision-and-Language Navigation (VLN) can be reliably transferred from conventional benchmarks to subway tunnel environments, enabling a quadruped robot to execute inspection-oriented navigation tasks under constrained geometry, degraded visibility, and limited connectivity. The work is motivated by recent vision-language-action approaches that connect language grounding with embodied control for legged platforms (e.g., NaVILA) [1], while the applicability of such paradigms to tunnel settings remains underexplored.
The study uses an existing tunnel environment dataset (visual and structural information) and a high-fidelity tunnel simulation setup to train and evaluate a VLN model. Evaluation will focus on instruction-following success, path efficiency, robustness to tunnel-specific disturbances, and (optionally) transfer to real-world deployment on a physical quadruped robot, following standard VLN evaluation practices [2,6].
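The standard VLN evaluation practices referenced above typically include success rate and Success weighted by Path Length (SPL). A minimal sketch of both metrics (episode field names are ours):

```python
def success_rate(episodes):
    # Fraction of episodes where the agent stopped at the goal.
    return sum(e["success"] for e in episodes) / len(episodes)

def spl(episodes):
    # Success weighted by Path Length: each successful episode is
    # discounted by shortest_path / max(path_taken, shortest_path),
    # so inefficient detours reduce the score.
    total = 0.0
    for e in episodes:
        if e["success"]:
            total += e["shortest"] / max(e["taken"], e["shortest"])
    return total / len(episodes)

episodes = [
    {"success": 1, "shortest": 10.0, "taken": 12.5},
    {"success": 0, "shortest": 8.0, "taken": 20.0},
]
```

For tunnel inspection, path efficiency matters doubly (battery and connectivity constraints), which is why SPL rather than raw success rate is the more informative headline metric.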
This thesis is co-advised with Fan Yang (yang@icom.rwth-aachen.de) at ICoM (Institute for Construction Management, Digital Engineering and Robotics in Construction). The second supervisor is Dr. Hendrik Morgenstern (morgenstern@icom.rwth-aachen.de).
Generating Synthetic Training Data with LLMs for Sentiment Analysis
Sentiment analysis models detect emotion in text but need retraining for each new context. Large Language Models (LLMs) are increasingly used to generate such training data, but the resulting performance is still limited. We aim to improve it by creating a structured framework for LLM-driven data synthesis.
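One ingredient of such a structured framework could be generating prompts over explicit axes (label, domain, register) so that coverage of the synthetic dataset is controlled rather than left to a single free-form instruction. The axes and template below are hypothetical examples; the sketch only builds the prompts and does not call an LLM.

```python
from itertools import product

LABELS = ["positive", "negative", "neutral"]
DOMAINS = ["hotel reviews", "product reviews"]
REGISTERS = ["formal", "colloquial"]

def synthesis_prompts():
    # One prompt per cell of the (label, domain, register) grid, each
    # carrying its target label for later supervised training.
    tmpl = ("Write one {register} sentence expressing a {label} "
            "sentiment about {domain}. Reply with the sentence only.")
    return [
        {"label": lab,
         "prompt": tmpl.format(label=lab, domain=dom, register=reg)}
        for lab, dom, reg in product(LABELS, DOMAINS, REGISTERS)
    ]

prompts = synthesis_prompts()
```

Balancing the grid per label is what keeps the synthetic training set from inheriting the label skew of free-form generation.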
Semantic Integration through Generative Model Architectures
We are seeking a motivated master’s student to explore the application of Large Language Models (LLMs) and Small Language Models (SLMs) for automated semantic mapping in data integration scenarios involving sensitive information. This thesis addresses a critical challenge in modern data management: domain experts often possess the knowledge needed to align local data sources with global schemas but lack the technical expertise to implement these mappings, while traditional automated approaches struggle with the semantic complexity of the task. This research investigates how language models can bridge this gap by enabling more intuitive, knowledge-driven data integration while maintaining strict data privacy and security requirements.
Decentralized and Privacy-preserving Message Inboxes for Few-Shot Communication
Users are increasingly required to give away private email addresses in order to be reachable by service providers, e.g., to receive invoices, digital receipts, or newsletters. While this is especially true for digital services, physical interactions are also increasingly shifting toward digital information exchanges. Most notably, paper-based receipts are being replaced by digital equivalents.
However, digital receipts do not carry over the privacy model physical customers are accustomed to: digital receipts are either sent to the customer via email, accessible via smartphone apps, or otherwise fetched from the service provider’s servers. All of these models bear the risk that service providers may try to link different interactions (e.g., purchases) and gain additional information (e.g., buyer profiles) from analyzing such data without the customer’s consent. Further, an over-reliance on the service provider’s communication channels introduces additional availability requirements on their side and may limit interoperability between the information flows of different service providers.
The Solid Project aims to establish a decentralized infrastructure that empowers users to take control of their personal online storage while ensuring high availability and interoperability. However, Solid currently focuses more on access control than on user privacy.
In this thesis, you will hence explore the suitability of Solid’s design to also boost user privacy and to realize the information flows outlined above in a truly privacy-preserving manner. To this end, you will model privacy requirements for the above settings, analyze related Solid standards for compatibility with a privacy-focused approach, and design, implement, and evaluate a prototypical Solid-based infrastructure enhanced with different Privacy-Enhancing Technologies (PETs). Finally, you will assess the pros and cons of different PETs in terms of privacy benefits, functionality tradeoffs, and performance overhead.
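To illustrate one PET ingredient for the unlinkability requirement above: a user could derive a fresh, pseudonymous inbox identifier per interaction from a private master key, so a service provider cannot correlate two receipts by inbox address alone. This is a single building block sketched under our own assumptions, not the thesis design, and it uses only the Python standard library.

```python
import hmac
import hashlib
import secrets

def derive_inbox_id(master_key: bytes, interaction_nonce: bytes) -> str:
    """Derive a per-interaction inbox identifier via HMAC-SHA256.

    Reproducible for the key holder (who can find the inbox again),
    but unlinkable across interactions for anyone without the key.
    """
    digest = hmac.new(master_key, interaction_nonce, hashlib.sha256)
    return digest.hexdigest()[:16]  # truncated for readability

master = secrets.token_bytes(32)
id_a = derive_inbox_id(master, b"purchase-2024-001")
id_b = derive_inbox_id(master, b"purchase-2024-002")
```

A full evaluation would weigh such a scheme against alternative PETs exactly along the axes the thesis names: privacy benefits, functionality tradeoffs (e.g., inbox discovery), and performance overhead.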