DBIS

Category: ‘Theses’

Ontology-Grounded Extraction of Research Software Mentions from Scientific Publications

March 27th, 2026 | by

Research software is among the least discoverable scholarly outputs. While standards like CodeMeta and CFF enable structured software metadata at the repository level, they require active curation by maintainers and see inconsistent adoption. On the publication side, only select publishers such as Schloss Dagstuhl’s DROPS platform provide citable software artifacts, again contingent on explicit author action. As a result, most research software is mentioned only in unstructured publication text, invisible to metadata-driven search and agentic systems, and non-compliant with FAIR principles. This problem is compounded in applied domains like visualization research, where publications frequently describe custom tools and prototypes that are never publicly released, a case not covered by current metadata schemas. Automating ontology-grounded extraction of software mentions from publications would close this gap and enrich the metadata foundation for downstream services such as the research copilots developed within NFDI4DS.

Incremental Knowledge Graph Ingestion with Change Detection and Provenance Tracking

March 26th, 2026 | by

Keeping a knowledge graph up to date as its source data evolves is harder than building one from scratch. New records appear, existing records are corrected, and metadata is enriched over time. Each type of change (a corrected DOI, an added co-author, a retracted publication) carries different semantic implications and may require a different update strategy. Detecting these changes efficiently and propagating them without introducing duplicates, losing provenance, or overwriting valid data remains an open challenge, particularly when the goal is to avoid heavyweight versioning infrastructure.
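To make the change types concrete, the following is a minimal sketch of snapshot-based change detection using content hashes. It is not the method proposed in the thesis; the function names (`fingerprint`, `detect_changes`), the record layout, and the example DOIs are all hypothetical and chosen only for illustration.

```python
import hashlib
import json


def fingerprint(record: dict) -> str:
    """Stable content hash of a record; key order does not affect the result."""
    canonical = json.dumps(record, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def detect_changes(old: dict, new: dict) -> dict:
    """Compare two snapshots keyed by record ID and classify each difference."""
    changes = {"added": [], "removed": [], "modified": {}}
    for rid, rec in new.items():
        if rid not in old:
            changes["added"].append(rid)
        elif fingerprint(rec) != fingerprint(old[rid]):
            # List the differing fields so an update strategy can be chosen per field.
            keys = set(rec) | set(old[rid])
            changes["modified"][rid] = sorted(
                k for k in keys if rec.get(k) != old[rid].get(k)
            )
    changes["added"].sort()
    changes["removed"] = sorted(rid for rid in old if rid not in new)
    return changes


# Made-up example snapshots (IDs, titles, and DOIs are illustrative only).
old_snapshot = {
    "p1": {"title": "Paper A", "doi": "10.1000/a", "authors": ["Ada"]},
    "p2": {"title": "Paper B", "doi": "10.1000/b", "authors": ["Bob"]},
}
new_snapshot = {
    "p1": {"title": "Paper A", "doi": "10.1000/a-corrected", "authors": ["Ada", "Cy"]},
    "p3": {"title": "Paper C", "doi": "10.1000/c", "authors": ["Dee"]},
}

changes = detect_changes(old_snapshot, new_snapshot)
print(changes)
```

In this sketch a corrected DOI and an added co-author both surface as field-level modifications of `p1`, which a downstream step could handle differently. A record missing from the new snapshot is reported as removed, although in practice a retraction would more likely arrive as an explicit status change than as a deletion.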

From Triples to Pixels: Visual Knowledge Graph Encoding for Knowledge-Augmented Multiple-Choice Question Answering

March 2nd, 2026 | by

Knowledge-augmented multiple-choice question answering (MCQA) aims to improve robustness and factual grounding by integrating external structured knowledge (e.g., knowledge graphs) into language-model-based decision making. Current high-performing systems typically retrieve a local subgraph relevant to a question and candidate answers, then combine pretrained language representations with explicit graph reasoning modules.

This thesis investigates an alternative representation path: instead of processing retrieved knowledge graph (KG) subgraphs as symbolic triples with graph neural networks, the subgraphs are deterministically rendered into a compact 2D “visual graph” representation and encoded with a vision backbone. The resulting visual KG evidence is fused with an encoder-only language model via attention-based cross-modal interaction. The core research question is whether a visually encoded KG can preserve decision-relevant relational structure and support competitive knowledge-augmented MCQA performance on CommonsenseQA and OpenBookQA (optionally extending to MedQA-USMLE).
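As an illustration of what "deterministically rendered" could mean, here is a minimal sketch that maps a set of triples onto an adjacency-style grid of relation IDs. The actual rendering used in the thesis is not specified in this announcement; the grid layout, the function name `render_grid`, and the example triples are assumptions made for this sketch.

```python
def render_grid(triples):
    """Render triples as a square grid.

    Cell (i, j) holds the ID of the relation from entity i to entity j,
    or 0 if there is no edge. Sorting entities and relations makes the
    output independent of the order in which triples were retrieved.
    Assumption: at most one relation per ordered entity pair; a later
    triple would otherwise overwrite an earlier one.
    """
    entities = sorted({e for s, _, o in triples for e in (s, o)})
    relations = sorted({p for _, p, _ in triples})
    ent_id = {e: i for i, e in enumerate(entities)}
    rel_id = {r: i + 1 for i, r in enumerate(relations)}  # 0 means "no edge"
    grid = [[0] * len(entities) for _ in entities]
    for s, p, o in triples:
        grid[ent_id[s]][ent_id[o]] = rel_id[p]
    return grid, entities, relations


# Illustrative commonsense-style triples (not taken from any benchmark).
triples = [
    ("bird", "capable_of", "fly"),
    ("bird", "has_a", "wing"),
    ("wing", "used_for", "fly"),
]

grid, entities, relations = render_grid(triples)
for row in grid:
    print(row)
```

Such a grid could be treated as a single-channel image and passed to a vision backbone. Whether this kind of layout preserves enough relational structure is exactly the sort of question the thesis sets out to study.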

A Rule-Based Agent for Semantic Matching Graph Visualization

February 26th, 2026 | by

Resource-Efficient Cyber Risk and Criticality Assessment for Small Power Grid Operators: A Reproducible Algorithm for Deriving Security Requirements and Prioritized Mitigation Plans

February 24th, 2026 | by


Power grids are increasingly operated through tightly interconnected IT/OT infrastructures, which raises the attack surface and makes smaller operators with limited resources particularly vulnerable to security-relevant incidents. This thesis develops and evaluates a reproducible, resource-efficient analysis algorithm that captures essential system, process, role, location, and information-flow data to derive protection needs and criticality, and to generate a prioritized, actionable security improvement plan without requiring a full ISMS. The approach is prototyped and validated through a realistic case study, benchmarking against at least one reference method and incorporating expert interviews to assess effort, comprehensibility, traceability, domain coverage, and prioritization quality.

Master Thesis

Using LLMs with Knowledge Graphs to Enhance Code Generation

February 20th, 2026 | by

Large Language Models (LLMs) have demonstrated remarkable capabilities in code generation [5,7]. However, they face significant limitations when working with specific software environments, particularly [3,5]:

  • Lack of access to existing codebases
  • Limited knowledge of project-specific packages, dependencies, and interfaces
  • Difficulty maintaining consistency with established code patterns and architectures

To address these challenges, Retrieval Augmented Generation (RAG) approaches have emerged, enabling LLMs to access relevant contextual information [6]. Among these, knowledge graph-based representations offer a novel and promising approach, providing structured semantic relationships that traditional RAG methods may miss [1,4].

This thesis explores the application of LLM-knowledge graph integration for domain-specific code generation, using time series analysis as a concrete use case [1].

Deep Table-Structure Integration for LLM-based Semantic Table Understanding

February 19th, 2026 | by

This thesis investigates how Large Language Models (LLMs) can be equipped with a deeper, architecture-level understanding of tabular data, going beyond “tables-as-serialized-text” toward tables-as-structured objects that expose row/column topology, header semantics, cell neighborhoods, and inter-cell dependencies to the model in a principled way [1,2,8]. The target setting is Semantic Table Interpretation (STI) as studied in the SemTab challenge, focusing on three standard downstream tasks: Cell Entity Annotation (CEA), Column Type Annotation (CTA), and Column Property Annotation (CPA) [3,4].

The work will be developed and evaluated primarily using the MammoTab 25 benchmark (Wikipedia-scale tables annotated against Wikidata) and SemTab-style evaluation protocols [6,7,14].
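The contrast between a serialized table and a table as a structured object can be sketched in a few lines. The example below only illustrates which structural signals (coordinates, headers, row and column neighborhoods) could be exposed; it says nothing about the model architecture the thesis will develop. The `Cell` class, the helper names, and the toy table are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    row: int
    col: int
    header: str
    value: str


def to_cells(header, rows):
    """Turn a header and a list of rows into cells with explicit coordinates."""
    return [
        Cell(r, c, header[c], value)
        for r, row in enumerate(rows)
        for c, value in enumerate(row)
    ]


def neighbors(cells, cell):
    """Return (same-row cells, same-column cells), excluding the cell itself."""
    same_row = [c for c in cells if c.row == cell.row and c.col != cell.col]
    same_col = [c for c in cells if c.col == cell.col and c.row != cell.row]
    return same_row, same_col


def serialize_flat(header, rows):
    """Baseline: the table as plain text, with structure only implied by separators."""
    return "\n".join(" | ".join(r) for r in [header, *rows])


def serialize_structured(cells):
    """Each cell carries its coordinates and header explicitly."""
    return "\n".join(f"[r{c.row},c{c.col}] {c.header} = {c.value}" for c in cells)


# Toy table (illustrative only).
header = ["City", "Country"]
rows = [["Aachen", "Germany"], ["Lyon", "France"]]

cells = to_cells(header, rows)
row_ctx, col_ctx = neighbors(cells, cells[0])
print(serialize_flat(header, rows))
print(serialize_structured(cells))
```

For the cell "Aachen", the row neighborhood ("Germany") is the context an entity annotation task like CEA can draw on, while the column neighborhood ("Lyon") is the context a column typing task like CTA relies on.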

Formalizing Early-Stage Data Science Requirements for an LLM-Based Data Acquisition Agent

February 9th, 2026 | by

This thesis investigates how to formally represent early-stage data science requirements and how to support the automation of early-stage data science through an LLM-based agent.

Exploration and Application of Vision-Language Navigation (VLN) for Legged Robots in Subway Tunnel Environments

January 30th, 2026 | by

This thesis investigates whether Vision-and-Language Navigation (VLN) can be reliably transferred from conventional benchmarks to subway tunnel environments, enabling a quadruped robot to execute inspection-oriented navigation tasks under constrained geometry, degraded visibility, and limited connectivity. The work is motivated by recent vision-language-action approaches that connect language grounding with embodied control for legged platforms (e.g., NaVILA) [1], while the applicability of such paradigms to tunnel settings remains underexplored.

The study uses an existing tunnel environment dataset (visual and structural information) and a high-fidelity tunnel simulation setup to train and evaluate a VLN model. Evaluation will focus on instruction-following success, path efficiency, robustness to tunnel-specific disturbances, and (optionally) transfer to real-world deployment on a physical quadruped robot, following standard VLN evaluation practices [2,6].
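Instruction-following success and path efficiency are commonly reported in VLN work as Success Rate (SR) and Success weighted by Path Length (SPL). The sketch below computes both for a handful of made-up episodes; whether the thesis will use exactly these metrics is an assumption, and the episode fields (`success`, `shortest`, `taken`) are hypothetical names.

```python
def success_rate(episodes):
    """Fraction of episodes in which the agent reached the goal."""
    return sum(1 for e in episodes if e["success"]) / len(episodes)


def spl(episodes):
    """Success weighted by Path Length.

    Mean over episodes of S_i * l_i / max(p_i, l_i), where S_i is 1 on
    success and 0 otherwise, l_i is the shortest path length, and p_i is
    the length of the path actually taken. Assumes l_i > 0.
    """
    total = 0.0
    for e in episodes:
        if e["success"]:
            total += e["shortest"] / max(e["taken"], e["shortest"])
    return total / len(episodes)


# Made-up episodes, path lengths in metres.
episodes = [
    {"success": True, "shortest": 10.0, "taken": 10.0},   # optimal path
    {"success": True, "shortest": 10.0, "taken": 20.0},   # success with a detour
    {"success": False, "shortest": 8.0, "taken": 30.0},   # failure
]

print(success_rate(episodes), spl(episodes))
```

SPL penalizes the detour in the second episode even though it succeeded, which is why it is usually reported alongside the plain success rate.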

This thesis is co-advised with Fan Yang (yang@icom.rwth-aachen.de) at ICoM (Institute for Construction Management, Digital Engineering and Robotics in Construction). The second supervisor is Dr. Hendrik Morgenstern (morgenstern@icom.rwth-aachen.de).

Generating Synthetic Training Data with LLMs for Sentiment Analysis

January 23rd, 2026 | by

Sentiment analysis models detect emotion in text but need retraining for each new context. Large Language Models (LLMs) are increasingly used to generate the required training data, but the resulting performance is still limited. We aim to improve it by creating a structured framework for LLM-driven data synthesis.