Semantic Integration through Generative Model Architectures

January 19th, 2026

We are seeking a motivated master’s student to explore the application of Large Language Models (LLMs) and Small Language Models (SLMs) for automated semantic mapping in data integration scenarios involving sensitive information. This thesis addresses a critical challenge in modern data management: domain experts often possess the knowledge needed to align local data sources with global schemas but lack the technical expertise to implement these mappings, while traditional automated approaches struggle with the semantic complexity of the task. This research investigates how language models can bridge this gap by enabling more intuitive, knowledge-driven data integration while maintaining strict data privacy and security requirements.

Thesis Type	Master
Status	Running
Presentation room	Seminar room I5 6202
Supervisor(s)	Sandra Geisler, Stefan Decker
Advisor(s)	Laurenz Neumann, Soo-Yon Kim
Contact	laurenz.neumann@dbis.rwth-aachen.de, kim@dbis.rwth-aachen.de

Research Questions

This thesis will investigate several key aspects of LLM-based semantic mapping:

What is the optimal balance between information richness and privacy preservation in the input design (e.g. schema only vs. sample data)?
Are smaller, locally deployable language models (SLMs) sufficient for semantic mapping tasks, or are the capabilities of larger models required to achieve sufficient inference speed and mapping quality?
How can we incorporate the specialised domain knowledge of users via human-in-the-loop approaches?

Methodology

The research will involve developing and evaluating different approaches to LLM-based semantic mapping, including comparative studies of input strategies (schema-only vs. schema-with-examples) and model architectures (cloud LLMs vs. local SLMs). You will design experiments using benchmark datasets and potentially collaborate with industry partners handling sensitive data.
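To make the two input strategies concrete, the following is a minimal Python sketch of how a prompt might be assembled for each. The function name build_mapping_prompt, the prompt wording, and the example column and term names are assumptions made for illustration only; they are not part of the thesis description or of any existing tool.

```python
from typing import Optional


def build_mapping_prompt(
    source_columns: list[str],
    target_terms: list[str],
    sample_rows: Optional[list[dict[str, str]]] = None,
) -> str:
    """Assemble a prompt asking a model to map source columns to target terms.

    Hypothetical helper: with sample_rows=None it produces the schema-only
    variant; with sample rows it appends example values per column
    (schema-with-examples), which exposes more information to the model.
    """
    lines = [
        "Map each source column to the best matching target term.",
        "",
        "Source columns:",
    ]
    for col in source_columns:
        line = f"- {col}"
        if sample_rows:
            # Collect the example values available for this column.
            values = [str(row[col]) for row in sample_rows if col in row]
            if values:
                line += " (examples: " + ", ".join(values) + ")"
        lines.append(line)
    lines.append("")
    lines.append("Target terms:")
    lines.extend(f"- {term}" for term in target_terms)
    return "\n".join(lines)


# Illustrative, made-up schema elements.
columns = ["pat_dob", "sex"]
terms = ["birthDate", "gender"]

schema_only = build_mapping_prompt(columns, terms)
with_examples = build_mapping_prompt(
    columns, terms, sample_rows=[{"pat_dob": "1980-01-01", "sex": "F"}]
)
```

The schema-only prompt never contains data values, whereas the second variant does; this difference is the privacy trade-off the comparison is meant to quantify.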

Tasks

Comprehensive literature review on semantic mapping and LLM applications
Implementation of a proof-of-concept tool demonstrating different approaches
Experimental evaluation with quantitative and qualitative analysis
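As one possible reading of the quantitative part of the evaluation, predicted mappings could be compared against a gold-standard mapping using precision, recall, and F1. The sketch below assumes that a mapping is represented as a set of (source element, target term) pairs; this representation, the choice of metrics, and the function name mapping_scores are illustrative assumptions, not requirements stated in the thesis description.

```python
def mapping_scores(
    predicted: set[tuple[str, str]],
    gold: set[tuple[str, str]],
) -> dict[str, float]:
    """Compute precision, recall, and F1 of predicted mappings against a gold standard.

    Hypothetical helper: each mapping is a (source_element, target_term) pair,
    and a prediction counts as correct only if the exact pair is in the gold set.
    """
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Illustrative, made-up mappings.
predicted = {("pat_dob", "birthDate"), ("sex", "gender"), ("zip", "city")}
gold = {("pat_dob", "birthDate"), ("sex", "administrativeGender")}

scores = mapping_scores(predicted, gold)
```

Running the same scoring over the outputs of different models and input strategies would give directly comparable numbers, while qualitative analysis could then focus on the kinds of mappings that were missed.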

Initial Literature

Towards self-configuring Knowledge Graph Construction Pipelines using LLMs – A Case Study with RML, Hofer et al.
Interactive Data Harmonization with LLM Agents, Santos et al.
KONDA: An LLM-based Tool for Semantic Annotation and Knowledge Graph Creation Using Ontologies for Research Data, Kim et al.

In case you are interested in this thesis, please write an email to the thesis advisors.

Prerequisites:

Knowledge about databases and information systems
Experience or strong interest in LLM applications
Familiarity with semantic web technologies such as RDF
Preferred: experience in software development, ideally Python and/or Java

DBIS