
DBIS

Category: ‘Theses’

Developing an Explainable Anomaly Detection System for Smart Grids by Incorporating Structural and Operational Grid Knowledge

June 24th, 2026 | by

Thesis Type

Master

Student: Sebastian Miller

Status

In Progress

Background

Supervisory control and data acquisition (SCADA) systems are increasingly connected through information and communication technologies, exposing smart grids to cyberattacks and operational disruptions. Conventional signature-based intrusion detection systems (IDSs) reliably identify known attacks but cannot detect previously unseen patterns, while statistical and machine-learning-based IDSs may achieve high detection rates but often provide limited transparency and generate false positives. Moreover, both classes frequently struggle with process-related attacks in which individually legitimate commands are executed in an unusual sequence and only become harmful over time. In energy distribution grids, this limitation is particularly critical because operators must not only recognize an anomaly but also understand which component or operational action caused it. The IEC 60870-5-104 communication protocol, together with structural and operational knowledge represented in a Smart Grid Architecture Model (SGAM)-aligned knowledge graph, offers a basis for developing a process-aware and explainable detection approach.

Objectives

The objective of this master’s thesis is to develop and evaluate an explainable, process-aware anomaly detection system for smart grids. The approach combines sequential pattern mining of IEC 60870-5-104 communication data with an RDF-based graph model of grid structure, device roles, communication relationships, and grid-operator actions. The system shall learn frequent sequences from normal operation, identify anomalous or rare sequences in attack scenarios, and use the knowledge graph to indicate likely affected devices and explain deviations from the expected control flow. A secondary objective is to create a reproducible dataset of normal and process-related attack scenarios and to compare the proposed approach with established process-state-aware IDSs through the IPAL framework.

Tasks

The student first conducts a focused literature review on anomaly detection in SCADA and smart-grid environments, explainable intrusion detection, process-aware detection, sequential pattern mining, SGAM-based modeling, and IEC 60870-5-104 traffic analysis. Based on this review, the student derives a precise requirement specification covering the available input data, the required level of event abstraction, explainability criteria, supported attack classes, and evaluation metrics. In parallel, the student becomes familiar with the FIT smart-grid co-simulation environment and the existing SGAM-aligned RDF ontology. Selected grid-operator actions, such as state estimation, feed-in management, transformer tap switching, topology changes, and load shedding, are modeled or linked in the ontology so that expected operational sequences and participating device types can be queried.

The core technical work comprises the creation of simulation scenarios for normal operation and representative process-related attacks, the extraction of IEC 60870-5-104 communication data from PCAP files, and the semantic enrichment of the extracted records. The student maps sender addresses and Information Object Addresses to concrete devices and locations through SPARQL queries, removes redundant status reports, and defines a reproducible event-abstraction scheme for discrete and continuous process values. Based on these events, the student implements a sequential pattern mining method, with particular emphasis on rare sequential patterns, and develops logic for comparing anomalous sequences with frequent reference patterns. The resulting detector shall use graph knowledge to distinguish similar device types, relate events to operator actions, and generate an explanation that highlights deviations and likely affected components.
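The abstraction and mining steps described above can be illustrated with a minimal sketch. It is a simplification under stated assumptions, not the method the thesis will develop: contiguous n-grams stand in for general sequential patterns (which may allow gaps), and the event labels, thresholds, and function names are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Set, Tuple

Pattern = Tuple[str, ...]


def abstract_value(value: float, low: float, high: float) -> str:
    """Map a continuous process value to a discrete symbol.

    The thresholds are hypothetical; a real scheme would derive them
    from the grid model or from training data.
    """
    if value < low:
        return "LOW"
    if value > high:
        return "HIGH"
    return "NORMAL"


def ngrams(events: Sequence[str], n: int) -> List[Pattern]:
    """Return all contiguous subsequences of length n."""
    return [tuple(events[i:i + n]) for i in range(len(events) - n + 1)]


def learn_frequent(training: Sequence[Sequence[str]], n: int,
                   min_support: int) -> Set[Pattern]:
    """Collect patterns that occur in at least min_support training sequences."""
    support: Counter = Counter()
    for sequence in training:
        # Count each pattern once per sequence (sequence-level support).
        support.update(set(ngrams(sequence, n)))
    return {p for p, count in support.items() if count >= min_support}


def find_rare(sequence: Sequence[str], frequent: Set[Pattern],
              n: int) -> List[Tuple[int, Pattern]]:
    """Return (position, pattern) pairs that were not frequent in training."""
    return [(i, p) for i, p in enumerate(ngrams(sequence, n))
            if p not in frequent]


# Hypothetical abstracted events from normal operation.
normal = [["CMD_OPEN", "STATUS_OPEN", "CMD_CLOSE", "STATUS_CLOSED"]] * 3
frequent_patterns = learn_frequent(normal, n=2, min_support=2)

# A close command issued before the open status was confirmed.
suspect = ["CMD_OPEN", "CMD_CLOSE", "STATUS_CLOSED"]
rare = find_rare(suspect, frequent_patterns, n=2)
```

Each reported position points at the first event of an unexpected pattern. In the approach outlined above, such a position would then be looked up in the knowledge graph to name the affected device and the operator action it deviates from.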

The approach is implemented as a documented prototype that accepts simulation data and a corresponding knowledge graph, generates event sequences, detects anomalies, and produces human-readable diagnostic output. For evaluation, the student creates a dataset containing training data from normal operation and test data with several attack scenarios. Using IPAL, the prototype is compared with selected process-state-aware IDSs such as PASAD, Seq2Seq-NN, or TABOR. Detection accuracy, false-positive behavior, robustness to different abstraction parameters, and computational effort are assessed quantitatively; the usefulness of the generated explanations and device-level localization is assessed qualitatively. All results are documented, including the ontology extensions, extraction and abstraction rules, algorithm specification, prototype architecture, dataset and scenario definitions, evaluation design, limitations, and recommendations for future online deployment.
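As a rough illustration of the quantitative part of the evaluation, the following sketch computes precision, recall, and false-positive rate from per-event labels. The label encoding (1 for attack, 0 for normal) and the function name are assumptions made for this example; the actual evaluation design and metrics are to be defined in the thesis.

```python
from typing import Dict, Sequence


def detection_metrics(y_true: Sequence[int],
                      y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute basic detection metrics from binary labels (1 = attack)."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(numerator: int, denominator: int) -> float:
        # Return 0.0 instead of failing when a class is absent.
        return numerator / denominator if denominator else 0.0

    return {
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "false_positive_rate": ratio(fp, fp + tn),
    }


# Made-up labels for five events.
metrics = detection_metrics([0, 0, 1, 1, 0], [0, 1, 1, 0, 0])
```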

Prerequisites

Basic knowledge of IT security, network protocols, and industrial control or SCADA systems; interest in smart grids and cyber-physical energy systems; programming skills, preferably in Python, for PCAP processing and data analysis; and willingness to work with graph-based data models, RDF/SPARQL, and simulation environments. Experience with anomaly detection, pattern mining, IEC 60870-5-104, or semantic-web technologies is helpful but not required. The student should be able to work methodically, document technical decisions clearly, and evaluate a prototype using reproducible experiments.

References (MLA)

  • CEN-CENELEC-ETSI Smart Grid Coordination Group. Smart Grid Reference Architecture. 2012.
  • Rahman, A., et al. “Finding Anomalies in SCADA Logs Using Rare Sequential Pattern Mining.” International Conference on Network and System Security, Springer, 2016, pp. 499-506.
  • Van Der Velde, D., Ö. Sen, and I. Hacker. “Towards a Scalable and Flexible Smart Grid Co-Simulation Environment to Investigate Communication Infrastructures for Resilient Distribution Grid Operation.” 2021 International Conference on Smart Energy Systems and Technologies (SEST), IEEE, 2021, pp. 1-6.
  • Wolsing, K., et al. “IPAL: Breaking Up Silos of Protocol-Dependent and Domain-Specific Industrial Intrusion Detection Systems.” Proceedings of the 25th International Symposium on Research in Attacks, Intrusions and Defenses, 2022, pp. 510-525.
  • Lin, C.-Y., and S. Nadjm-Tehrani. “Understanding IEC-60870-5-104 Traffic Patterns in SCADA Networks.” Proceedings of the 4th ACM Workshop on Cyber-Physical System Security, 2018, pp. 51-60.

Ontology Reduction for Scalable, Semantic Data Integration

June 22nd, 2026 | by

This thesis investigates how compact, application-specific ontology modules can be extracted automatically from large ontologies, preserving the semantic coherence needed for downstream tasks while drastically reducing complexity.

A Comparison of Interaction Modalities for Extended Reality Agents

June 22nd, 2026 | by

Currently, the primary way users interact with Large Language Models (LLMs) is through two-dimensional chat interfaces. However, for use cases in Extended Reality (XR) environments, the interaction paradigm shifts from a flat screen to a spatial experience. Here, LLMs can be represented, for example, as XR agents, i.e., personified versions of the LLM. While 3D environments offer high potential for more immersive and intuitive interactions, they also introduce significant challenges regarding user interface design. Simply porting a 2D chat window into a 3D space often feels clunky or breaks the sense of presence, yet purely voice-based interaction may lack the precision or privacy that text provides. This is especially true for tutoring scenarios where the agents need to give precise and memorable instructions. There is currently a lack of systematic research on which interface modalities best support the strengths of LLMs while maintaining the immersion of an XR environment.

A Generative Foundation Model for Knowledge Graphs: Geometry- and Text-Aware Pretraining for Transferable Downstream Tasks

June 10th, 2026 | by

Knowledge graphs like Wikidata combine rich relational structure with natural-language descriptions, yet most models are trained narrowly for a single task and transfer poorly. This thesis investigates how a single generative graph foundation model, pretrained on large-scale text-rich knowledge graphs, can be adapted to a range of downstream tasks, including knowledge graph completion, text-conditional subgraph generation, and graph anomaly detection, with minimal task-specific supervision. The work integrates geometry-aware representation learning, text-conditioned graph transformers, and generative graph modelling into one transferable pretraining-and-adaptation pipeline.

Ontology-Based Data Augmentation with LLMs for Narrative Classification

May 12th, 2026 | by

Narrative Classification identifies stories via NLP but often lacks generalizability. While LLMs augment other text tasks, their narrative application remains exploratory. This thesis investigates whether an ontology-based LLM-agent framework incorporating specific data characteristics improves synthetic training data quality.

Traceability Framework for Human–LLM-Assisted Tabular Data Transformations

April 29th, 2026 | by

Large Language Models (LLMs) are increasingly used to support data wrangling, but their integration into interactive transformation workflows raises new challenges for auditability, reproducibility, and accountability. When users approve, reject, or refine LLM-generated suggestions, conventional data lineage systems often fail to capture why a change occurred, who was responsible for it, and which transformation produced the final dataset.

This thesis investigates a compact traceability framework for human–LLM-assisted transformations of uploaded tabular files. The target setting is a single structured tabular data file (e.g. CSV), column-level transformation workflows, and practical reproducibility. The framework tracks file versions, table versions, selected columns, LLM suggestions, human decisions, approved transformation specifications, generated code references, execution events, and resulting output versions, with the goal of enabling reconstruction and rollback without storing the full conversation.
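To make the tracked entities more concrete, here is a minimal sketch of a trace log that records one entry per approved transformation and supports lineage reconstruction and rollback. The class names, field names, and content-hash versioning are assumptions for illustration only; the framework's actual data model is the subject of the thesis.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, str]


def version_id(rows: List[Row]) -> str:
    """Derive a short, content-based version identifier for a table."""
    return hashlib.sha256(repr(rows).encode("utf-8")).hexdigest()[:12]


@dataclass(frozen=True)
class TraceRecord:
    input_version: str
    output_version: str
    columns: Tuple[str, ...]
    suggestion: str   # short summary of the LLM suggestion, not the chat
    decision: str     # e.g. "approved" or "refined"
    decided_by: str
    spec: str         # approved transformation specification


class TraceLog:
    def __init__(self, rows: List[Row]) -> None:
        self.head = version_id(rows)
        self.versions: Dict[str, List[Row]] = {self.head: rows}
        self.records: List[TraceRecord] = []

    def apply(self, columns: Sequence[str], suggestion: str, decision: str,
              decided_by: str, spec: str,
              transform: Callable[[Row], Row]) -> str:
        """Run a row-wise transformation and record who approved what."""
        new_rows = [transform(dict(row)) for row in self.versions[self.head]]
        new_version = version_id(new_rows)
        self.versions[new_version] = new_rows
        self.records.append(TraceRecord(self.head, new_version, tuple(columns),
                                        suggestion, decision, decided_by, spec))
        self.head = new_version
        return new_version

    def lineage(self, version: str) -> List[TraceRecord]:
        """Return the records that led to a version, oldest first."""
        chain: List[TraceRecord] = []
        seen = set()
        while version not in seen:  # guard against no-op transformations
            seen.add(version)
            record = next((r for r in self.records
                           if r.output_version == version), None)
            if record is None:
                break
            chain.append(record)
            version = record.input_version
        return chain[::-1]

    def rollback(self) -> str:
        """Move the head back to the input of the step that produced it."""
        record = next((r for r in reversed(self.records)
                       if r.output_version == self.head), None)
        if record is None:
            raise ValueError("no recorded transformation produced this version")
        self.head = record.input_version
        return self.head
```

Because each record stores only a suggestion summary, the decision, and the approved specification, the lineage of an output version can be reconstructed without retaining the conversation itself.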

This thesis is co-supervised by Prof. Jiannan Wang (jnwang@tsinghua.edu.cn) from the Department of Computer Science and Technology at Tsinghua University, who also serves as the second supervisor.

Training a Tiny LLM with Block Attention Residuals on CommonsenseQA

April 23rd, 2026 | by

Knowledge-augmented multiple-choice question answering (MCQA) aims to improve robustness and factual grounding by integrating external structured knowledge (e.g., knowledge graphs) into language-model-based decision making. Current high-performing systems typically retrieve a local subgraph relevant to a question and candidate answers, then combine pretrained language representations with explicit graph reasoning modules.

This thesis investigates an alternative representation path: instead of processing retrieved knowledge graph (KG) subgraphs as symbolic triples with graph neural networks, the subgraphs are deterministically rendered into a compact 2D “visual graph” representation and encoded with a vision backbone. The resulting visual KG evidence is fused with an encoder-only language model via attention-based cross-modal interaction. The core research question is whether a visually encoded KG can preserve decision-relevant relational structure and support competitive knowledge-augmented MCQA performance on CommonsenseQA and OpenBookQA (optionally extending to MedQA-USMLE).
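One way to picture a deterministic 2D rendering is an adjacency-matrix image in which each cell holds a relation identifier. The sketch below is only an assumed example of such an encoding, with hypothetical function names and made-up triples; the rendering scheme actually used in the thesis is not specified here.

```python
from typing import Dict, List, Sequence, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def render_subgraph(triples: Sequence[Triple],
                    size: int) -> Tuple[List[List[int]], List[str], List[str]]:
    """Render triples into a size x size grid of relation identifiers.

    Nodes and relations are sorted so that the same subgraph always
    yields the same grid, regardless of the order of the input triples.
    Cell (row, column) holds the relation id of head -> tail, 0 if absent.
    """
    nodes = sorted({n for head, _, tail in triples for n in (head, tail)})
    relations = sorted({rel for _, rel, _ in triples})
    if len(nodes) > size:
        raise ValueError("subgraph has more nodes than the grid can hold")

    node_index: Dict[str, int] = {n: i for i, n in enumerate(nodes)}
    relation_id: Dict[str, int] = {r: i + 1 for i, r in enumerate(relations)}

    grid = [[0] * size for _ in range(size)]
    for head, rel, tail in triples:
        grid[node_index[head]][node_index[tail]] = relation_id[rel]
    return grid, nodes, relations


# Made-up commonsense triples for illustration.
example = [("bird", "CapableOf", "fly"), ("bird", "IsA", "animal")]
grid, nodes, relations = render_subgraph(example, size=4)
```

A grid like this could be passed to a vision backbone as a single-channel image. It keeps at most one relation per ordered node pair, which is one of the simplifications of this sketch.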

Master’s Thesis: Automatic Generation of Ontological Representations for Co-Simulation Configurations in Power Engineering

Abstract
This master’s thesis examines the applicability of automatic ontology generation and ontology-based data integration to the configuration of co-simulation scenarios. To study power systems through simulations, it is useful to model subdomains with separate simulators, which are then combined through co-simulation into complex simulation scenarios. However, the gain from focused modelling of subdomains comes at the cost of complexity when configuring scenarios that comprise many heterogeneous simulators. Some use cases also require integrating co-simulations with external systems such as databases or user interfaces, which needs to be reflected in the configuration of the simulations. Mechanisms for validation and rule-based configuration are therefore necessary. The student will contribute to a framework for ontology-based configuration of co-simulation scenarios by exploring the possibility of automatically generating ontologies for simulators and integrating these with one another and with external ontologies for integration with external systems.

Ontology-Grounded Extraction of Research Software Mentions from Scientific Publications

March 27th, 2026 | by

Research software is among the least discoverable scholarly outputs. While standards like CodeMeta and CFF enable structured software metadata at the repository level, they require active curation by maintainers and see inconsistent adoption. On the publication side, only select publishers such as Schloss Dagstuhl’s DROPS platform provide citable software artifacts, again contingent on explicit author action. As a result, most research software is mentioned only in unstructured publication text, invisible to metadata-driven search and agentic systems, and non-compliant with FAIR principles. This problem is compounded in applied domains like visualization research, where publications frequently describe custom tools and prototypes that are never publicly released, a case not covered by current metadata schemas. Automating ontology-grounded extraction of software mentions from publications would close this gap and enrich the metadata foundation for downstream services such as the research copilots developed within NFDI4DS.

Incremental Knowledge Graph Ingestion with Change Detection and Provenance Tracking

March 26th, 2026 | by

Keeping a knowledge graph up to date as its source data evolves is harder than building one from scratch. New records appear, existing records are corrected, and metadata is enriched over time. Each type of change (a corrected DOI, an added co-author, a retracted publication) carries different semantic implications and may require a different update strategy. Detecting these changes efficiently and propagating them without introducing duplicates, losing provenance, or overwriting valid data remains an open challenge, particularly when the goal is to avoid heavyweight versioning infrastructure.
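A lightweight form of change detection can be sketched by comparing content fingerprints of two source snapshots. This is an assumed baseline for illustration, not the approach the thesis will propose; the record layout, identifiers, and function names are hypothetical, and the DOIs are placeholders.

```python
import hashlib
import json
from typing import Any, Dict, List

Record = Dict[str, Any]


def fingerprint(record: Record) -> str:
    """Hash a record in canonical form so that key order does not matter."""
    canonical = json.dumps(record, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def detect_changes(old: Dict[str, Record],
                   new: Dict[str, Record]) -> Dict[str, Any]:
    """Classify records as added, removed, or modified between snapshots.

    For modified records, the names of the changed fields are reported so
    that a field-specific update strategy can be chosen downstream.
    """
    modified: Dict[str, List[str]] = {}
    for key in sorted(old.keys() & new.keys()):
        if fingerprint(old[key]) != fingerprint(new[key]):
            fields = old[key].keys() | new[key].keys()
            modified[key] = sorted(
                f for f in fields if old[key].get(f) != new[key].get(f)
            )
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "modified": modified,
    }


# Placeholder snapshots: p1 gets a corrected DOI and an added co-author.
snapshot_old = {
    "p1": {"doi": "10.0000/exampel", "authors": ["A. Author"]},
    "p2": {"doi": "10.0000/other", "authors": ["B. Author"]},
}
snapshot_new = {
    "p1": {"doi": "10.0000/example", "authors": ["A. Author", "C. Author"]},
    "p3": {"doi": "10.0000/new", "authors": ["D. Author"]},
}
changes = detect_changes(snapshot_old, snapshot_new)
```

Reporting changed fields rather than only changed records is what allows a corrected DOI and an added co-author to be handled differently. Note that a record missing from the new snapshot says nothing about why it is missing: a retraction would have to be signalled explicitly by the source.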