Sensor data is often unstructured. While available datasets suggest some clear use cases, such as calculating energy consumption over time, relationships between measurements can easily go unnoticed without a thorough examination of the data. Exploratory data analysis can reveal such connections, but without a clear analytical direction the results may be limited to general information, such as clusterings or embeddings. In many cases, however, stakeholders and key decision makers lack the knowledge to go beyond this kind of analysis. With Graph Retrieval-Augmented Generation (Graph-RAG), LLMs can infer connections between entities within a given knowledge graph, potentially providing more accurate and meaningful outputs. In general, data based on ontologies can be represented as such graphs, and this approach has already been used to enhance LLM agents with domain-specific knowledge. Therefore, if an LLM agent were able to infer and explain important characteristics of given data with the help of a data-focused IoT ontology and convey them to a stakeholder, one could move directly to more expedient data analysis.
Thesis Type |
Student | Benedikt Ricken
Status | Running
Presentation room | Seminar room I5 6202
Supervisor(s) | Stefan Decker
Advisor(s) | Maximilian Kißgen
Contact | kissgen@dbis.rwth-aachen.de
The subject of this thesis is to develop or adapt an ontology for sensor data, together with an LLM agent that uses this ontology to explain characteristics of given sensor datasets.
Goals & Objectives:
- Conduct a literature review of existing IoT ontologies and their adaptability for inferring data characteristics
- Develop or adapt an ontology for IoT sensor data and integrate it with an LLM agent that uses it to infer characteristics from given datasets. The LLM agent should be tailored to stakeholders who are not data scientists
- Evaluate the resulting agent with users and on IoT sensor datasets by comparing it to pre-existing human inference and to LLM agents without ontology context
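To make the second objective more concrete, the following is a minimal sketch of how ontology-based facts might be retrieved and handed to an LLM as context. It is an illustration under stated assumptions, not the method of the thesis: the entity names, the predicates (`observes`, `located_in`, `influences`, `metered_by`), and the helper functions are all hypothetical, and no actual LLM is called. The knowledge graph is stored as plain triples, and a breadth-first search collects the facts within a few hops of one sensor.

```python
from collections import defaultdict

# Hypothetical instance data for a toy sensor ontology,
# stored as (subject, predicate, object) triples.
TRIPLES = [
    ("meter_1", "is_a", "EnergyMeter"),
    ("meter_1", "observes", "power_kw"),
    ("meter_1", "located_in", "room_a"),
    ("thermo_1", "is_a", "TemperatureSensor"),
    ("thermo_1", "observes", "temperature_c"),
    ("thermo_1", "located_in", "room_a"),
    ("hvac_1", "is_a", "HVACUnit"),
    ("hvac_1", "located_in", "room_a"),
    ("hvac_1", "influences", "temperature_c"),
    ("hvac_1", "metered_by", "meter_1"),
]


def build_index(triples):
    """Map each entity to every triple that mentions it as subject or object."""
    index = defaultdict(list)
    for s, p, o in triples:
        index[s].append((s, p, o))
        index[o].append((s, p, o))
    return index


def retrieve_subgraph(index, start, hops=2):
    """Collect the triples reachable within `hops` breadth-first steps of `start`."""
    seen_entities = {start}
    frontier = [start]
    collected = []
    for _ in range(hops):
        next_frontier = []
        for entity in frontier:
            for triple in index.get(entity, []):
                if triple not in collected:
                    collected.append(triple)
                # Queue both endpoints of the triple for the next hop.
                for node in (triple[0], triple[2]):
                    if node not in seen_entities:
                        seen_entities.add(node)
                        next_frontier.append(node)
        frontier = next_frontier
    return collected


def to_prompt_context(triples):
    """Serialize triples as one plain-text fact per line."""
    return "\n".join(f"{s} {p} {o}" for s, p, o in triples)


index = build_index(TRIPLES)
facts = retrieve_subgraph(index, "meter_1", hops=2)
prompt = (
    "Known relationships between sensors:\n"
    f"{to_prompt_context(facts)}\n\n"
    "Question: Which measurements could help explain changes in power_kw?"
)
print(prompt)
```

In this toy setup, the retrieved facts link the energy meter to the HVAC unit and, through it, to the room temperature. This is the kind of connection that could otherwise go unnoticed when each measurement is inspected in isolation. A real implementation would likely rely on an established ontology and a graph store rather than a Python list.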
Challenges:
While getting an LLM to infer characteristics in general is relatively easy, having it identify useful connections is not trivial, and it becomes harder the more domain-specific the data gets. A crucial step is to clearly define what kind of information would be valuable to a user.
Related Literature:
- https://arxiv.org/abs/1707.00112
- https://arxiv.org/abs/2412.15235
- https://arxiv.org/pdf/2306.11025
- https://aclanthology.org/2023.emnlp-demo.31.pdf
- https://link.springer.com/chapter/10.1007/978-3-031-42941-5_16
Prerequisites:
- Basic knowledge about Machine Learning and LLM concepts
- Basic knowledge about IoT concepts
- Experience with Python or related programming languages
- Nice to Have: Knowledge about LLM finetuning/RAG