Large Language Models (LLMs) have demonstrated remarkable capabilities in code generation [5,7]. However, they face significant limitations when working with specific software environments, such as unfamiliar libraries and domain-specific APIs [3,5]. To address these challenges, Retrieval-Augmented Generation (RAG) approaches have emerged, enabling LLMs to access relevant contextual information [6]. Among these, knowledge graph-based representations offer a promising approach, providing structured semantic relationships that traditional RAG methods may miss [1,4]. This thesis explores the integration of LLMs with knowledge graphs for domain-specific code generation, using time series analysis as a concrete use case [1].
Thesis Type |
Student | Anja Wagner
Status | Running
Presentation room | Seminar room I5 - 6202
Supervisor(s) | Stefan Decker
Advisor(s) | Yixin Peng, Christopher Pack
Contact | peng@dbis.rwth-aachen.de, christopher.ingo.pack@fit.fraunhofer.de
Use Case Scenario
The LLM system should generate executable scripts for time series analysis tasks using established Python libraries (TSLib, SKTime, etc.) based on existing user code.
- Key Requirements
- Task 1: Generate code for forecasting based on user specifications
- Task 2: Utilize existing time series libraries and their specific APIs
- Task 3: Integrate existing user code (e.g., to determine where the time series dataset is loaded from)
- Task 4: Enable interactive refinement through clarifying questions
- Example Workflow
- User request: “Create a forecast for time series X using model Y”
- LLM asks clarifying questions (horizon, seasonality, etc.)
- LLM queries knowledge graph for relevant libraries, functions, and parameters [1,6]
- LLM generates script incorporating library-specific code
- User reviews and optionally refines
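The kind of script the system should ultimately produce can be illustrated with a minimal, library-agnostic sketch. A seasonal-naive forecast in plain NumPy stands in here for code that would normally call sktime or TSLib; the series values and the horizon are purely illustrative:

```python
import numpy as np

def seasonal_naive_forecast(y, season_length, horizon):
    """Forecast by repeating the last observed season (a simple baseline)."""
    last_season = y[-season_length:]
    # Tile the last season and truncate to the requested horizon
    reps = int(np.ceil(horizon / season_length))
    return np.tile(last_season, reps)[:horizon]

# Hypothetical monthly series with yearly seasonality (values are illustrative)
y = np.array([112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118], dtype=float)
forecast = seasonal_naive_forecast(y, season_length=12, horizon=6)
print(forecast)  # the first 6 months of the last observed season
```

A real generated script would instead instantiate a library forecaster and fill in the parameters gathered during the clarification step (horizon, seasonality, etc.).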
Objectives
- Investigate different approaches for LLM + Knowledge Graph solutions [2,6]:
- What are the key factors influencing the quality and accuracy of LLM code generation?
- How do knowledge graph-based approaches compare to other methods for context-aware code generation?
- What specific advantages do knowledge graphs provide in representing code dependencies and domain knowledge for LLM-assisted code generation?
- How can LLMs effectively leverage knowledge graphs to generate code that integrates with existing libraries and frameworks in the time series analysis domain?
- Design and develop a proof-of-concept LLM + Knowledge Graph solution for Code Generation in the Use Case [1,6]
- Evaluate your solution against other approaches in the Use Case [5]
Tasks
- Literature Review and Analysis
- Conduct a systematic literature review of LLM-based code generation approaches [2,5,7]
- When applicable, focus on code generation for time series analysis
- Identify and categorize approaches:
- Context-aware code generation methods
- Knowledge graph applications in software engineering [1,4,6]
- Graph-based vs. other solutions
- Analyze advantages and limitations of each approach
- Identify success factors for knowledge graph integration [1,6]
- Determine whether it is possible to construct knowledge graphs from code libraries automatically
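As a first indication that automatic construction is feasible, Python's introspection facilities can already extract a crude triple set from any importable library. The sketch below runs over the standard-library `statistics` module purely for self-containment; a real pipeline would target sktime/TSLib and write into a proper graph store:

```python
import inspect
import statistics

def extract_triples(module):
    """Emit (subject, relation, object) triples describing a module's public functions."""
    triples = []
    for name, obj in inspect.getmembers(module, inspect.isfunction):
        if name.startswith("_"):
            continue  # skip private helpers
        triples.append((name, "defined_in", module.__name__))
        try:
            sig = inspect.signature(obj)
        except (TypeError, ValueError):
            continue  # some callables expose no introspectable signature
        for param in sig.parameters.values():
            triples.append((name, "has_parameter", param.name))
    return triples

triples = extract_triples(statistics)
# e.g. ('mean', 'defined_in', 'statistics'), ('mean', 'has_parameter', 'data')
print(len(triples), "triples extracted")
```

This only captures signatures; richer relations (return types, input/output compatibility, library hierarchies) would need docstring parsing or type-hint analysis on top.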
- PoC Design and Implementation
- Knowledge Graph Construction: [1]
- Model time series libraries, functions, parameters, and dependencies
- Represent relationships between components (e.g., function inputs/outputs, library hierarchies)
- If possible, integrate an automatic KG construction approach
- LLM Integration: [6]
- Implement retrieval mechanisms from knowledge graph
- Design prompt engineering strategies incorporating graph information
- Utilize frameworks: LangChain/LangGraph, Model Context Protocol (MCP)
- Pipeline Development:
- User intent parsing
- Interactive clarification module
- Graph query and retrieval
- Code generation and validation
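The graph-retrieval and prompt-construction steps of this pipeline can be sketched with a toy in-memory graph. The entries, relations, and prompt template below are illustrative assumptions, not any real library's schema:

```python
# Toy knowledge graph: component name -> metadata (illustrative entries only)
KG = {
    "NaiveForecaster": {
        "library": "sktime",
        "task": "forecasting",
        "parameters": ["strategy", "sp"],
    },
    "AutoARIMA": {
        "library": "sktime",
        "task": "forecasting",
        "parameters": ["sp", "seasonal"],
    },
}

def retrieve(task):
    """Return all graph entries whose 'task' relation matches the parsed user intent."""
    return {name: meta for name, meta in KG.items() if meta["task"] == task}

def build_prompt(user_request, context):
    """Inject retrieved graph facts into the code-generation prompt."""
    facts = "\n".join(
        f"- {name} ({meta['library']}): parameters {', '.join(meta['parameters'])}"
        for name, meta in context.items()
    )
    return (
        f"User request: {user_request}\n"
        f"Relevant API facts:\n{facts}\n"
        f"Generate a Python script using these APIs."
    )

context = retrieve("forecasting")
prompt = build_prompt("Create a 6-step forecast for series X", context)
print(prompt)
```

In the actual PoC the dictionary lookup would be replaced by a graph query (e.g., via LangChain retrievers or an MCP tool), but the data flow—intent, retrieval, prompt assembly, generation—stays the same.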
- Evaluation
- Comparison against Baselines and SOTA:
- Vanilla LLM (no context)
- State-of-the-art solutions from the literature search
- Evaluation Metrics: [3,5]
- Functional correctness: Syntax validity, execution success rate
- Code quality: Library usage correctness, API compliance
- Domain performance: For generated forecasts/anomaly detection—MAE, RMSE, F1-score
- Efficiency: Response time, token usage
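The quantitative metrics above are straightforward to compute once a generated script has run. A minimal sketch in plain NumPy: MAE and RMSE for forecast quality, plus an execution-success ratio over a batch of generated scripts (all numbers are made up for illustration):

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error between ground truth and forecast."""
    return float(np.mean(np.abs(y_true - y_pred)))

def rmse(y_true, y_pred):
    """Root mean squared error between ground truth and forecast."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Illustrative forecast vs. ground truth
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])
print("MAE:", mae(y_true, y_pred))    # 0.75
print("RMSE:", rmse(y_true, y_pred))

# Execution success rate: fraction of generated scripts that ran without error
results = [True, True, False, True]   # hypothetical run outcomes
success_rate = sum(results) / len(results)
print("Execution success rate:", success_rate)  # 0.75
```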
- Analysis and Documentation
- Comparative analysis of results
- Identification of strengths and weaknesses
- Discussion of limitations and future work
Scope and Limitations
- In Scope
- Code generation for time series forecasting
- Integration with Python-based time series libraries
- Knowledge graph representation of libraries and dependencies [1]
- Comparative evaluation against baselines & SOTA [5]
- Out of Scope
- Production-ready deployment
- Support for languages other than Python
- Real-time code execution environments
- Extensive user studies with multiple participants
- Automatic code debugging and repair
- Constraints
- The developed PoC need not outperform all existing solutions; the focus is on demonstrating the viability and understanding the trade-offs of the knowledge graph approach
References
- Graß, A., Beecks, C., Chala, S.A., Lange, C., Decker, S.J.: A Knowledge Graph for Query-Induced Analyses of Hierarchically Structured Time Series Information. In: Abelló, A., Vassiliadis, P., Romero, O., Wrembel, R., Bugiotti, F., Gamper, J., Vargas Solar, G., Zumpano, E. (eds.) New Trends in Database and Information Systems: ADBIS 2023 Short Papers, Doctoral Consortium and Workshops, pp. 174–184. Springer, Cham (2023). https://doi.org/10.1007/978-3-031-42941-5_16
- Dehaerne, E., Dey, B., Halder, S., De Gendt, S., Meert, W.: Code Generation Using Machine Learning: A Systematic Review. IEEE Access 10, 82434–82455 (2022). https://doi.org/10.1109/ACCESS.2022.3196347
- Liu, F., Liu, Y., Shi, L., Huang, H., Wang, R., Yang, Z., Zhang, L., Li, Z., Ma, Y.: Exploring and Evaluating Hallucinations in LLM-Powered Code Generation. arXiv:2404.00971 (2024). https://doi.org/10.48550/arXiv.2404.00971
- Patir, R., Guo, K., Cai, H., Hu, H.: Fortifying LLM-Based Code Generation with Graph-Based Reasoning on Secure Coding Practices. arXiv:2510.09682 (2025). https://doi.org/10.48550/arXiv.2510.09682
- Chen, M., Tworek, J., Jun, H., et al.: Evaluating Large Language Models Trained on Code. arXiv:2107.03374 (2021). https://doi.org/10.48550/arXiv.2107.03374
- Procko, T.T., Ochoa, O.: Graph Retrieval-Augmented Generation for Large Language Models: A Survey. In: 2024 Conference on AI, Science, Engineering, and Technology (AIxSET), pp. 166–169. IEEE (2024). https://doi.org/10.1109/AIxSET62544.2024.00030
- Austin, J., Odena, A., Nye, M.I., Bosma, M., Michalewski, H., Dohan, D., Jiang, E., Cai, C.J., Terry, M., Le, Q.V., Sutton, C.: Program Synthesis with Large Language Models. arXiv:2108.07732 (2021). https://doi.org/10.48550/arXiv.2108.07732
Prerequisites
- Solid Python programming skills (including dependency handling and reproducible environments).
- Basic understanding of machine learning and LLM prompting / RAG-style systems.
- Familiarity with knowledge graphs (entities/relations, schema design, querying) is beneficial.
- Basic software engineering practices: testing, logging, benchmarking, and experiment documentation.