Large Language Models (LLMs) have demonstrated remarkable capabilities in code generation [5,7]. However, they face significant limitations when working with specific software environments, such as unfamiliar libraries and domain-specific APIs [3,5]. To address these challenges, Retrieval-Augmented Generation (RAG) approaches have emerged, enabling LLMs to access relevant contextual information [6]. Among these, knowledge graph-based representations offer a promising approach, providing structured semantic relationships that traditional RAG methods may miss [1,4]. This thesis explores the integration of LLMs with knowledge graphs for domain-specific code generation, using time series analysis as a concrete use case [1].
Thesis Type |
Student | Anja Wagner
Status | Running
Presentation room | Seminar room I5 - 6202
Supervisor(s) | Stefan Decker
Advisor(s) | Yixin Peng, Christopher Pack
Contact | peng@dbis.rwth-aachen.de, christopher.ingo.pack@fit.fraunhofer.de
Use Case Scenario
The LLM system should generate executable scripts for time series analysis tasks using established Python libraries (TSLib, SKTime, etc.) based on existing user code.
- Key Requirements
- Task 1: Generate code for forecasting based on user specifications
- Task 2: Utilize existing time series libraries and their specific APIs
- Task 3: Integrate existing user code (e.g., to determine where the time series dataset is loaded from)
- Task 4: Enable interactive refinement through clarifying questions
- Example Workflow
- User request: “Create a forecast for time series X using model Y”
- LLM asks clarifying questions (horizon, seasonality, etc.)
- LLM queries knowledge graph for relevant libraries, functions, and parameters [1,6]
- LLM generates script incorporating library-specific code
- User reviews and optionally refines
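The kind of script the system should ultimately produce can be illustrated with a minimal, library-agnostic sketch. A seasonal-naive forecast in plain NumPy stands in here for code that would normally call sktime or TSLib; the series values and the horizon are purely illustrative:

```python
import numpy as np

def seasonal_naive_forecast(y, season_length, horizon):
    """Forecast by repeating the last observed season (a simple baseline)."""
    last_season = y[-season_length:]
    # Tile the last season and truncate to the requested horizon
    reps = int(np.ceil(horizon / season_length))
    return np.tile(last_season, reps)[:horizon]

# Hypothetical monthly series with yearly seasonality (values are illustrative)
y = np.array([112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118], dtype=float)
forecast = seasonal_naive_forecast(y, season_length=12, horizon=6)
print(forecast)  # the first 6 months of the last observed season
```

A real generated script would instead instantiate a library forecaster and fill in the parameters gathered during the clarification step (horizon, seasonality, etc.).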
Objectives
- Investigate different approaches for LLM + Knowledge Graph solutions [2,6]:
- What are the key factors influencing the quality and accuracy of LLM code generation?
- How do knowledge graph-based approaches compare to other methods for context-aware code generation?
- What specific advantages do knowledge graphs provide in representing code dependencies and domain knowledge for LLM-assisted code generation?
- How can LLMs effectively leverage knowledge graphs to generate code that integrates with existing libraries and frameworks in the time series analysis domain?
- Design and develop a proof-of-concept LLM + Knowledge Graph solution for Code Generation in the Use Case [1,6]
- Evaluate your solution against other approaches in the Use Case [5]
Tasks
- Literature Review and Analysis
- Conduct a systematic literature review of LLM-based code generation approaches [2,5,7]
- When applicable, focus on code generation for time series analysis
- Identify and categorize approaches:
- Context-aware code generation methods
- Knowledge graph applications in software engineering [1,4,6]
- Graph-based vs. other solutions
- Analyze advantages and limitations of each approach
- Identify success factors for knowledge graph integration [1,6]
- Determine whether it is possible to construct knowledge graphs from code libraries automatically
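As a first indication that automatic construction is feasible, Python's introspection facilities can already extract a crude triple set from any importable library. The sketch below runs over the standard-library `statistics` module purely for self-containment; a real pipeline would target sktime/TSLib and write into a proper graph store:

```python
import inspect
import statistics

def extract_triples(module):
    """Emit (subject, relation, object) triples describing a module's public functions."""
    triples = []
    for name, obj in inspect.getmembers(module, inspect.isfunction):
        if name.startswith("_"):
            continue  # skip private helpers
        triples.append((name, "defined_in", module.__name__))
        try:
            sig = inspect.signature(obj)
        except (TypeError, ValueError):
            continue  # some callables expose no introspectable signature
        for param in sig.parameters.values():
            triples.append((name, "has_parameter", param.name))
    return triples

triples = extract_triples(statistics)
# e.g. ('mean', 'defined_in', 'statistics'), ('mean', 'has_parameter', 'data')
print(len(triples), "triples extracted")
```

This only captures signatures; richer relations (return types, input/output compatibility, library hierarchies) would need docstring parsing or type-hint analysis on top.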
- PoC Design and Implementation
- Knowledge Graph Construction: [1]
- Model time series libraries, functions, parameters, and dependencies
- Represent relationships between components (e.g., function inputs/outputs, library hierarchies)
- If possible, integrate an automatic KG construction approach
- LLM Integration: [6]
- Implement retrieval mechanisms from knowledge graph
- Design prompt engineering strategies incorporating graph information
- Utilize frameworks: LangChain/LangGraph, Model Context Protocol (MCP)
- Pipeline Development:
- User intent parsing
- Interactive clarification module
- Graph query and retrieval
- Code generation and validation
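The graph-retrieval and prompt-construction steps of this pipeline can be sketched with a toy in-memory graph. The entries, relations, and prompt template below are illustrative assumptions, not any real library's schema:

```python
# Toy knowledge graph: component name -> metadata (illustrative entries only)
KG = {
    "NaiveForecaster": {
        "library": "sktime",
        "task": "forecasting",
        "parameters": ["strategy", "sp"],
    },
    "AutoARIMA": {
        "library": "sktime",
        "task": "forecasting",
        "parameters": ["sp", "seasonal"],
    },
}

def retrieve(task):
    """Return all graph entries whose 'task' relation matches the parsed user intent."""
    return {name: meta for name, meta in KG.items() if meta["task"] == task}

def build_prompt(user_request, context):
    """Inject retrieved graph facts into the code-generation prompt."""
    facts = "\n".join(
        f"- {name} ({meta['library']}): parameters {', '.join(meta['parameters'])}"
        for name, meta in context.items()
    )
    return (
        f"User request: {user_request}\n"
        f"Relevant API facts:\n{facts}\n"
        f"Generate a Python script using these APIs."
    )

context = retrieve("forecasting")
prompt = build_prompt("Create a 6-step forecast for series X", context)
print(prompt)
```

In the actual PoC the dictionary lookup would be replaced by a graph query (e.g., via LangChain retrievers or an MCP tool), but the data flow—intent, retrieval, prompt assembly, generation—stays the same.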
- Evaluation
- Comparison against Baselines and SOTA:
- Vanilla LLM (no context)
- State-of-the-art solutions from the literature search
- Evaluation Metrics: [3,5]
- Functional correctness: Syntax validity, execution success rate
- Code quality: Library usage correctness, API compliance
- Domain performance: For generated forecasts/anomaly detection—MAE, RMSE, F1-score
- Efficiency: Response time, token usage
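The quantitative metrics above are straightforward to compute once a generated script has run. A minimal sketch in plain NumPy: MAE and RMSE for forecast quality, plus an execution-success ratio over a batch of generated scripts (all numbers are made up for illustration):

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error between ground truth and forecast."""
    return float(np.mean(np.abs(y_true - y_pred)))

def rmse(y_true, y_pred):
    """Root mean squared error between ground truth and forecast."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Illustrative forecast vs. ground truth
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])
print("MAE:", mae(y_true, y_pred))    # 0.75
print("RMSE:", rmse(y_true, y_pred))

# Execution success rate: fraction of generated scripts that ran without error
results = [True, True, False, True]   # hypothetical run outcomes
success_rate = sum(results) / len(results)
print("Execution success rate:", success_rate)  # 0.75
```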
- Analysis and Documentation
- Comparative analysis of results
- Identification of strengths and weaknesses
- Discussion of limitations and future work
Scope and Limitations
- In Scope
- Code generation for time series forecasting
- Integration with Python-based time series libraries
- Knowledge graph representation of libraries and dependencies [1]
- Comparative evaluation against baselines & SOTA [5]
- Out of Scope
- Production-ready deployment
- Support for languages other than Python
- Real-time code execution environments
- Extensive user studies with multiple participants
- Automatic code debugging and repair
- Constraints
- The developed PoC need not outperform all existing solutions; the focus is on demonstrating the viability and understanding the trade-offs of the knowledge graph approach
References
- Graß, A., Beecks, C., Chala, S.A., Lange, C., Decker, S.J.: A Knowledge Graph for Query-Induced Analyses of Hierarchically Structured Time Series Information. In: Abelló, A., Vassiliadis, P., Romero, O., Wrembel, R., Bugiotti, F., Gamper, J., Vargas Solar, G., Zumpano, E. (eds.) New Trends in Database and Information Systems: ADBIS 2023 Short Papers, Doctoral Consortium and Workshops, pp. 174–184. Springer, Cham (2023). https://doi.org/10.1007/978-3-031-42941-5_16
- Dehaerne, E., Dey, B., Halder, S., De Gendt, S., Meert, W.: Code Generation Using Machine Learning: A Systematic Review. IEEE Access 10, 82434–82455 (2022). https://doi.org/10.1109/ACCESS.2022.3196347
- Liu, F., Liu, Y., Shi, L., Huang, H., Wang, R., Yang, Z., Zhang, L., Li, Z., Ma, Y.: Exploring and Evaluating Hallucinations in LLM-Powered Code Generation. arXiv:2404.00971 (2024). https://doi.org/10.48550/arXiv.2404.00971
- Patir, R., Guo, K., Cai, H., Hu, H.: Fortifying LLM-Based Code Generation with Graph-Based Reasoning on Secure Coding Practices. arXiv:2510.09682 (2025). https://doi.org/10.48550/arXiv.2510.09682
- Chen, M., Tworek, J., Jun, H., et al.: Evaluating Large Language Models Trained on Code. arXiv:2107.03374 (2021). https://doi.org/10.48550/arXiv.2107.03374
- Procko, T.T., Ochoa, O.: Graph Retrieval-Augmented Generation for Large Language Models: A Survey. In: 2024 Conference on AI, Science, Engineering, and Technology (AIxSET), pp. 166–169. IEEE (2024). https://doi.org/10.1109/AIxSET62544.2024.00030
- Austin, J., Odena, A., Nye, M.I., Bosma, M., Michalewski, H., Dohan, D., Jiang, E., Cai, C.J., Terry, M., Le, Q.V., Sutton, C.: Program Synthesis with Large Language Models. arXiv:2108.07732 (2021). https://doi.org/10.48550/arXiv.2108.07732
Prerequisites
- Solid Python programming skills (including dependency handling and reproducible environments).
- Basic understanding of machine learning and LLM prompting / RAG-style systems.
- Familiarity with knowledge graphs (entities/relations, schema design, querying) is beneficial.
- Basic software engineering practices: testing, logging, benchmarking, and experiment documentation.