Radiology report generation is a critical task in medical imaging, where accurate and comprehensive descriptions of scans (such as X-ray, CT, or MRI) are required for diagnosis and treatment planning. Vision-language models (VLMs) have recently gained attention for automating this process by generating textual reports directly from medical images. However, standard VLMs often suffer from factual inconsistencies, limited domain knowledge, and difficulties with complex medical terminology. Knowledge graphs (KGs) encode structured, domain-specific information and thus offer a way to ground VLMs in prior medical knowledge: by enforcing consistency with established medical facts and terminology, KG integration can improve both the accuracy and the interpretability of generated reports. This thesis investigates how knowledge graph-enhanced VLMs can improve the quality, factual correctness, and clinical relevance of automated radiology report generation.
| Thesis Type | |
| Status | Open |
| Presentation room | Seminar room I5 6202 |
| Supervisor(s) | Stefan Decker |
| Advisor(s) | Yongli Mou |
| Contact | mou@dbis.rwth-aachen.de |
Objectives
- Analyze the limitations of current vision-language models for radiology report generation.
- Investigate how knowledge graphs can be integrated into VLMs to enhance medical image understanding and text generation.
- Develop and implement a knowledge graph-enhanced VLM tailored for radiology report generation.
- Evaluate the model’s performance on medical imaging datasets using clinical metrics such as factual correctness, coherence, and clinical relevance.
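To make the evaluation objective concrete, the sketch below shows a simplified entity-overlap F1 score as a proxy for factual correctness. This is a hypothetical simplification: real clinical metrics (e.g., RadGraph F1) match labeled entities and relations extracted by a model, whereas here the entity sets are assumed to be given.

```python
# Sketch: entity-overlap F1 as a proxy for factual correctness.
# Assumes finding entities have already been extracted from the
# generated and reference reports (a simplification of RadGraph-style
# evaluation).

def entity_f1(pred_entities, ref_entities):
    """F1 overlap between predicted and reference entity sets."""
    pred, ref = set(pred_entities), set(ref_entities)
    if not pred or not ref:
        return 0.0
    tp = len(pred & ref)
    precision = tp / len(pred)
    recall = tp / len(ref)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

pred = ["cardiomegaly", "pleural effusion", "pneumothorax"]
ref = ["cardiomegaly", "pleural effusion", "atelectasis"]
score = entity_f1(pred, ref)  # 2 of 3 entities match on each side
```

In practice this would be complemented by standard NLP metrics (BLEU, ROUGE) and by relation-level matching, since entity overlap alone cannot detect negation or attribute errors.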
Tasks
- Literature Review
  - Study existing research on vision-language models in medical imaging.
  - Explore knowledge graph-based techniques in natural language generation and medical AI.
- Data Analysis and Knowledge Graph Construction
  - Identify structured medical knowledge sources (e.g., RadGraph, UMLS, Wikidata).
  - Construct a domain-specific knowledge graph to support the model.
- Model Development
  - Design a framework integrating a knowledge graph with a vision-language model.
  - Implement techniques such as graph embeddings, retrieval-augmented generation (RAG), or knowledge injection mechanisms.
- Evaluation and Optimization
  - Test the model on real-world or publicly available radiology datasets (e.g., MIMIC-CXR).
  - Compare results with standard VLMs using clinical accuracy metrics and NLP evaluation benchmarks.
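The knowledge-injection step from the tasks above can be sketched as a retrieval-augmented prompt builder. All names here (`TRIPLES`, `retrieve_facts`, `build_prompt`) are illustrative assumptions, not a fixed design: a real system would use a learned retriever over a RadGraph/UMLS-derived graph and feed the prompt to a VLM decoder.

```python
# Sketch of KG-based knowledge injection in a RAG-style pipeline.
# The toy graph is a list of (head, relation, tail) triples; in the
# thesis these would come from sources such as RadGraph or UMLS.
TRIPLES = [
    ("cardiomegaly", "located_at", "heart"),
    ("cardiomegaly", "suggestive_of", "heart failure"),
    ("pleural effusion", "located_at", "pleural space"),
]

def retrieve_facts(findings, triples):
    """Return KG triples whose head matches a detected finding."""
    return [t for t in triples if t[0] in findings]

def build_prompt(findings, facts):
    """Assemble a text prompt that injects the retrieved KG facts."""
    fact_lines = "\n".join(
        f"- {h} {r.replace('_', ' ')} {t}" for h, r, t in facts
    )
    return (
        "Findings detected in the image: " + ", ".join(findings) + "\n"
        "Relevant medical knowledge:\n" + fact_lines + "\n"
        "Write a radiology report consistent with these facts."
    )

findings = ["cardiomegaly"]
prompt = build_prompt(findings, retrieve_facts(findings, TRIPLES))
```

The same interface also accommodates the other techniques listed: instead of string matching, `retrieve_facts` could rank triples by graph-embedding similarity, and the facts could be injected as embeddings rather than text.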
Technical Skills
Machine Learning & Deep Learning
- Understanding of neural networks, transformers, and multimodal learning.
- Familiarity with Vision-Language Models (VLMs) such as BLIP, GIT, or LLaVA.
- Experience with Natural Language Processing (NLP) techniques, including text generation and sequence modeling.
Graph Neural Networks and Knowledge Graphs
- Basics of Graph Neural Networks (GNNs) and their applications.
- Experience with knowledge graphs, ontology-based reasoning, or embedding techniques (e.g., Node2Vec, TransE).
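As a minimal illustration of the embedding techniques mentioned above, the sketch below shows the TransE scoring idea: a triple (h, r, t) is plausible when the embedding of h translated by r lands near t. The vectors here are hand-picked toy values, not trained embeddings.

```python
# TransE scoring sketch: score(h, r, t) = -||h + r - t||.
# Higher (closer to zero) means the triple is more plausible.
import math

def transe_score(h, r, t):
    """Negative Euclidean distance between h + r and t."""
    diff = [hi + ri - ti for hi, ri, ti in zip(h, r, t)]
    return -math.sqrt(sum(d * d for d in diff))

# Toy 2-d embeddings where "entity + relation" matches the tail exactly.
h = [1.0, 0.0]
r = [0.0, 1.0]
t = [1.0, 1.0]
plausible = transe_score(h, r, t)            # exact match: score 0.0
implausible = transe_score(h, r, [3.0, 0.0])  # mismatched tail: negative
```

Training would adjust the vectors so that observed triples score higher than corrupted ones; libraries such as PyTorch Geometric provide ready-made implementations.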
Medical Imaging and Radiology Reports
- Basic understanding of medical imaging (X-rays, MRIs, CT scans).
- Familiarity with common radiology datasets (e.g., MIMIC-CXR).
Programming Tools
- Python and deep learning frameworks (e.g., PyTorch).
- Knowledge of Hugging Face Transformers and multimodal learning libraries.
- Experience with graph libraries such as NetworkX, DGL, or PyTorch Geometric.
- Familiarity with medical NLP libraries (e.g., RadGraph).
- Ability to work with large-scale datasets, including data preprocessing and augmentation.
Mathematical and Analytical Skills
- Knowledge of probability and statistics for model evaluation.
- Understanding of loss functions and evaluation metrics for text and image tasks.
Soft Skills
- Ability to conduct a systematic literature review.
- Critical thinking for analyzing model performance and improving techniques.
- Strong communication skills for writing reports and presenting research findings.