
DBIS

Knowledge Graph Construction from Biomedical Literature using Large Language Models

April 17th, 2024

Fine-tuning pre-trained large language models (LLMs) can improve performance on biomedical text-mining tasks. This thesis develops a tool that performs Named Entity Recognition (NER), Named Entity Normalization (NEN), Relation Extraction (RE), and Knowledge Graph Construction (KGC). A key research question is how LLMs can address the challenges of named entity recognition, normalization, and relation extraction in the biomedical domain.

Thesis Type
  • Master
Student
Hanbin Chen
Status
Running
Presentation room
Seminar room I5 6202
Supervisor(s)
Stefan Decker
Advisor(s)
Yongli Mou
Contact
mou@dbis.rwth-aachen.de

Biomedical text mining extracts relevant information from the extensive medical literature. It comprises Named Entity Recognition (NER) to identify biomedical entities, Named Entity Normalization (NEN) to map these entities to concepts in standard terminologies, and Relation Extraction (RE) to identify the associations between entities. The thesis project aims to develop a tool that supports user-customized biomedical text-mining tasks, including NER, NEN, RE, and KGC. A key research focus is discontinuous NER: identifying and normalizing biomedical entities whose mentions are split into non-adjacent text fragments, across diverse datasets.
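To make the pipeline concrete, here is a minimal sketch of one possible data model, not the thesis's actual tool. A discontinuous entity is stored as several character spans, NEN is reduced to a toy case-insensitive dictionary lookup, and KGC collects (head, relation, tail) triples into an adjacency map. The example phrase "left and right ventricular hypertrophy" contains the discontinuous mention "left ventricular hypertrophy", interrupted by "and right". The names `Entity`, `normalize`, `build_graph`, the concept IDs (`CONCEPT:0001`, ...), and the relation `co_occurs_with` are hypothetical; joining fragments with a single space is also an assumption.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass(frozen=True)
class Entity:
    """An entity mention; a discontinuous mention has more than one span."""
    spans: tuple[tuple[int, int], ...]  # (start, end) character offsets, end exclusive
    label: str
    concept_id: str | None = None       # filled in by normalization (NEN)

    def surface(self, text: str) -> str:
        # Join the fragments with a space (assumption); this skips the gap
        # in a discontinuous mention.
        return " ".join(text[s:e] for s, e in self.spans)

def normalize(entity: Entity, text: str, lexicon: dict[str, str]) -> Entity:
    """Toy NEN: exact, case-insensitive lookup in a hypothetical lexicon."""
    concept = lexicon.get(entity.surface(text).lower())
    return Entity(entity.spans, entity.label, concept)

def build_graph(triples: list[tuple[str, str, str]]) -> dict[str, list[tuple[str, str]]]:
    """Toy KGC: collect (head, relation, tail) triples into an adjacency map."""
    graph: dict[str, list[tuple[str, str]]] = {}
    for head, relation, tail in triples:
        graph.setdefault(head, []).append((relation, tail))
    return graph

# Usage on the example phrase.
text = "left and right ventricular hypertrophy"
left = Entity(((0, 4), (15, 38)), "Disease")   # discontinuous: skips "and right"
right = Entity(((9, 38),), "Disease")          # contiguous
lexicon = {"left ventricular hypertrophy": "CONCEPT:0001",   # hypothetical IDs
           "right ventricular hypertrophy": "CONCEPT:0002"}
ents = [normalize(e, text, lexicon) for e in (left, right)]
print([(e.surface(text), e.concept_id) for e in ents])
# → [('left ventricular hypertrophy', 'CONCEPT:0001'), ('right ventricular hypertrophy', 'CONCEPT:0002')]
graph = build_graph([(ents[0].concept_id, "co_occurs_with", ents[1].concept_id)])
```

Storing spans as a tuple keeps contiguous and discontinuous mentions in one type, so NEN and RE code does not need a special case for fragmented entities.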

Objectives

  • Explore the potential of LLMs for named entity recognition (NER)
  • Explore the potential of LLMs for named entity normalization (NEN)
  • Explore the potential of LLMs for relation extraction between entities (RE)

Tasks

  • Fine-tuning pre-trained LLMs for NER, NEN, and RE
  • Evaluating the performance of fine-tuned LLMs
  • Developing a configurable tool to meet user-specific requirements in LLM applications
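One common way to fine-tune and evaluate a generative LLM on these tasks is to express each annotated example as an instruction prompt with a textual target, then parse the generated text back into structured predictions for scoring. The sketch below assumes that framing. The prompt templates, `TaskConfig`, `make_example`, `parse_ner`, and `prf1` are hypothetical names, not the thesis's tool. Scoring uses strict exact match on (mention, type) pairs.

```python
from __future__ import annotations
from dataclasses import dataclass

# Hypothetical prompt templates; the wording is illustrative only.
TEMPLATES = {
    "ner": "Extract all {types} mentions, one per line as 'mention [Type]'.\nText: {text}",
    "re": "List relations as 'head | relation | tail', one per line.\nText: {text}",
}

@dataclass
class TaskConfig:
    """User-selectable settings (hypothetical) for a configurable tool."""
    task: str                                      # "ner" or "re" in this sketch
    entity_types: tuple[str, ...] = ("Disease", "Chemical")

def make_example(cfg: TaskConfig, text: str, gold: list[tuple[str, ...]]) -> tuple[str, str]:
    """Turn one annotated sentence into a (prompt, target) pair for supervised fine-tuning."""
    if cfg.task not in TEMPLATES:
        raise ValueError(f"unsupported task: {cfg.task}")
    # str.format ignores unused keyword arguments, so both templates share one call.
    prompt = TEMPLATES[cfg.task].format(types=", ".join(cfg.entity_types), text=text)
    if cfg.task == "ner":
        target = "\n".join(f"{mention} [{label}]" for mention, label in gold)
    else:
        target = "\n".join(" | ".join(triple) for triple in gold)
    return prompt, target

def parse_ner(output: str) -> set[tuple[str, str]]:
    """Parse generated 'mention [Type]' lines into (mention, type) pairs; skip malformed lines."""
    pairs = set()
    for line in output.splitlines():
        line = line.strip()
        if line.endswith("]") and " [" in line:
            mention, label = line[:-1].rsplit(" [", 1)
            pairs.add((mention, label))
    return pairs

def prf1(pred: set, gold: set) -> tuple[float, float, float]:
    """Strict exact-match precision, recall, and F1 of predicted vs. gold items."""
    tp = len(pred & gold)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# Usage with a made-up sentence and a made-up model output.
cfg = TaskConfig("ner")
text = "Aspirin was given after the myocardial infarction."
gold = [("Aspirin", "Chemical"), ("myocardial infarction", "Disease")]
prompt, target = make_example(cfg, text, gold)
pred = parse_ner("Aspirin [Chemical]\ninfarction [Disease]")
print(prf1(pred, set(gold)))  # → (0.5, 0.5, 0.5)
```

Keeping the task choice and entity types in a small config object is one way to let users customize tasks without changing the prompt-building or evaluation code. Strict matching counts the partial mention "infarction" as an error; a real evaluation might also report relaxed (overlap) scores.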

Prerequisites
  • Basic skills in Python
  • Basic knowledge of machine learning and NLP concepts