Analysing Scientific Publications using AI-based Large Language Models (LLMs)

July 11th, 2023

Thesis Type
  • Master
Philipp Hertweck
Proposal on
27/02/2024 10:30 am
Proposal room
Seminar room I5 6202
Presentation room
Seminar room I5 6202
Stefan Decker
Sulayman K. Sowe
Yongli Mou
Alexander Neumann

Scientific publication analysis, a special kind of Document Analysis (DA), is the quantitative and qualitative analysis of a publication's content. Its purpose is to make better sense of the written content, generate insights beyond the abstract, and make the publication more understandable to readers.

The main goal of the thesis project is to develop the expertise and tools needed to extract, summarise and analyse scientific publications written in LaTeX. 

The proposed research methodology (Figure below) is as follows: the candidate will review scientific publications in LaTeX source form and apply their programming skills to segment the documents into manageable chunks that an LLM-based tool (e.g., ChatGPT, LangChain) can process. Using a suitable LLM, the candidate will generate, analyse and compile the summaries into a “new” article for various audiences.
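The segmentation step described above could be sketched roughly as follows. This is a minimal illustration, not the project's actual tooling: the function name `split_latex_into_chunks`, the `max_chars` limit, and the choice to split at `\section`-style commands and then at paragraph breaks are all assumptions made for the example. It uses only the Python standard library and does not call any LLM.

```python
import re

# Assumed chunk boundary: \section{...}, \subsection{...}, \subsubsection{...}
# and their starred variants. Titles containing nested braces are not handled.
SECTION_RE = re.compile(r"\\(?:sub)*section\*?\{([^}]*)\}")


def split_latex_into_chunks(source, max_chars=2000):
    """Split LaTeX source into (title, text) chunks.

    The source is cut at sectioning commands; a section longer than
    max_chars is split further at blank-line paragraph breaks.
    A single paragraph longer than max_chars is kept whole.
    """
    matches = list(SECTION_RE.finditer(source))

    # Everything before the first sectioning command (preamble, abstract, ...).
    head_end = matches[0].start() if matches else len(source)
    bounds = [("preamble", 0, head_end)]
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(source)
        bounds.append((m.group(1).strip(), m.end(), end))

    chunks = []
    for title, start, end in bounds:
        body = source[start:end].strip()
        if not body:
            continue
        current = ""
        for para in re.split(r"\n\s*\n", body):
            para = para.strip()
            if not para:
                continue
            if current and len(current) + len(para) + 2 > max_chars:
                # Adding this paragraph would exceed the limit: close the chunk.
                chunks.append((title, current))
                current = para
            else:
                current = f"{current}\n\n{para}" if current else para
        if current:
            chunks.append((title, current))
    return chunks
```

Each resulting `(title, text)` pair could then be passed to whichever LLM is chosen for summarisation. A character limit is only a rough stand-in for a model's token limit; a real pipeline would likely count tokens and handle LaTeX environments, macros and `\input` files more carefully.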

You are welcome to join and get support from the DBIS/Fraunhofer FIT AI (LLM) working group, which will help you understand the latest LLM R&D trends. In collaboration with your advisor, you will review and verify the outputs of the AI summaries. You are encouraged to publish your findings, tools, code, and the systems you used. Your advisor and supervisor will support you in documenting the lessons you have learnt, the research challenges you encountered, and the future research directions you plan to pursue.

Opportunities and benefits:

  1. Get support in learning practical skills to prepare you for the “world of work”.
  2. Learn to write and co-publish scientific papers with expert senior researchers and professors.
  3. Opportunities to continue your work as a student worker and to travel to present your research at international conferences and workshops.
  4. Opportunity and support to take your research to the next level (PhD).


Opportunities beyond your RWTH Thesis project:

In a world increasingly shaped by AI, computer scientists and software engineers with knowledge and expertise in AI-based document analysis are in high demand. For example, big companies like IBM, SAP, Deutsche Bank, and Google (with its Document AI solutions) use AI to analyse various documents and workflows.

Interested, eager to start and have fun?

Contact the thesis advisor: Dr. Sulayman K. Sowe

For more information, please visit: Information about Thesis Process

Flyer_AI-Based Documents Analysis using Large Language Models (LLMs)

Prerequisites:

  1. Programming in Python or another suitable programming language.
  2. Experience in using and managing GitHub repositories.
  3. Knowledge of APIs (Application Programming Interfaces).
  4. Experience using LaTeX to write documents.
  5. Text mining, natural language processing, and neural networks.
  6. Good reading and writing skills in German and English.
  7. Ability to adapt quickly to working in a large, multicultural academic environment.