
DBIS

Analysing Policy Documents using AI-based Large Language Models (LLMs)

July 18th, 2023

Analysing Policy Documents Using AI

Thesis Type
  • Master
Student
Tobias Kiel
Status
Running
Proposal on
14/02/2024 10:30 am
Proposal room
Seminar room I5 6202
Presentation room
Seminar room I5 6202
Supervisor(s)
Stefan Decker
Advisor(s)
Sulayman K. Sowe
Yongli Mou
Alexander Neumann
Contact
sowe@dbis.rwth-aachen.de
mou@dbis.rwth-aachen.de
neumann@dbis.rwth-aachen.de

Policy document analysis, a special kind of document analysis (DA), is the quantitative and qualitative analysis of the content of a policy document. Its purpose is to make more sense of the written content, to generate insights beyond the executive summary (if present), and to make the policy more understandable to a broader audience.

The main goal of the thesis project is to develop the expertise and tools needed to summarise policy documents (in PDF or DOC format) for various stakeholders.

In the proposed research methodology (figure below), the candidate will review two or three published European Commission or German Government policy documents as a case study. The candidate will apply their programming skills to divide each policy document into manageable chunks that the LLM behind an AI tool (e.g., ChatGPT or LangChain) can process. They will then use a suitable LLM to generate summaries for various stakeholders with different expertise and interest in the summarised policy.
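The chunking step described above can be sketched as follows. This is a minimal illustration, not the method prescribed for the thesis: the function names `split_into_chunks` and `build_prompt`, the word-count limit (used here as a rough stand-in for an LLM's token limit), and the prompt wording are all assumptions made for the example. No LLM is called; the sketch only prepares the text that would be sent to one.

```python
import re


def split_into_chunks(text, max_words=300):
    """Greedily pack whole paragraphs into chunks of at most max_words words.

    A paragraph longer than max_words is split on word boundaries.
    Word count is only a rough proxy for an LLM's token limit (assumption).
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks, current, count = [], [], 0
    for para in paragraphs:
        words = para.split()
        # Break an oversized paragraph into pieces that fit the budget.
        pieces = [words[i:i + max_words] for i in range(0, len(words), max_words)]
        for piece in pieces:
            # Start a new chunk when adding this piece would exceed the budget.
            if current and count + len(piece) > max_words:
                chunks.append("\n\n".join(current))
                current, count = [], 0
            current.append(" ".join(piece))
            count += len(piece)
    if current:
        chunks.append("\n\n".join(current))
    return chunks


def build_prompt(chunk, stakeholder):
    """Hypothetical prompt template; the wording is an assumption."""
    return (
        f"Summarise the following policy excerpt for a {stakeholder}. "
        "Use plain language and keep it under 100 words.\n\n"
        f"{chunk}"
    )


if __name__ == "__main__":
    sample = "First paragraph of a policy.\n\nSecond paragraph with more detail."
    for chunk in split_into_chunks(sample, max_words=6):
        print(build_prompt(chunk, "small business owner"))
        print("---")
```

Keeping paragraphs intact where possible means each chunk stays readable on its own. Changing the `stakeholder` argument is one simple way to request summaries for audiences with different expertise.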

You are welcome to join and get support from the DBIS/Fraunhofer FIT AI (LLM) working group, which will help you understand the latest LLM R&D trends. In collaboration with your advisor, you will review and verify the outputs of the AI summaries. You are encouraged to publish your findings, tools, code, and the systems you used. Your advisor and supervisor will support you in documenting the lessons you have learnt, the research challenges you encountered, and the future research directions you plan to undertake.

Opportunities and benefits:

  1. Get support in learning practical skills to prepare you for the "world of work".
  2. Learn to write and co-publish scientific papers with expert senior researchers and professors.
  3. Opportunities to continue your work as a student worker and travel to present your research at international conferences and workshops.
  4. Opportunity and support to take your research to the next level (PhD).

Opportunities beyond your RWTH Thesis project:

As AI becomes more widespread, demand is growing for computer scientists and software engineers with knowledge and expertise in AI-based document analysis. For example, companies such as IBM, SAP, Dexpro.de, Deutsche Bank, and Google (with its Document AI solutions) use AI to analyse various documents and workflows.

Recommended reading list:

  1. Tamir Hassan (2009). Object-level document analysis of PDF files. In Proceedings of the 9th ACM Symposium on Document Engineering (DocEng ’09), ACM, pp. 47–55. https://doi.org/10.1145/1600193.1600206
  2. Douzon, T., Duffner, S., Garcia, C., Espinas, J. (2022). Improving Information Extraction on Business Documents with Specific Pre-training Tasks. In: Uchida, S., Barney, E., Eglin, V. (eds) Document Analysis Systems. DAS 2022. Lecture Notes in Computer Science, vol 13237. Springer, Cham. https://doi.org/10.1007/978-3-031-06555-2_8
  3. Bowen, G.A. (2009). Document Analysis as a Qualitative Research Method, Qualitative Research Journal, Vol. 9 No. 2, pp. 27-40. https://doi.org/10.3316/QRJ0902027
  4. Morgan, H. (2022). Conducting a Qualitative Document Analysis. The Qualitative Report, 27(1), 64-77. https://doi.org/10.46743/2160-3715/2022.5044
  5. Niful Islam, et al. (2023). Distinguishing Human Generated Text From ChatGPT Generated Text Using Machine Learning. arXiv:2306.01761v1, https://doi.org/10.48550/arXiv.2306.01761
  6. Sébastien Bubeck, et al. (2023). Sparks of Artificial General Intelligence: Early experiments with GPT-4. arXiv:2303.12712v5, https://doi.org/10.48550/arXiv.2303.12712
  7. GeneRation Of BIbliographic Data: https://grobid.readthedocs.io/en/latest/Introduction/
  8. Grobid Bibliographical Extraction: http://grobid.wikidata.dbis.rwth-aachen.de/
  9. Pankaj Tripathi (2023). The Future Of AI-Powered Document Processing. Available at https://www.docsumo.com/blog/document-ai-future, accessed Friday, 07 July 2023.

Interested, eager to start and have fun?

Contact the thesis advisor: Dr. Sulayman K. Sowe (sowe@dbis.rwth-aachen.de)

For more information, please visit: Information about Thesis Process

Flyer_AI-Based Policy Documents Analysis using Large Language Models (LLMs)


Prerequisites:

Skills you need or are willing to learn to succeed:

  1. Programming in Python or another suitable programming language.
  2. Experience in using and managing GitHub repositories.
  3. Knowledge of APIs (Application Programming Interfaces).
  4. Text mining, Natural Language Processing, and Neural networks.
  5. Good reading and writing skills in German and English.
  6. Ability to adapt quickly to working in a large, multicultural academic environment.