Supervised fine-tuning LLMs for cybersecurity playbook translation

April 5th, 2024

Thesis Type
  • Master
Rabina Borici
Proposal on
03/05/2024 11:15 am
Proposal room
Seminar room I5 6202
Stefan Decker
Mehdi Akbari G.

The master’s thesis project aims to use Large Language Models (LLMs) to automate the translation of unstructured or semi-structured cybersecurity playbooks into a standardized, machine-readable format (OASIS CACAO). The project highlights the benefits of such a format: structured processes, interoperable visualizations, collaborative knowledge sharing, and automated response actions that reduce the need for human intervention. A key research focus is ensuring the accuracy, reliability, and effectiveness of the workflows that LLMs generate during playbook translation. The proposed concept has security operators use LLMs to convert unstructured text into structured workflows, while a syntax checker and a playbook management component ensure compliance with the standard and the accuracy of the content.

The main research problem of the thesis is the scarcity of data in both unstructured and machine-readable formats. The thesis will therefore focus on fine-tuning LLMs (e.g., GPT-3, GPT-4, or Llama 2) with a small quantity of data, supported by other strategies such as feature extraction, for CACAO playbook translation. It will also further develop an existing CACAO syntax checker component for syntax verification.
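The syntax-verification step of the concept can be illustrated with a minimal Python sketch that checks an LLM-generated playbook before it is accepted. It is not the existing CACAO syntax checker mentioned above. The function name `check_playbook`, the chosen subset of required properties, the step-link properties, and the identifier pattern are assumptions based on a simplified reading of CACAO v2.0; the real specification defines more properties and stricter rules.

```python
import json
import re

# Simplified subset of CACAO v2.0 playbook properties (assumption: the real
# specification defines more properties and stricter rules than checked here).
REQUIRED_PLAYBOOK_KEYS = {
    "type", "spec_version", "id", "name",
    "created_by", "created", "modified",
    "workflow_start", "workflow",
}

# Step properties that point to a single other step (assumed subset).
STEP_LINK_KEYS = ("on_completion", "on_success", "on_failure", "on_true", "on_false")

# Identifiers of the form "<object-type>--<UUID>", e.g. "playbook--<uuid>".
ID_PATTERN = re.compile(
    r"^[a-z][a-z0-9-]*--[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"
)


def check_playbook(raw: str) -> list:
    """Return the problems found in an LLM-generated playbook (empty list if none).

    Hypothetical helper for illustration; not the existing CACAO syntax checker.
    """
    try:
        pb = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(pb, dict):
        return ["top-level value must be a JSON object"]

    errors = [f"missing property: {k}" for k in sorted(REQUIRED_PLAYBOOK_KEYS - set(pb))]
    if pb.get("type") != "playbook":
        errors.append("type must be 'playbook'")
    if pb.get("spec_version") != "cacao-2.0":
        errors.append("spec_version must be 'cacao-2.0'")
    if "id" in pb and not ID_PATTERN.match(str(pb["id"])):
        errors.append(f"malformed id: {pb['id']}")

    workflow = pb.get("workflow", {})
    if not isinstance(workflow, dict):
        return errors + ["workflow must be an object keyed by step id"]
    if "workflow_start" in pb and str(pb["workflow_start"]) not in workflow:
        errors.append("workflow_start does not reference a step in workflow")

    for step_id, step in workflow.items():
        if not ID_PATTERN.match(step_id):
            errors.append(f"malformed step id: {step_id}")
        if not isinstance(step, dict) or "type" not in step:
            errors.append(f"step {step_id} has no type")
            continue
        # Every reference to another step must resolve inside the workflow.
        targets = [step[k] for k in STEP_LINK_KEYS if k in step]
        next_steps = step.get("next_steps", [])
        targets += next_steps if isinstance(next_steps, list) else [next_steps]
        for target in targets:
            if not isinstance(target, str) or target not in workflow:
                errors.append(f"step {step_id} references unknown step {target}")
    return errors


# Minimal start-to-end playbook, as an LLM might produce it (illustrative values).
example = {
    "type": "playbook",
    "spec_version": "cacao-2.0",
    "id": "playbook--11111111-1111-4111-8111-111111111111",
    "name": "Isolate infected host",
    "created_by": "identity--22222222-2222-4222-8222-222222222222",
    "created": "2024-04-05T00:00:00Z",
    "modified": "2024-04-05T00:00:00Z",
    "workflow_start": "start--33333333-3333-4333-8333-333333333333",
    "workflow": {
        "start--33333333-3333-4333-8333-333333333333": {
            "type": "start",
            "on_completion": "end--44444444-4444-4444-8444-444444444444",
        },
        "end--44444444-4444-4444-8444-444444444444": {"type": "end"},
    },
}
print(check_playbook(json.dumps(example)))  # → []
```

In the concept described above, a check like this would run on the LLM output, so that malformed translations (invalid JSON, missing properties, dangling step references) are flagged before the playbook reaches playbook management.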

These are the resources we discussed as a starting point for thinking about the topic:


Basic knowledge of cybersecurity and Natural Language Processing (specifically, generative AI).