Automatic generation of standard incident response playbooks from attack models

April 22nd, 2024

Thesis Type	Master
Student	Pedram Ahmadiyeh
Status	Finished
Proposal on	07/06/2024 11:15 am
Proposal room	Seminar room I5 6202
Presentation on	24/06/2025 1:00 pm
Supervisor(s)	Stefan Decker
Advisor(s)	Mehdi Akbari G. osen
Contact	mehdi.akbari.gurabi@fit.fraunhofer.de oemer.sen@fit.fraunhofer.de

The primary aim of this thesis is to explore the data infrastructure necessary for training large language models (LLMs) specifically for incident response playbooks in the field of cybersecurity, with an emphasis on adhering to the CACAO format at a later production stage. This research addresses the significant challenge of limited data availability for incident response playbooks, which is crucial for training LLMs. A central question is whether synthetically generated data can effectively train LLMs to produce high-quality incident response playbooks.

The thesis will include an extensive review and analysis of existing research on data generation for deep learning applications, particularly focusing on large language models. This will involve a comparative analysis of the latest research and state-of-the-art methodologies. The thesis will establish a structured and systematic categorization of various methods and approaches in this field.

A comprehensive requirement analysis is essential to determine criteria for both evaluation and the selection of suitable technologies for the thesis objectives. Based on these criteria, the thesis will model the envisioned data, detailing its specifications and attributes. This model will form the basis for developing a concept and design for the data generation approach, which will then be implemented using the selected technologies.

Furthermore, the thesis will develop a detailed investigation procedure. This will include a thorough description of experimental setups, investigation environments, evaluation methods, and assumptions and conditions. This procedure will be employed to systematically assess the suitability of the developed data generation approach in addressing the scarcity of data for training LLMs.

Finally, based on the findings, the thesis will propose guidelines for the data generation approach. It will also offer a demonstrative example of the developed approach, showcasing its practical application and effectiveness in generating data for LLM training in the context of cybersecurity incident response playbooks.

These are resources for the thesis topic as an example:

Playbook Examples and guidelines: These links will provide simple practical examples of cybersecurity playbooks:

OASIS CACAO Specification: This document details the Collaborative Automated Course of Action Operations (CACAO) standard for cybersecurity playbooks: https://docs.oasis-open.org/cacao/security-playbooks/v2.0/security-playbooks-v2.0.html
Fine-tuning with OpenAI: This resource from OpenAI discusses fine-tuning, a method for effectively utilizing Large Language Models (LLMs) like GPT-3 or GPT-4. Understanding how to fine-tune models that results in better outcomes will be key in automating the playbook translation process: https://platform.openai.com/docs/guides/fine-tuning/when-to-use-fine-tuning
Some relevant articles:

Prerequisites:

Basic knowledge in the domains of cyber security, Natural Language Processing (Specifically, Generative AI).

DBIS

Automatic generation of standard incident response playbooks from attack models

Quick Links

Recent News

Recent Publications